* RFC: [PATCH 1/3] Enhance platform support for PCI
@ 2015-02-20 11:34 Manish Jaggi
  2015-02-20 12:03 ` Julien Grall
  0 siblings, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-20 11:34 UTC (permalink / raw)
  To: xen-devel
  Cc: Prasun.kapoor, Kumar, Vijaya, Julien Grall,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Ian Campbell

The platform APIs are enhanced to provide support for parsing pci device
tree nodes and storing the config-space address which is later used for
pci_read/pci_write config calls.
---
  xen/arch/arm/platform.c        | 27 +++++++++++++++++++++++++++
  xen/include/asm-arm/platform.h | 18 +++++++++++++++++-
  2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/platform.c b/xen/arch/arm/platform.c
index cb4cda8..25f07d2 100644
--- a/xen/arch/arm/platform.c
+++ b/xen/arch/arm/platform.c
@@ -128,6 +128,33 @@ int __init platform_smp_init(void)
  }
  #endif
  
+int platform_pci_dt_node_init(struct dt_device_node *node, const void *data)
+{
+    if ( platform && platform->pci_dt_node_init )
+        return platform->pci_dt_node_init(node, data);
+    return -1;
+}
+
+int platform_pci_write_config(unsigned int seg, unsigned int bus,
+                                    unsigned int dev, unsigned int func,
+                                    int reg, int size, uint32_t val)
+{
+    if ( platform && platform->pci_write_config )
+        return platform->pci_write_config(seg, bus, dev, func, reg, size, val);
+
+    return -1;
+}
+
+int platform_pci_read_config(unsigned int seg, unsigned int bus,
+                                    unsigned int dev, unsigned int func,
+                                    int reg, int size, uint32_t *val)
+{
+    if ( platform && platform->pci_read_config )
+        return platform->pci_read_config(seg, bus, dev, func, reg, size, val);
+
+    return -1;
+}
+
  void platform_reset(void)
  {
      if ( platform && platform->reset )
diff --git a/xen/include/asm-arm/platform.h b/xen/include/asm-arm/platform.h
index eefaca6..da3cc63 100644
--- a/xen/include/asm-arm/platform.h
+++ b/xen/include/asm-arm/platform.h
@@ -26,6 +26,16 @@ struct platform_desc {
      void (*reset)(void);
      /* Platform power-off */
      void (*poweroff)(void);
+    /* PCI callback when a pci device tree node is found */
+    int (*pci_dt_node_init)(struct dt_device_node *node, const void *data);
+    /* PCI configspace read and write */
+    int (*pci_read_config)(unsigned int seg, unsigned int bus,
+                            unsigned int dev, unsigned int func,
+                            int reg, int size, uint32_t *val);
+
+    int (*pci_write_config)(unsigned int seg, unsigned int bus,
+                            unsigned int dev, unsigned int func,
+                            int reg, int size, uint32_t val);
      /*
       * Platform quirks
       * Defined has a function because a platform can support multiple
@@ -73,7 +83,13 @@ bool_t platform_has_quirk(uint32_t quirk);
  bool_t platform_device_is_blacklisted(const struct dt_device_node *node);
  unsigned int platform_dom0_evtchn_ppi(void);
  void platform_dom0_gnttab(paddr_t *start, paddr_t *size);
-
+int platform_pci_dt_node_init(struct dt_device_node *node, const void *data);
+int platform_pci_write_config(unsigned int seg, unsigned int bus,
+                                    unsigned int dev, unsigned int func,
+                                    int reg, int size, uint32_t val);
+int platform_pci_read_config(unsigned int seg, unsigned int bus,
+                                    unsigned int dev, unsigned int func,
+                                    int reg, int size, uint32_t *val);
  #define PLATFORM_START(_name, _namestr)                         \
  static const struct platform_desc  __plat_desc_##_name __used   \
  __attribute__((__section__(".arch.info"))) = {                  \
-- 
1.9.1
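
As a usage sketch (not part of the patch itself), a platform port would
implement these hooks and register them through its platform_desc, roughly
as below. The example_* names are invented, and the callback bodies are
assumed to be defined elsewhere in the same platform file:

    static const char *const example_dt_compat[] __initconst =
    {
        "vendor,example-pcie-soc",
        NULL
    };

    PLATFORM_START(example, "EXAMPLE SOC")
        .compatible       = example_dt_compat,
        .pci_dt_node_init = example_pci_dt_node_init,
        .pci_read_config  = example_pci_read_config,
        .pci_write_config = example_pci_write_config,
    PLATFORM_END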

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 11:34 RFC: [PATCH 1/3] Enhance platform support for PCI Manish Jaggi
@ 2015-02-20 12:03 ` Julien Grall
  2015-02-20 12:10   ` Manish Jaggi
  0 siblings, 1 reply; 70+ messages in thread
From: Julien Grall @ 2015-02-20 12:03 UTC (permalink / raw)
  To: Manish Jaggi, xen-devel
  Cc: Prasun.kapoor, Kumar, Vijaya,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Ian Campbell

Hello Manish,

On 20/02/15 11:34, Manish Jaggi wrote:
> The platform APIs are enhanced to provide support for parsing pci device
> tree nodes and storing the config-space address which is later used for
> pci_read/pci_write config calls.

Can you explain why you choose to add per-platform callbacks rather than
a generic solution?

Regards,

-- 
Julien Grall

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 12:03 ` Julien Grall
@ 2015-02-20 12:10   ` Manish Jaggi
  2015-02-20 12:20     ` Julien Grall
  0 siblings, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-20 12:10 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Prasun.kapoor, Kumar, Vijaya,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Ian Campbell


On 20/02/15 5:33 pm, Julien Grall wrote:
> Hello Manish,
>
> On 20/02/15 11:34, Manish Jaggi wrote:
>> The platform APIs are enhanced to provide support for parsing pci device
>> tree nodes and storing the config-space address which is later used for
>> pci_read/pci_write config calls.
> Can you explain why you choose to add per-platform callbacks rather than
> a generic solution?
The platform code is similar to what Linux has in
drivers/pci/host/pci-<platform>.c. I have used the same concept.
>
> Regards,
>

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 12:10   ` Manish Jaggi
@ 2015-02-20 12:20     ` Julien Grall
  2015-02-20 12:34       ` Manish Jaggi
  2015-02-20 13:37       ` Ian Campbell
  0 siblings, 2 replies; 70+ messages in thread
From: Julien Grall @ 2015-02-20 12:20 UTC (permalink / raw)
  To: Manish Jaggi, xen-devel
  Cc: Prasun.kapoor, Kumar, Vijaya,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Ian Campbell

On 20/02/15 12:10, Manish Jaggi wrote:
> 
> On 20/02/15 5:33 pm, Julien Grall wrote:
>> Hello Manish,
>>
>> On 20/02/15 11:34, Manish Jaggi wrote:
>>> The platform APIs are enhanced to provide support for parsing pci device
>>> tree nodes and storing the config-space address which is later used for
>>> pci_read/pci_write config calls.
>> Can you explain why you choose to add per-platform callbacks rather than
>> a generic solution?
> The platform code is similar to what linux has in
> drivers/pci/host/pci-<platform>.c. I have used the same concept.

Please explain it in the commit message; it helps us understand why
you did it.

Anyway, based on what you said, your approach looks wrong.

Firstly, the platform code is DT-centric and we don't expect to have
such a thing for ACPI.

Secondly, the PCI host code can be shared between multiple platforms.

Overall, I would prefer to have a separate file and structure for
handling PCI hosts. Also, I think we could re-use the Linux code for this
purpose.

Regards,

-- 
Julien Grall

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 12:20     ` Julien Grall
@ 2015-02-20 12:34       ` Manish Jaggi
  2015-02-20 13:01         ` Manish Jaggi
  2015-02-20 13:37       ` Ian Campbell
  1 sibling, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-20 12:34 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Prasun.kapoor, Kumar, Vijaya,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Ian Campbell


On 20/02/15 5:50 pm, Julien Grall wrote:
> On 20/02/15 12:10, Manish Jaggi wrote:
>> On 20/02/15 5:33 pm, Julien Grall wrote:
>>> Hello Manish,
>>>
>>> On 20/02/15 11:34, Manish Jaggi wrote:
>>>> The platform APIs are enhanced to provide support for parsing pci device
>>>> tree nodes and storing the config-space address which is later used for
>>>> pci_read/pci_write config calls.
>>> Can you explain why you choose to add per-platform callbacks rather than
>>> a generic solution?
>> The platform code is similar to what linux has in
>> drivers/pci/host/pci-<platform>.c. I have used the same concept.
> Please explain it in the commit message, it helps us to understand why
> you did it.
>
> Anyway, based on what you said, your approach looks wrong.
ok :)
> Firstly, the platform code is DT-centric and we don't expect to have a
> such things for ACPI.
>
> Secondly, the PCI host code be shared between multiple platform.
>
> Overall, I would prefer to have a separate file and structure for
> handling PCI host. Also, I think we could re-use the Linux code for this
> purpose.
I will move the code to drivers/pci/host
> Regards,
>

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 12:34       ` Manish Jaggi
@ 2015-02-20 13:01         ` Manish Jaggi
  2015-02-20 13:45           ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-20 13:01 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Prasun.kapoor, Kumar, Vijaya,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Ian Campbell


On 20/02/15 6:04 pm, Manish Jaggi wrote:
>
> On 20/02/15 5:50 pm, Julien Grall wrote:
>> On 20/02/15 12:10, Manish Jaggi wrote:
>>> On 20/02/15 5:33 pm, Julien Grall wrote:
>>>> Hello Manish,
>>>>
>>>> On 20/02/15 11:34, Manish Jaggi wrote:
>>>>> The platform APIs are enhanced to provide support for parsing pci 
>>>>> device
>>>>> tree nodes and storing the config-space address which is later 
>>>>> used for
>>>>> pci_read/pci_write config calls.
>>>> Can you explain why you choose to add per-platform callbacks rather 
>>>> than
>>>> a generic solution?
>>> The platform code is similar to what linux has in
>>> drivers/pci/host/pci-<platform>.c. I have used the same concept.
>> Please explain it in the commit message, it helps us to understand why
>> you did it.
>>
>> Anyway, based on what you said, your approach looks wrong.
> ok :)
>> Firstly, the platform code is DT-centric and we don't expect to have a
>> such things for ACPI.
>>
>> Secondly, the PCI host code be shared between multiple platform.
>>
>> Overall, I would prefer to have a separate file and structure for
>> handling PCI host. Also, I think we could re-use the Linux code for this
>> purpose.
> I will move the code to drivers/pci/host
I have added an ABI where the segment number maps to the position of the
pci node in the xen device tree. We had partially discussed this during
Linaro Connect. What is your team's view on this: should this be ok, or
should we introduce a property in the device tree pci node
{xen_segment_id = <1>}?
>> Regards,
>>
>

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 12:20     ` Julien Grall
  2015-02-20 12:34       ` Manish Jaggi
@ 2015-02-20 13:37       ` Ian Campbell
  1 sibling, 0 replies; 70+ messages in thread
From: Ian Campbell @ 2015-02-20 13:37 UTC (permalink / raw)
  To: Julien Grall
  Cc: Prasun.kapoor, Kumar, Vijaya, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Manish Jaggi

On Fri, 2015-02-20 at 12:20 +0000, Julien Grall wrote:
> Overall, I would prefer to have a separate file and structure for
> handling PCI host. Also, I think we could re-use the Linux code for this
> purpose.

(caveat; I've not looked at the code yet)

I had expected that PCI host controllers would be discovered via the
existing device model stuff and compatible string matching, e.g.
DT_DEVICE_START(some_pcihost, "SOME PCI HOST CONTROLLER", DEVICE_PCIBUS)
which would reference a set of compatible strings and a probe function;
the probe function would then call some PCI bus registration function to
hook that bus into the system.
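
A minimal sketch of that shape, assuming a new DEVICE_PCIBUS class and a
hypothetical pci_hostbridge_register() helper (neither exists today), might
be:

    static const struct dt_device_match some_pcihost_dt_match[] __initconst =
    {
        DT_MATCH_COMPATIBLE("vendor,some-pcie"),
        { /* sentinel */ },
    };

    static int __init some_pcihost_init(struct dt_device_node *node,
                                        const void *data)
    {
        /* Map the CFG space, then hand the bus over to a (hypothetical)
         * generic PCI host layer. */
        return pci_hostbridge_register(node);
    }

    DT_DEVICE_START(some_pcihost, "SOME PCI HOST CONTROLLER", DEVICE_PCIBUS)
        .dt_match = some_pcihost_dt_match,
        .init = some_pcihost_init,
    DT_DEVICE_END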

Ian.

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 13:01         ` Manish Jaggi
@ 2015-02-20 13:45           ` Ian Campbell
  2015-02-20 14:11             ` Jan Beulich
  2015-02-20 14:14             ` Manish Jaggi
  0 siblings, 2 replies; 70+ messages in thread
From: Ian Campbell @ 2015-02-20 13:45 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor, Kumar, Vijaya, Julien Grall, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

(Jan, curious if you have any thoughts on this, hopefully I've left
sufficient context for you to get what we are on about, the gist is we
need some way for dom0 and Xen to agree on how a PCI segment ID maps to
an actual PCI root controller, I think on x86 you either Just Know from
PC legacy or ACPI tells you?)

On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote:
> I have added ABI that segment no maps to the position on pci node in xen 
> device tree. We had partially discussed about this during Linaro 
> connect. What is your teams view on this, should this be ok or we 
> introduce a property in device tree pci node {xen_segment_id = <1>}

The DT node ordering cannot be relied on this way, so we certainly need
something else.

Ideally we should find a solution which doesn't require new properties.

The best solution would be to find some existing property of the PCI
host controller which is well defined and unique to each host
controller. I had been thinking that the base address of the PCI CFG
space might be a good enough handle.

The only question is whether the data type used for segment id in the
hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm
not sure if we are going to be able to make this pci_segment_t and have
it differ for ARM.

Another option might be a new hypercall (assuming one doesn't already
exist) to register a PCI bus which would take e.g. the PCI CFG base
address and return a new u16 segment id to be used for all subsequent
PCI related calls. This would require the dom0 OS to hook its
pci_bus_add function, which might be doable (more doable than handling
xen_segment_id DT properties I think).
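
Such a hypercall might look roughly like the sketch below; the name and
layout are invented here purely for illustration:

    /* Sketch only: no such hypercall exists at this point; a new
     * PHYSDEVOP_* number would need to be allocated for it. */
    struct physdev_pci_host_bridge_add {
        /* IN: base and size of this host bridge's config space */
        uint64_t cfg_base;
        uint64_t cfg_size;
        /* OUT: segment id allocated by Xen, to be used in all subsequent
         * PCI related hypercalls for devices behind this bridge */
        uint16_t segment;
    };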

I'm not sure if this ends up being different on ACPI, where perhaps
MMCONFIG or some other table actually gives us a segment ID for each PCI
bus. Ideally whatever solution we end up with would fit into this model.

Ian.

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 13:45           ` Ian Campbell
@ 2015-02-20 14:11             ` Jan Beulich
  2015-02-20 14:26               ` Ian Campbell
  2015-02-20 14:14             ` Manish Jaggi
  1 sibling, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-02-20 14:11 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini(Stefano.Stabellini@citrix.com)

>>> On 20.02.15 at 14:45, <ian.campbell@citrix.com> wrote:
> (Jan, curious if you have any thoughts on this, hopefully I've left
> sufficient context for you to get what we are on about, the gist is we
> need some way for dom0 and Xen to agree on how a PCI segment ID maps to
> an actual PCI root controller, I think on x86 you either Just Know from
> PC legacy or ACPI tells you?)

Yeah, without information from ACPI we'd have no way to know how
to access the config space of segments other than 0. Neither of the
I/O-port-based access methods has room for specifying a segment
number. Since the MMCFG addresses get set up by firmware, it is
also firmware telling us the segment numbers. If you don't get them
arranged in any particular order, ...

> On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote:
>> I have added ABI that segment no maps to the position on pci node in xen 
>> device tree. We had partially discussed about this during Linaro 
>> connect. What is your teams view on this, should this be ok or we 
>> introduce a property in device tree pci node {xen_segment_id = <1>}
> 
> The DT node ordering cannot be relied on this way, so we certainly need
> something else.
> 
> Ideally we should find a solution which doesn't require new properties.
> 
> The best solution would be to find some existing property of the PCI
> host controller which is well defined and unique to each host
> controller. I had been thinking that the base address of the PCI CFG
> space might be a good enough handle.

... this approach would seem reasonable.

> The only question is whether the data type used for segment id in the
> hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm
> not sure if we are going to be able to make this pci_segment_t and have
> it differ for ARM.

Are you expecting to have more than 64k segments? Otherwise,
just sequentially assign segment numbers as you discover them or
get them reported by Dom0. You could even have Dom0 tell you
the segment numbers (just like we do on x86), thus eliminating the
need for an extra mechanism for Dom0 to learn them.
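
Sequential assignment keyed on the CFG base address could be as simple as
the sketch below (illustrative only; names invented, locking and error
handling omitted):

    struct seg_map {
        uint64_t cfg_base;
        uint16_t segment;
        struct seg_map *next;
    };
    static struct seg_map *seg_list;
    static uint16_t next_segment;

    static uint16_t segment_for_cfg_base(uint64_t cfg_base)
    {
        struct seg_map *s;

        for ( s = seg_list; s; s = s->next )
            if ( s->cfg_base == cfg_base )
                return s->segment;

        s = xzalloc(struct seg_map);
        s->cfg_base = cfg_base;
        s->segment = next_segment++;
        s->next = seg_list;
        seg_list = s;

        return s->segment;
    }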

Jan

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 13:45           ` Ian Campbell
  2015-02-20 14:11             ` Jan Beulich
@ 2015-02-20 14:14             ` Manish Jaggi
  2015-02-20 14:39               ` Ian Campbell
  1 sibling, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-20 14:14 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Kumar, Vijaya, Julien Grall, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich


On 20/02/15 7:15 pm, Ian Campbell wrote:
> (Jan, curious if you have any thoughts on this, hopefully I've left
> sufficient context for you to get what we are on about, the gist is we
> need some way for dom0 and Xen to agree on how a PCI segment ID maps to
> an actual PCI root controller, I think on x86 you either Just Know from
> PC legacy or ACPI tells you?)
>
> On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote:
>> I have added ABI that segment no maps to the position on pci node in xen
>> device tree. We had partially discussed about this during Linaro
>> connect. What is your teams view on this, should this be ok or we
>> introduce a property in device tree pci node {xen_segment_id = <1>}
> The DT node ordering cannot be relied on this way, so we certainly need
> something else.
>
> Ideally we should find a solution which doesn't require new properties.
>
> The best solution would be to find some existing property of the PCI
> host controller which is well defined and unique to each host
> controller. I had been thinking that the base address of the PCI CFG
> space might be a good enough handle.
>
> The only question is whether the data type used for segment id in the
> hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm
> not sure if we are going to be able to make this pci_segment_t and have
> it differ for ARM.
>
> Another option might be a new hypercall (assuming one doesn't already
> exist) to register a PCI bus which would take e.g. the PCI CFG base
> address and return a new u16 segment id to be used for all subsequent
> PCI related calls. This would require the dom0 OS to hook its
> pci_bus_add function, which might be doable (more doable than handling
> xen_segment_id DT properties I think).
This seems OK, I will try it out.
>
> I'm not sure if this ends up being different on ACPI, where perhaps
> MMCONFIG or some other table actually gives us a segment ID for each PCI
> bus. Ideally whatever solution we end up with would fit into this model.
>
> Ian.
>

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 14:11             ` Jan Beulich
@ 2015-02-20 14:26               ` Ian Campbell
  2015-02-20 14:39                 ` Jan Beulich
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-20 14:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini(Stefano.Stabellini@citrix.com)

On Fri, 2015-02-20 at 14:11 +0000, Jan Beulich wrote:
> >>> On 20.02.15 at 14:45, <ian.campbell@citrix.com> wrote:
> > (Jan, curious if you have any thoughts on this, hopefully I've left
> > sufficient context for you to get what we are on about, the gist is we
> > need some way for dom0 and Xen to agree on how a PCI segment ID maps to
> > an actual PCI root controller, I think on x86 you either Just Know from
> > PC legacy or ACPI tells you?)
> 
> Yeah, without information from ACPI we'd have no way to know how
> to access the config space of segments other than 0. Both I/O port
> based access methods don't have room for specifying a segment
> number. Since the MMCFG addresses get set up by firmware, it is
> also firmware telling us the segment numbers. If you don't get them
> arranged in any particular order, ...
> 
> > On Fri, 2015-02-20 at 18:31 +0530, Manish Jaggi wrote:
> >> I have added ABI that segment no maps to the position on pci node in xen 
> >> device tree. We had partially discussed about this during Linaro 
> >> connect. What is your teams view on this, should this be ok or we 
> >> introduce a property in device tree pci node {xen_segment_id = <1>}
> > 
> > The DT node ordering cannot be relied on this way, so we certainly need
> > something else.
> > 
> > Ideally we should find a solution which doesn't require new properties.
> > 
> > The best solution would be to find some existing property of the PCI
> > host controller which is well defined and unique to each host
> > controller. I had been thinking that the base address of the PCI CFG
> > space might be a good enough handle.
> 
> ... this approach would seem reasonable.
> 
> > The only question is whether the data type used for segment id in the
> > hypercall ABI is wide enough for this, and it seems to be u16 :-(. I'm
> > not sure if we are going to be able to make this pci_segment_t and have
> > it differ for ARM.
> 
> Are you expecting to have more than 64k segments?

If we were to use the PCI CFG base address as the "handle" for a segment
then we would just need a 64-bit field; it would of course be very
sparse ;-).

> Otherwise,
> just sequentially assign segment numbers as you discover them or
> get them reported by Dom0. You could even have Dom0 tell you
> the segment numbers (just like we do on x86),

Aha, how does this work on x86 then? I've been looking for a hypercall
which dom0 uses to tell Xen about PCI segments to no avail (I just find
ones for registering actual devices).

If there is an existing mechanism on x86 and it suits (or nearly so)
then I am entirely in favour of using it.

Ian.

>  thus eliminating the
> need for an extra mechanism for Dom0 to learn them.
> 
> Jan
> 

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 14:26               ` Ian Campbell
@ 2015-02-20 14:39                 ` Jan Beulich
  2015-02-20 15:01                   ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-02-20 14:39 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com)

>>> On 20.02.15 at 15:26, <ian.campbell@citrix.com> wrote:
> On Fri, 2015-02-20 at 14:11 +0000, Jan Beulich wrote:
>> Otherwise,
>> just sequentially assign segment numbers as you discover them or
>> get them reported by Dom0. You could even have Dom0 tell you
>> the segment numbers (just like we do on x86),
> 
> Aha, how does this work on x86 then? I've been looking for a hypercall
> which dom0 uses to tell Xen about PCI segments to no avail (I just find
> ones for registering actual devices).

But that's the one, plus the MMCFG reporting one
(PHYSDEVOP_pci_mmcfg_reserved). Without ACPI, how do you
know on ARM how to access config space for a particular
segment?

Jan

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 14:14             ` Manish Jaggi
@ 2015-02-20 14:39               ` Ian Campbell
  2015-02-23 10:59                 ` Manish Jaggi
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-20 14:39 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor, Kumar, Vijaya, Julien Grall, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
> > Another option might be a new hypercall (assuming one doesn't already
> > exist) to register a PCI bus which would take e.g. the PCI CFG base
> > address and return a new u16 segment id to be used for all subsequent
> > PCI related calls. This would require the dom0 OS to hook its
> > pci_bus_add function, which might be doable (more doable than handling
> > xen_segment_id DT properties I think).
> This seems ok, i will try it out.

I recommend you let this subthread (e.g. the conversation with Jan)
settle upon a preferred course of action before implementing any one
suggestion.

Ian.

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 14:39                 ` Jan Beulich
@ 2015-02-20 15:01                   ` Ian Campbell
  2015-02-20 15:13                     ` Manish Jaggi
  2015-02-20 15:15                     ` Jan Beulich
  0 siblings, 2 replies; 70+ messages in thread
From: Ian Campbell @ 2015-02-20 15:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com)

On Fri, 2015-02-20 at 14:39 +0000, Jan Beulich wrote:
> >>> On 20.02.15 at 15:26, <ian.campbell@citrix.com> wrote:
> > On Fri, 2015-02-20 at 14:11 +0000, Jan Beulich wrote:
> >> Otherwise,
> >> just sequentially assign segment numbers as you discover them or
> >> get them reported by Dom0. You could even have Dom0 tell you
> >> the segment numbers (just like we do on x86),
> > 
> > Aha, how does this work on x86 then? I've been looking for a hypercall
> > which dom0 uses to tell Xen about PCI segments to no avail (I just find
> > ones for registering actual devices).
> 
> But that's the one,

that == physdev_pci_device_add?

AFAICT that tells Xen about a given device existing on a particular
segment, but doesn't tell Xen any of the properties of that segment.

>  plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved).

This looks promising, but rather under-documented.

        #define PHYSDEVOP_pci_mmcfg_reserved    24
        struct physdev_pci_mmcfg_reserved {
            uint64_t address;
            uint16_t segment;
            uint8_t start_bus;
            uint8_t end_bus;
            uint32_t flags;
        };
        
I suppose the first 4 fields correspond to entries in the MMCFG table?
Which x86 Xen can parse and so can dom0, so dom0 can then make this
hypercall, passing (address,segment,start_bus,end_bus) to set the flags?

What is address the address of? The CFG space I think?

On ARM with DT I think we only get given address, and something has to
make up segment, start/end_bus I'm not sure where we would get them
from.

So although I think we could perhaps bend this interface to ARMs needs
it would have rather different semantics to x86, i.e. instead of the
"key" being (address,segment,start_bus,end_bus) and the "value" being
flags it would be something like "key" = (address) and "value" =
(segment,start_bus,end_bus,flags).

I don't think reusing like that would be wise.

>  Without ACPI, how do you
> know on ARM how to access config space for a particular
> segment?

That's the issue we are trying to resolve, with device tree there is no
explicit segment ID, so we have an essentially unindexed set of PCI
buses in both Xen and dom0.

So something somewhere needs to make up a segment ID for each PCI bus
and Xen and dom0 need to somehow agree on what the mapping is e.g. by
the one which made up the segment ID telling the other or some other TBD
means.

On x86 you solve this because both Xen and dom0 can parse the same table
and reach the same answer, sadly DT doesn't have everything needed in
it.

Ian.

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 15:01                   ` Ian Campbell
@ 2015-02-20 15:13                     ` Manish Jaggi
  2015-02-20 15:15                       ` Julien Grall
  2015-02-20 15:15                     ` Jan Beulich
  1 sibling, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-20 15:13 UTC (permalink / raw)
  To: Ian Campbell, Jan Beulich
  Cc: Prasun.kapoor, Vijaya Kumar, Julien Grall,
	StefanoStabellini(Stefano.Stabellini@citrix.com),
	xen-devel


On 20/02/15 8:31 pm, Ian Campbell wrote:
> On Fri, 2015-02-20 at 14:39 +0000, Jan Beulich wrote:
>>>>> On 20.02.15 at 15:26, <ian.campbell@citrix.com> wrote:
>>> On Fri, 2015-02-20 at 14:11 +0000, Jan Beulich wrote:
>>>> Otherwise,
>>>> just sequentially assign segment numbers as you discover them or
>>>> get them reported by Dom0. You could even have Dom0 tell you
>>>> the segment numbers (just like we do on x86),
>>> Aha, how does this work on x86 then? I've been looking for a hypercall
>>> which dom0 uses to tell Xen about PCI segments to no avail (I just find
>>> ones for registering actual devices).
>> But that's the one,
> that == physdev_pci_device_add?
>
> AFAICT that tells Xen about a given device existing on a particular
> segment, but doesn't tell Xen any of the properties of that segment.
>
>>   plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved).
> This looks promising, but rather under-documented.
>
>          #define PHYSDEVOP_pci_mmcfg_reserved    24
>          struct physdev_pci_mmcfg_reserved {
>              uint64_t address;
>              uint16_t segment;
>              uint8_t start_bus;
>              uint8_t end_bus;
>              uint32_t flags;
>          };
>          
> I suppose the first 4 fields correspond to entries in the MMCFG table?
> Which x86 Xen can parse and so can dom0, so dom0 can then make this
> hypercall, passing (address,segment,start_bus,end_bus) to set the flags?
>
> What is address the address of? The CFG space I think?
>
> On ARM with DT I think we only get given address, and something has to
> make up segment, start/end_bus I'm not sure where we would get them
> from.
>
> So although I think we could perhaps bend this interface to ARMs needs
> it would have rather different semantics to x86, i.e. instead of the
> "key" being (address,segment,start_bus,end_bus) and the "value" being
> flags it would be something like "key" = (address) and "value" =
> (segment,start_bus,end_bus,flags).
>
> I don't think reusing like that would be wise.
>
>>   Without ACPI, how do you
>> know on ARM how to access config space for a particular
>> segment?
> That's the issue we are trying to resolve, with device tree there is no
> explicit segment ID, so we have an essentially unindexed set of PCI
> buses in both Xen and dom0.
>
> So something somewhere needs to make up a segment ID for each PCI bus
> and Xen and dom0 need to somehow agree on what the mapping is e.g. by
> the one which made up the segment ID telling the other or some other TBD
> means.
>
> On x86 you solve this because both Xen and dom0 can parse the same table
> and reach the same answer, sadly DT doesn't have everything needed in
> it.
In fact Xen and dom0 use the same device tree nodes, in the same
order; Xen creates the device tree for dom0.
I think Xen can enforce the ABI while creating the device tree.
>
> Ian.
>

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 15:01                   ` Ian Campbell
  2015-02-20 15:13                     ` Manish Jaggi
@ 2015-02-20 15:15                     ` Jan Beulich
  2015-02-20 17:33                       ` Ian Campbell
  1 sibling, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-02-20 15:15 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com)

>>> On 20.02.15 at 16:01, <ian.campbell@citrix.com> wrote:
> On Fri, 2015-02-20 at 14:39 +0000, Jan Beulich wrote:
>>  plus the MMCFG reporting one (PHYSDEVOP_pci_mmcfg_reserved).
> 
> This looks promising, but rather under-documented.
> 
>         #define PHYSDEVOP_pci_mmcfg_reserved    24
>         struct physdev_pci_mmcfg_reserved {
>             uint64_t address;
>             uint16_t segment;
>             uint8_t start_bus;
>             uint8_t end_bus;
>             uint32_t flags;
>         };
>         
> I suppose the first 4 fields correspond to entries in the MMCFG table?

Yes.

> Which x86 Xen can parse and so can dom0, so dom0 can then make this
> hypercall, passing (address,segment,start_bus,end_bus) to set the flags?

No, the flags are IN too - since Xen can parse the table itself, there
wouldn't be any need for the hypercall if there weren't many systems
which don't reserve the MMCFG address range(s) in E820 and/or ACPI
resources. Xen can check E820, but obtaining ACPI resource info
requires AML parsing.

> What is address the address of? The CFG space I think?

Yes, the base address of the MMCFG range (maybe suitably offset
by the bus number).
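
(For reference, with the standard ECAM/MMCFG layout the config space
address of a given BDF works out as in the sketch below, where start_bus
is the first bus covered by the range:)

    /* Standard ECAM/MMCFG layout: 4 KiB of config space per function. */
    static inline uint64_t mmcfg_cfg_addr(uint64_t base, uint8_t start_bus,
                                          uint8_t bus, uint8_t dev,
                                          uint8_t func, uint16_t reg)
    {
        return base + (((uint64_t)(bus - start_bus) << 20) |
                       ((uint64_t)dev << 15) |
                       ((uint64_t)func << 12) | reg);
    }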

> On ARM with DT I think we only get given address, and something has to
> make up segment, start/end_bus I'm not sure where we would get them
> from.
> 
> So although I think we could perhaps bend this interface to ARMs needs
> it would have rather different semantics to x86, i.e. instead of the
> "key" being (address,segment,start_bus,end_bus) and the "value" being
> flags it would be something like "key" = (address) and "value" =
> (segment,start_bus,end_bus,flags).
> 
> I don't think reusing like that would be wise.

Yes, sufficiently different semantics would call for a new interface.

>>  Without ACPI, how do you
>> know on ARM how to access config space for a particular
>> segment?
> 
> That's the issue we are trying to resolve, with device tree there is no
> explicit segment ID, so we have an essentially unindexed set of PCI
> buses in both Xen and dom0.

How that? What if two bus numbers are equal? There ought to be
some kind of topology information. Or if all buses are distinct, then
you don't need a segment number.

Jan

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 15:13                     ` Manish Jaggi
@ 2015-02-20 15:15                       ` Julien Grall
  0 siblings, 0 replies; 70+ messages in thread
From: Julien Grall @ 2015-02-20 15:15 UTC (permalink / raw)
  To: Manish Jaggi, Ian Campbell, Jan Beulich
  Cc: Prasun.kapoor, Vijaya Kumar,
	StefanoStabellini(Stefano.Stabellini@citrix.com),
	xen-devel

On 20/02/15 15:13, Manish Jaggi wrote:
>> On x86 you solve this because both Xen and dom0 can parse the same table
>> and reach the same answer, sadly DT doesn't have everything needed in
>> it.
> In fact xen and dom0 use the same device tree nodes and in the same
> order. xen creates the device tree for dom0.
> I think xen can enforce the ABI while creating device tree

While it's true for Linux, you can't assume that another OS (such as
FreeBSD) will parse the device tree in the same order.

Regards,

-- 
Julien Grall

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 15:15                     ` Jan Beulich
@ 2015-02-20 17:33                       ` Ian Campbell
  2015-02-23  8:43                         ` Jan Beulich
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-20 17:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com)

On Fri, 2015-02-20 at 15:15 +0000, Jan Beulich wrote:
> > That's the issue we are trying to resolve, with device tree there is no
> > explicit segment ID, so we have an essentially unindexed set of PCI
> > buses in both Xen and dom0.
> 
> How that? What if two bus numbers are equal? There ought to be
> some kind of topology information. Or if all buses are distinct, then
> you don't need a segment number.

It's very possible that I simply don't have the PCI terminology straight
in my head, leading to me talking nonsense.

I'll explain how I'm using it and perhaps you can put me straight...

My understanding was that a PCI segment equates to a PCI host
controller, i.e. a specific instance of some PCI host IP on an SoC. I
think that PCI specs etc perhaps call a segment a PCI domain, which we
avoided in Xen due to the obvious potential for confusion.

A PCI host controller defines the root of a bus, within which the BDF
need not be distinct due to the differing segments which are effectively
a higher level namespace on the BDFs.

So given a system with two PCI host controllers we end up with two
segments (lets say A and B, but choosing those is the topic of this
thread) and it is acceptable for both to contain a bus 0 with a device 1
on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist.

It sounds like you are saying that this is not actually acceptable and
that 0:0.0 must be unique in the system irrespective of the associated
segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?
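
(In other words, the segment acts as the high bits of the handle that
keeps otherwise-identical BDFs apart; a sketch of such an encoding, with
invented macro names:)

    #define EX_BDF(b, d, f)     ((((b) & 0xffU) << 8) | \
                                 (((d) & 0x1fU) << 3) | ((f) & 0x7U))
    #define EX_SBDF(s, b, d, f) ((((uint32_t)(s) & 0xffffU) << 16) | \
                                 EX_BDF(b, d, f))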

Just for reference a DT node describing a PCI host controller might look
like (taking the APM Mustang one as an example):

                pcie0: pcie@1f2b0000 {
                        status = "disabled";
                        device_type = "pci";
                        compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
                        #interrupt-cells = <1>;
                        #size-cells = <2>;
                        #address-cells = <3>;
                        reg = < 0x00 0x1f2b0000 0x0 0x00010000   /* Controller registers */
                                0xe0 0xd0000000 0x0 0x00040000>; /* PCI config space */
                        reg-names = "csr", "cfg";
                        ranges = <0x01000000 0x00 0x00000000 0xe0 0x10000000 0x00 0x00010000   /* io */
                                  0x02000000 0x00 0x80000000 0xe1 0x80000000 0x00 0x80000000>; /* mem */
                        dma-ranges = <0x42000000 0x80 0x00000000 0x80 0x00000000 0x00 0x80000000
                                      0x42000000 0x00 0x00000000 0x00 0x00000000 0x80 0x00000000>;
                        interrupt-map-mask = <0x0 0x0 0x0 0x7>;
                        interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
                                         0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
                                         0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
                                         0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
                        dma-coherent;
                        clocks = <&pcie0clk 0>;
                };

I expect most of this is uninteresting but the key thing is that there
is no segment number nor topology relative to e.g. "pcie1:
pcie@1f2c0000" (the node look identical except e.g. all the base
addresses and interrupt numbers differ).

(FWIW reg here shows that the PCI cfg space is at 0xe0d0000000,
interrupt-map shows that SPI(AKA GSI) 0xc2 is INTA and 0xc3 is INTB (I
think, a bit fuzzy...), ranges is the space where BARs live, I think you
can safely ignore everything else for the purposes of this
conversation).

Ian.

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 17:33                       ` Ian Campbell
@ 2015-02-23  8:43                         ` Jan Beulich
  2015-02-23 12:45                           ` Ian Campbell
  2015-03-11 18:26                           ` Stefano Stabellini
  0 siblings, 2 replies; 70+ messages in thread
From: Jan Beulich @ 2015-02-23  8:43 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com)

>>> On 20.02.15 at 18:33, <ian.campbell@citrix.com> wrote:
> On Fri, 2015-02-20 at 15:15 +0000, Jan Beulich wrote:
>> > That's the issue we are trying to resolve, with device tree there is no
>> > explicit segment ID, so we have an essentially unindexed set of PCI
>> > buses in both Xen and dom0.
>> 
>> How that? What if two bus numbers are equal? There ought to be
>> some kind of topology information. Or if all buses are distinct, then
>> you don't need a segment number.
> 
> It's very possible that I simply don't have the PCI terminology straight
> in my head, leading to me talking nonsense.
> 
> I'll explain how I'm using it and perhaps you can put me straight...
> 
> My understanding was that a PCI segment equates to a PCI host
> controller, i.e. a specific instance of some PCI host IP on an SoC.

No - there can be multiple roots (i.e. host bridges) on a single
segment. Segments are - afaict - purely a scalability extension
allowing to overcome the 256 bus limit.

> I
> think that PCI specs etc perhaps call a segment a PCI domain, which we
> avoided in Xen due to the obvious potential for confusion.

Right, the two terms are getting mixed depending on where you
look.

> A PCI host controller defines the root of a bus, within which the BDF
> need not be distinct due to the differing segments which are effectively
> a higher level namespace on the BDFs.

The host controller really defines the root of a tree (often covering
multiple buses, i.e. as soon as bridges come into play).

> So given a system with two PCI host controllers we end up with two
> segments (lets say A and B, but choosing those is the topic of this
> thread) and it is acceptable for both to contain a bus 0 with a device 1
> on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist.
> 
> It sounds like you are saying that this is not actually acceptable and
> that 0:0.0 must be unique in the system irrespective of the associated
> segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?

No, there can be multiple buses numbered zero. And at the same
time a root bus doesn't need to be bus zero on its segment.

> Just for reference a DT node describing a PCI host controller might look
> like (taking the APM Mustang one as an example):
> 
>                 pcie0: pcie@1f2b0000 {
>                         status = "disabled";
>                         device_type = "pci";
>                         compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
>                         #interrupt-cells = <1>;
>                         #size-cells = <2>;
>                         #address-cells = <3>;
>                         reg = < 0x00 0x1f2b0000 0x0 0x00010000   /* Controller registers */
>                                 0xe0 0xd0000000 0x0 0x00040000>; /* PCI config space */
>                         reg-names = "csr", "cfg";
>                         ranges = <0x01000000 0x00 0x00000000 0xe0 0x10000000 0x00 0x00010000   /* io */
>                                   0x02000000 0x00 0x80000000 0xe1 0x80000000 0x00 0x80000000>; /* mem */
>                         dma-ranges = <0x42000000 0x80 0x00000000 0x80 0x00000000 0x00 0x80000000
>                                       0x42000000 0x00 0x00000000 0x00 0x00000000 0x80 0x00000000>;
>                         interrupt-map-mask = <0x0 0x0 0x0 0x7>;
>                         interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
>                                          0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
>                                          0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
>                                          0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
>                         dma-coherent;
>                         clocks = <&pcie0clk 0>;
>                 };
> 
> I expect most of this is uninteresting but the key thing is that there
> is no segment number nor topology relative to e.g. "pcie1:
> pcie@1f2c0000" (the node look identical except e.g. all the base
> addresses and interrupt numbers differ).

What I don't get from this is where the BDF is being represented.
Yet that arrangement is fundamental to understand whether you
really need segments to properly disambiguate devices.

Jan

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-20 14:39               ` Ian Campbell
@ 2015-02-23 10:59                 ` Manish Jaggi
  2015-02-23 11:14                   ` Julien Grall
  0 siblings, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-23 10:59 UTC (permalink / raw)
  To: xen-devel, Ian Campbell, Julien Grall, Prasun.kapoor,
	Jan Beulich, Kumar, Vijaya,
	Stefano Stabellini (Stefano.Stabellini@citrix.com)


On 20/02/15 8:09 pm, Ian Campbell wrote:
> On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
>>> Another option might be a new hypercall (assuming one doesn't already
>>> exist) to register a PCI bus which would take e.g. the PCI CFG base
>>> address and return a new u16 segment id to be used for all subsequent
>>> PCI related calls. This would require the dom0 OS to hook its
>>> pci_bus_add function, which might be doable (more doable than handling
>>> xen_segment_id DT properties I think).
>> This seems ok, i will try it out.
> I recommend you let this subthread (e.g. the conversation with Jan)
> settle upon a preferred course of action before implementing any one
> suggestion.
Ian, we also have to consider NUMA / multi-node systems where there are
two or more ITS nodes.
pci0{

msi-parent = <&its0>;
}

pci1{

msi-parent = <&its1>;
}

This requires parsing pci nodes in Xen and creating a mapping between
pci nodes and ITS nodes. Xen would need to be aware of PCI nodes in the
device tree prior to dom0 sending a hypercall. Adding a property to the
pci node in the device tree should be a good approach.
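
(For what it's worth, mapping a host bridge node to its ITS could be as
simple as following the msi-parent phandle, e.g. the sketch below,
assuming a phandle-lookup helper along the lines of Xen's
dt_parse_phandle():)

    static const struct dt_device_node *
    pci_node_to_its(const struct dt_device_node *pci_node)
    {
        /* Returns the ITS node this host bridge routes MSIs to,
         * or NULL if no msi-parent is present. */
        return dt_parse_phandle(pci_node, "msi-parent", 0);
    }
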
> Ian.
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 10:59                 ` Manish Jaggi
@ 2015-02-23 11:14                   ` Julien Grall
  2015-02-23 11:50                     ` Manish Jaggi
  0 siblings, 1 reply; 70+ messages in thread
From: Julien Grall @ 2015-02-23 11:14 UTC (permalink / raw)
  To: Manish Jaggi, xen-devel, Ian Campbell, Prasun.kapoor,
	Jan Beulich, Kumar, Vijaya,
	Stefano Stabellini (Stefano.Stabellini@citrix.com)



On 23/02/2015 10:59, Manish Jaggi wrote:
>
> On 20/02/15 8:09 pm, Ian Campbell wrote:
>> On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
>>>> Another option might be a new hypercall (assuming one doesn't already
>>>> exist) to register a PCI bus which would take e.g. the PCI CFG base
>>>> address and return a new u16 segment id to be used for all subsequent
>>>> PCI related calls. This would require the dom0 OS to hook its
>>>> pci_bus_add function, which might be doable (more doable than handling
>>>> xen_segment_id DT properties I think).
>>> This seems ok, i will try it out.
>> I recommend you let this subthread (e.g. the conversation with Jan)
>> settle upon a preferred course of action before implementing any one
>> suggestion.
> Ian we have also to consider for NUMA / multi node where there are two
> or more its nodes.
> pci0{
>
> msi-parent = <&its0>;
> }
>
> pci1{
>
> msi-parent = <&its1>;
> }
>
> This requires parsing pci nodes in xen and create a mapping between pci
> nodes and its. Xe would need to be aware of PCI nodes in device tree
> prior to dom0 sending a hypercall. Adding a property to pci node in
> device tree should be a good approach.

Why do you need it early? Wouldn't it be sufficient to retrieve that
information when the hypercall pci_device_add is called?

What about the ACPI case? Does everything necessary live in a static table?

Regards,

-- 
Julien Grall

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 11:14                   ` Julien Grall
@ 2015-02-23 11:50                     ` Manish Jaggi
  2015-02-23 15:15                       ` Julien Grall
  0 siblings, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-23 11:50 UTC (permalink / raw)
  To: Julien Grall, xen-devel, Ian Campbell, Prasun.kapoor,
	Jan Beulich, Kumar, Vijaya,
	Stefano Stabellini (Stefano.Stabellini@citrix.com)


On 23/02/15 4:44 pm, Julien Grall wrote:
>
>
> On 23/02/2015 10:59, Manish Jaggi wrote:
>>
>> On 20/02/15 8:09 pm, Ian Campbell wrote:
>>> On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
>>>>> Another option might be a new hypercall (assuming one doesn't already
>>>>> exist) to register a PCI bus which would take e.g. the PCI CFG base
>>>>> address and return a new u16 segment id to be used for all subsequent
>>>>> PCI related calls. This would require the dom0 OS to hook its
>>>>> pci_bus_add function, which might be doable (more doable than 
>>>>> handling
>>>>> xen_segment_id DT properties I think).
>>>> This seems ok, i will try it out.
>>> I recommend you let this subthread (e.g. the conversation with Jan)
>>> settle upon a preferred course of action before implementing any one
>>> suggestion.
>> Ian we have also to consider for NUMA / multi node where there are two
>> or more its nodes.
>> pci0{
>>
>> msi-parent = <&its0>;
>> }
>>
>> pci1{
>>
>> msi-parent = <&its1>;
>> }
>>
>> This requires parsing pci nodes in xen and create a mapping between pci
>> nodes and its. Xe would need to be aware of PCI nodes in device tree
>> prior to dom0 sending a hypercall. Adding a property to pci node in
>> device tree should be a good approach.
>
> Why do you need it early? Wouldn't be sufficient to retrieve those 
> information when the hypercall pci_device_add is called?
>
The dom0/domU device tree should have only one ITS node; Xen should map to the specific ITS when trapped.

> What about ACPI case? Does everything necessary live in static table?
>
> Regards,
>

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23  8:43                         ` Jan Beulich
@ 2015-02-23 12:45                           ` Ian Campbell
  2015-02-23 14:07                             ` Jan Beulich
  2015-03-11 18:26                           ` Stefano Stabellini
  1 sibling, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-23 12:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com)

On Mon, 2015-02-23 at 08:43 +0000, Jan Beulich wrote:
> >>> On 20.02.15 at 18:33, <ian.campbell@citrix.com> wrote:
> > On Fri, 2015-02-20 at 15:15 +0000, Jan Beulich wrote:
> >> > That's the issue we are trying to resolve, with device tree there is no
> >> > explicit segment ID, so we have an essentially unindexed set of PCI
> >> > buses in both Xen and dom0.
> >> 
> >> How that? What if two bus numbers are equal? There ought to be
> >> some kind of topology information. Or if all buses are distinct, then
> >> you don't need a segment number.
> > 
> > It's very possible that I simply don't have the PCI terminology straight
> > in my head, leading to me talking nonsense.
> > 
> > I'll explain how I'm using it and perhaps you can put me straight...
> > 
> > My understanding was that a PCI segment equates to a PCI host
> > controller, i.e. a specific instance of some PCI host IP on an SoC.
> 
> No - there can be multiple roots (i.e. host bridges)

Where a "host bridge" == what I've been calling "PCI host controller"?

I suppose in principal a single controller might expose multiple host
bridges, but I think we can logically treat such things as being
multiple controllers (e.g. with multiple CFG spaces etc).

>  on a single
> segment. Segments are - afaict - purely a scalability extension
> allowing to overcome the 256 bus limit.

Is the converse true -- i.e. can a single host bridge span multiple
segments? IOW is the mapping from segment->host bridge many->one or
many->many?

Maybe what I should read into what you are saying is that segments are
purely a software and/or firmware concept with no real basis in the
hardware?

In which case might we be at liberty to specify that on ARM+Device Tree
systems (i.e. those where the f/w tables don't give an enumeration)
there is a 1:1 mapping from segments to host bridges?

> > A PCI host controller defines the root of a bus, within which the BDF
> > need not be distinct due to the differing segments which are effectively
> > a higher level namespace on the BDFs.
> 
> The host controller really defines the root of a tree (often covering
> multiple buses, i.e. as soon as bridges come into play).

Right, I think that's the one thing I'd managed to understand
correctly ;-)

> > So given a system with two PCI host controllers we end up with two
> > segments (lets say A and B, but choosing those is the topic of this
> > thread) and it is acceptable for both to contain a bus 0 with a device 1
> > on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist.
> > 
> > It sounds like you are saying that this is not actually acceptable and
> > that 0:0.0 must be unique in the system irrespective of the associated
> > segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?
> 
> No, there can be multiple buses numbered zero. And at the same
> time a root bus doesn't need to be bus zero on its segment.

0:0.0 was just an example I pulled out of thin air, it wasn't supposed
to imply some special property of bus 0 e.g. being the root or anything
like that.

If there are multiple buses numbered 0 then are they distinguished via
segment or something else?

> > Just for reference a DT node describing a PCI host controller might look
> > like (taking the APM Mustang one as an example):
> > 
> >                 pcie0: pcie@1f2b0000 {
> >                         status = "disabled";
> >                         device_type = "pci";
> >                         compatible = "apm,xgene-storm-pcie", "apm,xgene-pcie";
> >                         #interrupt-cells = <1>;
> >                         #size-cells = <2>;
> >                         #address-cells = <3>;
> >                         reg = < 0x00 0x1f2b0000 0x0 0x00010000   /* Controller registers */
> >                                 0xe0 0xd0000000 0x0 0x00040000>; /* PCI config space */
> >                         reg-names = "csr", "cfg";
> >                         ranges = <0x01000000 0x00 0x00000000 0xe0 0x10000000 0x00 0x00010000   /* io */
> >                                   0x02000000 0x00 0x80000000 0xe1 0x80000000 0x00 0x80000000>; /* mem */
> >                         dma-ranges = <0x42000000 0x80 0x00000000 0x80 0x00000000 0x00 0x80000000
> >                                       0x42000000 0x00 0x00000000 0x00 0x00000000 0x80 0x00000000>;
> >                         interrupt-map-mask = <0x0 0x0 0x0 0x7>;
> >                         interrupt-map = <0x0 0x0 0x0 0x1 &gic 0x0 0xc2 0x1
> >                                          0x0 0x0 0x0 0x2 &gic 0x0 0xc3 0x1
> >                                          0x0 0x0 0x0 0x3 &gic 0x0 0xc4 0x1
> >                                          0x0 0x0 0x0 0x4 &gic 0x0 0xc5 0x1>;
> >                         dma-coherent;
> >                         clocks = <&pcie0clk 0>;
> >                 };
> > 
> > I expect most of this is uninteresting but the key thing is that there
> > is no segment number nor topology relative to e.g. "pcie1:
> > pcie@1f2c0000" (the node look identical except e.g. all the base
> > addresses and interrupt numbers differ).
> 
> What I don't get from this is where the BDF is being represented.

It isn't, since this is representing the host controller not any given
PCI devices which it contains.

I thought in general BDFs were probed (or even configured) by the
firmware and/or OS by walking over the CFG space and so aren't
necessarily described anywhere in the firmware tables.

FWIW the first 4 cells in each line of interrupt-map are actually
matched (after masking via interrupt-map-mask) against an encoding of
the BDF to give the INTx routing, but BDFs aren't represented in the
sense I think you meant in the example above.

There is a capability to have child nodes of this root controller node
which describe individual devices, and there is an encoding for the BDF
in there, but these are not required. For reference I've pasted a DT
snippet from a Nvidia Jetson TK1 (Tegra124) system which has child
nodes. I think the BDF is encoded in assigned-addresses somewhere.
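
(For completeness: with the standard 3-cell PCI addresses used in
"reg"/"assigned-addresses" the BDF sits in the first cell, so decoding it
is trivial. A minimal sketch, with a made-up helper name rather than any
existing code:)

        /* phys.hi per the IEEE 1275 PCI binding: bus in bits 23:16,
         * device in bits 15:11, function in bits 10:8. E.g. the Tegra
         * child node below with reg = <0x000800 ...> decodes to 00:01.0,
         * matching its "pci@1,0" unit address. */
        static void decode_pci_phys_hi(uint32_t phys_hi)
        {
            unsigned int bus  = (phys_hi >> 16) & 0xff;
            unsigned int dev  = (phys_hi >> 11) & 0x1f;
            unsigned int func = (phys_hi >> 8) & 0x7;

            printk("BDF %02x:%02x.%u\n", bus, dev, func);
        }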

> Yet that arrangement is fundamental to understand whether you
> really need segments to properly disambiguate devices.

Have I clarified enough? I've a feeling not...

Ian.

        pcie-controller@0,01003000 {
                compatible = "nvidia,tegra124-pcie";
                device_type = "pci";
                reg = <0x0 0x01003000 0x0 0x00000800   /* PADS registers */
                       0x0 0x01003800 0x0 0x00000800   /* AFI registers */
                       0x0 0x02000000 0x0 0x10000000>; /* configuration space */
                reg-names = "pads", "afi", "cs";
                interrupts = <GIC_SPI 98 IRQ_TYPE_LEVEL_HIGH>, /* controller interrupt */
                             <GIC_SPI 99 IRQ_TYPE_LEVEL_HIGH>; /* MSI interrupt */
                interrupt-names = "intr", "msi";

                #interrupt-cells = <1>;
                interrupt-map-mask = <0 0 0 0>;
                interrupt-map = <0 0 0 0 &gic GIC_SPI 98 IRQ_TYPE_LEVEL_HIGH>;

                bus-range = <0x00 0xff>;
                #address-cells = <3>;
                #size-cells = <2>;

                ranges = <0x82000000 0 0x01000000 0x0 0x01000000 0 0x00001000   /* port 0 configuration space */
                          0x82000000 0 0x01001000 0x0 0x01001000 0 0x00001000   /* port 1 configuration space */
                          0x81000000 0 0x0        0x0 0x12000000 0 0x00010000   /* downstream I/O (64 KiB) */
                          0x82000000 0 0x13000000 0x0 0x13000000 0 0x0d000000   /* non-prefetchable memory (208 MiB) */
                          0xc2000000 0 0x20000000 0x0 0x20000000 0 0x20000000>; /* prefetchable memory (512 MiB) */

                clocks = <&tegra_car TEGRA124_CLK_PCIE>,
                         <&tegra_car TEGRA124_CLK_AFI>,
                         <&tegra_car TEGRA124_CLK_PLL_E>,
                         <&tegra_car TEGRA124_CLK_CML0>;
                clock-names = "pex", "afi", "pll_e", "cml";
                resets = <&tegra_car 70>,
                         <&tegra_car 72>,
                         <&tegra_car 74>;
                reset-names = "pex", "afi", "pcie_x";
                status = "disabled";

                phys = <&padctl TEGRA_XUSB_PADCTL_PCIE>;
                phy-names = "pcie";

                pci@1,0 {
                        device_type = "pci";
                        assigned-addresses = <0x82000800 0 0x01000000 0 0x1000>;
                        reg = <0x000800 0 0 0 0>;
                        status = "disabled";

                        #address-cells = <3>;
                        #size-cells = <2>;
                        ranges;

                        nvidia,num-lanes = <2>;
                };

                pci@2,0 {
                        device_type = "pci";
                        assigned-addresses = <0x82001000 0 0x01001000 0 0x1000>;
                        reg = <0x001000 0 0 0 0>;
                        status = "disabled";

                        #address-cells = <3>;
                        #size-cells = <2>;
                        ranges;

                        nvidia,num-lanes = <1>;
                };
        };

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 12:45                           ` Ian Campbell
@ 2015-02-23 14:07                             ` Jan Beulich
  2015-02-23 14:33                               ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-02-23 14:07 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com)

>>> On 23.02.15 at 13:45, <ian.campbell@citrix.com> wrote:
> On Mon, 2015-02-23 at 08:43 +0000, Jan Beulich wrote:
>> >>> On 20.02.15 at 18:33, <ian.campbell@citrix.com> wrote:
>> > On Fri, 2015-02-20 at 15:15 +0000, Jan Beulich wrote:
>> >> > That's the issue we are trying to resolve, with device tree there is no
>> >> > explicit segment ID, so we have an essentially unindexed set of PCI
>> >> > buses in both Xen and dom0.
>> >> 
>> >> How that? What if two bus numbers are equal? There ought to be
>> >> some kind of topology information. Or if all buses are distinct, then
>> >> you don't need a segment number.
>> > 
>> > It's very possible that I simply don't have the PCI terminology straight
>> > in my head, leading to me talking nonsense.
>> > 
>> > I'll explain how I'm using it and perhaps you can put me straight...
>> > 
>> > My understanding was that a PCI segment equates to a PCI host
>> > controller, i.e. a specific instance of some PCI host IP on an SoC.
>> 
>> No - there can be multiple roots (i.e. host bridges)
> 
> Where a "host bridge" == what I've been calling "PCI host controller"?

Some problems here may originate in this naming: I'm not aware of
anything named "host controller" in PCI. The root of a PCI hierarchy
(single or multiple buses) connects to the system bus via a host
bridge. Whether one or more of them sit in a single chip is of no
interest (on the x86 and ia64 sides at least).

> I suppose in principal a single controller might expose multiple host
> bridges, but I think we can logically treat such things as being
> multiple controllers (e.g. with multiple CFG spaces etc).

Perhaps.

>>  on a single
>> segment. Segments are - afaict - purely a scalability extension
>> allowing to overcome the 256 bus limit.
> 
> Is the converse true -- i.e. can a single host bridge span multiple
> segments?

Not that I'm aware of.

> IOW is the mapping from segment->host bridge many->one or
> many->many?

Each segment may have multiple host bridges, and each host bridge
connects devices on multiple buses. Any such hierarchy is entirely
separate from any other such hierarchy (both physically and in
terms of the SBDFs used to identify them).
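
(Purely as an illustration of that identification, this is not a Xen
type, just the 4-tuple spelled out:)

        struct sbdf_example {
            uint16_t seg;    /* segment: disambiguates equal bus numbers */
            uint8_t  bus;
            uint8_t  dev;    /* 5 bits used */
            uint8_t  func;   /* 3 bits used */
        };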

> Maybe what I should read into what you are saying is that segments are
> purely a software and/or firmware concept with no real basis in the
> hardware?

Right - they just represent separation in hardware, but they have
no real equivalent there.

> In which case might we be at liberty to specify that on ARM+Device Tree
> systems (i.e. those where the f/w tables don't give an enumeration)
> there is a 1:1 mapping from segments to host bridges?

This again can only be answered knowing how bus number
assignment gets done on ARM (see also below). If all bus numbers
are distinct, there's no need for using segment numbers other
than zero. In the end, if you used segment numbers now, you may
end up closing the path to using them for much larger future setups.
But if that's not seen as a problem then yes, I think you could go
that route.

>> > So given a system with two PCI host controllers we end up with two
>> > segments (lets say A and B, but choosing those is the topic of this
>> > thread) and it is acceptable for both to contain a bus 0 with a device 1
>> > on it, i.e. (A:0:0.0) and (B:0:0.0) are distinct and can coexist.
>> > 
>> > It sounds like you are saying that this is not actually acceptable and
>> > that 0:0.0 must be unique in the system irrespective of the associated
>> > segment? iow (B:0:0.0) must be e.g. (B:1:0.0) instead?
>> 
>> No, there can be multiple buses numbered zero. And at the same
>> time a root bus doesn't need to be bus zero on its segment.
> 
> 0:0.0 was just an example I pulled out of thin air, it wasn't supposed
> to imply some special property of bus 0 e.g. being the root or anything
> like that.
> 
> If there are multiple buses numbered 0 then are they distinguished via
> segment or something else?

Just by segment.

>> What I don't get from this is where the BDF is being represented.
> 
> It isn't, since this is representing the host controller not any given
> PCI devices which it contains.
> 
> I thought in general BDFs were probed (or even configured) by the
> firmware and/or OS by walking over the CFG space and so aren't
> necessarily described anywhere in the firmware tables.

They're effectively getting assigned by firmware, yes. This mainly
affects bridges, which get stored (in their config space) the
secondary and subordinate bus numbers (bounding the bus range
they're responsible for when it comes to routing requests). If on
ARM the firmware doesn't assign bus numbers, is bridge setup then a
job of the OS?

> FWIW the first 4 bytes in each line of interrupt-map are actually
> somehow matched against the masked (via interrupt-map-mask) against an
> encoding of the BDF to give the INTx routing, but BDFs aren't
> represented in the sense I think you meant in the example above.
> 
> There is a capability to have child nodes of this root controller node
> which describe individual devices, and there is an encoding for the BDF
> in there, but these are not required. For reference I've pasted a DT
> snippet from a Nvidia Jetson TK1 (Tegra124) system which has child
> nodes. I think the BDF is encoded in assigned-addresses somewhere.

Even if the BDF can be derived from the addresses there, it still
isn't clear to me how a more complex topology (namely
including bridges) would be represented. As said above - a bridge
needs to know which buses to route requests for, and hence I
can't see how you can get away without using bus numbers at all
in this process; all I can see is that the DT description can get
away without it, by simply representing the hierarchies that on
x86 one would discover by scanning the config space for devices.
I.e. - as indicated above - if bus number assignment is the OS's
job, then you would want to avoid segments as long as you can
get away with the 256 buses you have.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 14:07                             ` Jan Beulich
@ 2015-02-23 14:33                               ` Ian Campbell
  2015-02-23 14:45                                 ` Jan Beulich
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-23 14:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com)

On Mon, 2015-02-23 at 14:07 +0000, Jan Beulich wrote:
> >>> On 23.02.15 at 13:45, <ian.campbell@citrix.com> wrote:
> > On Mon, 2015-02-23 at 08:43 +0000, Jan Beulich wrote:
> >> >>> On 20.02.15 at 18:33, <ian.campbell@citrix.com> wrote:
> >> > On Fri, 2015-02-20 at 15:15 +0000, Jan Beulich wrote:
> >> >> > That's the issue we are trying to resolve, with device tree there is no
> >> >> > explicit segment ID, so we have an essentially unindexed set of PCI
> >> >> > buses in both Xen and dom0.
> >> >> 
> >> >> How that? What if two bus numbers are equal? There ought to be
> >> >> some kind of topology information. Or if all buses are distinct, then
> >> >> you don't need a segment number.
> >> > 
> >> > It's very possible that I simply don't have the PCI terminology straight
> >> > in my head, leading to me talking nonsense.
> >> > 
> >> > I'll explain how I'm using it and perhaps you can put me straight...
> >> > 
> >> > My understanding was that a PCI segment equates to a PCI host
> >> > controller, i.e. a specific instance of some PCI host IP on an SoC.
> >> 
> >> No - there can be multiple roots (i.e. host bridges)
> > 
> > Where a "host bridge" == what I've been calling "PCI host controller"?
> 
> Some problems here may originate in this naming: I'm not aware of
> anything named "host controller" in PCI. The root of a PCI hierarchy
> (single or multiple buses) connects to the system bus via a host
> bridge. Whether one or more of them sit in a single chip is of no
> interest (on the x86 and ia64 sides at least).

Yes, I think I've just been using the terminology wrongly; I mean "host
bridge" throughout.

There is generally one such bridge per "controller" (i.e. IP block),
which is what I was trying to get at in the next paragraph. Let's just
talk about host bridges from now on to avoid confusion.

> > I suppose in principal a single controller might expose multiple host
> > bridges, but I think we can logically treat such things as being
> > multiple controllers (e.g. with multiple CFG spaces etc).
> 
> Perhaps.
> 
> > IOW is the mapping from segment->host bridge many->one or
> > many->many?
> 
> Each segment may have multiple host bridges, each host bridge
> connect devices on multiple buses. Any such hierarchy is entirely
> separate from any other such hierarchy (both physically and in
> terms of the SBDFs used to identify them).
> 
> > Maybe what I should read into what you are saying is that segments are
> > purely a software and/or firmware concept with no real basis in the
> > hardware?
> 
> Right - they just represent separation in hardware, but they have
> no real equivalent there.

I think I now understand.

> > In which case might we be at liberty to specify that on ARM+Device Tree
> > systems (i.e. those where the f/w tables don't give an enumeration)
> > there is a 1:1 mapping from segments to host bridges?
> 
> This again can only be answered knowing how bus number
> assignment gets done on ARM (see also below). If all bus numbers
> are distinct, there's no need for using segments numbers other
> then zero. In the end, if you used segment numbers now, you may
> end up closing the path to using them for much larger future setups.
> But if that's not seen as a problem then yes, I think you could go
> that route.

Ultimately we just need to be able to go from the set of input
parameters to e.g. PHYSDEVOP_pci_device_add to the associated host
bridge.

It seems like the appropriate pair is (segment,bus), which uniquely
corresponds to a single host bridge (and many such pairs may do so).

So I think the original question just goes from having to determine a
way to map a segment to a host bridge to how to map a (segment,bus)
tuple to a host bridge.

> >> What I don't get from this is where the BDF is being represented.
> > 
> > It isn't, since this is representing the host controller not any given
> > PCI devices which it contains.
> > 
> > I thought in general BDFs were probed (or even configured) by the
> > firmware and/or OS by walking over the CFG space and so aren't
> > necessarily described anywhere in the firmware tables.
> 
> They're effectively getting assigned by firmware, yes. This mainly
> affects bridges, which get stored (in their config space) the
> secondary and subordinate bus numbers (bounding the bus range
> they're responsible for when it comes to routing requests). If on
> ARM firmware doesn't assign bus numbers, is bridge setup then a
> job of the OS?

I'm not completely sure; I think it depends on the particular firmware
(u-boot, EFI etc), but AIUI it can be the case that the OS does the
enumeration on at least some ARM platforms (quite how/when it knows to
do so I'm not sure).

> > FWIW the first 4 bytes in each line of interrupt-map are actually
> > somehow matched against the masked (via interrupt-map-mask) against an
> > encoding of the BDF to give the INTx routing, but BDFs aren't
> > represented in the sense I think you meant in the example above.
> > 
> > There is a capability to have child nodes of this root controller node
> > which describe individual devices, and there is an encoding for the BDF
> > in there, but these are not required. For reference I've pasted a DT
> > snippet from a Nvidia Jetson TK1 (Tegra124) system which has child
> > nodes. I think the BDF is encoded in assigned-addresses somewhere.
> 
> Even if the BDF can be derived out of the addresses there it still
> doesn't make clear to me how a more complex topology (namely
> including bridges) would be represented. As said above - a bridge
> needs to know which buses to route requests for, and hence I
> can't see how you can get away without using bus numbers at all
> in this process; all I can see is that the DT description can get
> away without it, by simply representing the hierarchies that on
> x86 one would discover by scanning the config space for devices.
> I.e. - as indicated above - if bus number assignment is the OSes
> job, then you would want to avoid segments as long as you can
> get away with the 256 buses you have.

That makes sense, although we are then still left with how to go from
(segment,bus) -> host bridge, even if segment is effectively always 0
for now.

I think we will have to add an interface for dom0 to register new host
bridges with Xen such that Xen can then match against future hypercalls
referring to devices.

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 14:33                               ` Ian Campbell
@ 2015-02-23 14:45                                 ` Jan Beulich
  2015-02-23 15:02                                   ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-02-23 14:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com)

>>> On 23.02.15 at 15:33, <ian.campbell@citrix.com> wrote:
> On Mon, 2015-02-23 at 14:07 +0000, Jan Beulich wrote:
>> >>> On 23.02.15 at 13:45, <ian.campbell@citrix.com> wrote:
>> > In which case might we be at liberty to specify that on ARM+Device Tree
>> > systems (i.e. those where the f/w tables don't give an enumeration)
>> > there is a 1:1 mapping from segments to host bridges?
>> 
>> This again can only be answered knowing how bus number
>> assignment gets done on ARM (see also below). If all bus numbers
>> are distinct, there's no need for using segments numbers other
>> then zero. In the end, if you used segment numbers now, you may
>> end up closing the path to using them for much larger future setups.
>> But if that's not seen as a problem then yes, I think you could go
>> that route.
> 
> Ultimately we just need to be able to go from the set of input
> parameters to e.g. PHYSDEVOP_pci_device_add to the associate host
> bridge.
> 
> It seems like the appropriate pair is (segment,bus), which uniquely
> corresponds to a single host bridge (and many such pairs may do so).
> 
> So I think the original question just goes from having to determine a
> way to map a segment to a host bridge to how to map a (segment,bus)
> tuple to a host bridge.

Right. Avoiding (at this point in time) non-zero segments if at all
possible.

>> >> What I don't get from this is where the BDF is being represented.
>> > 
>> > It isn't, since this is representing the host controller not any given
>> > PCI devices which it contains.
>> > 
>> > I thought in general BDFs were probed (or even configured) by the
>> > firmware and/or OS by walking over the CFG space and so aren't
>> > necessarily described anywhere in the firmware tables.
>> 
>> They're effectively getting assigned by firmware, yes. This mainly
>> affects bridges, which get stored (in their config space) the
>> secondary and subordinate bus numbers (bounding the bus range
>> they're responsible for when it comes to routing requests). If on
>> ARM firmware doesn't assign bus numbers, is bridge setup then a
>> job of the OS?
> 
> I'm not completely sure I think it depends on the particular firmware
> (u-boot, EFI etc) but AIUI it can be the case that the OS does the
> enumeration on at least some ARM platforms (quite how/when it knows to
> do so I'm not sure).

In which case the Dom0 OS doing so would need to communicate
its decisions to the hypervisor, as you suggest further down. This
basically replaces the bus scan (on segment 0) that Xen does on
x86 (which topology information gets derived from).

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 14:45                                 ` Jan Beulich
@ 2015-02-23 15:02                                   ` Ian Campbell
  2015-02-23 15:27                                     ` Jan Beulich
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-23 15:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com)

On Mon, 2015-02-23 at 14:45 +0000, Jan Beulich wrote:
> >>> On 23.02.15 at 15:33, <ian.campbell@citrix.com> wrote:
> > On Mon, 2015-02-23 at 14:07 +0000, Jan Beulich wrote:
> >> >>> On 23.02.15 at 13:45, <ian.campbell@citrix.com> wrote:
> >> > In which case might we be at liberty to specify that on ARM+Device Tree
> >> > systems (i.e. those where the f/w tables don't give an enumeration)
> >> > there is a 1:1 mapping from segments to host bridges?
> >> 
> >> This again can only be answered knowing how bus number
> >> assignment gets done on ARM (see also below). If all bus numbers
> >> are distinct, there's no need for using segments numbers other
> >> then zero. In the end, if you used segment numbers now, you may
> >> end up closing the path to using them for much larger future setups.
> >> But if that's not seen as a problem then yes, I think you could go
> >> that route.
> > 
> > Ultimately we just need to be able to go from the set of input
> > parameters to e.g. PHYSDEVOP_pci_device_add to the associate host
> > bridge.
> > 
> > It seems like the appropriate pair is (segment,bus), which uniquely
> > corresponds to a single host bridge (and many such pairs may do so).
> > 
> > So I think the original question just goes from having to determine a
> > way to map a segment to a host bridge to how to map a (segment,bus)
> > tuple to a host bridge.
> 
> Right. Avoiding (at this point in time) non-zero segments if at all
> possible.

It sounds like we are going to leave that up to the dom0 OS, with
whatever it does getting registered with Xen. So the use of non-zero
segments is no longer (directly) up to the Xen code. I think that's fine.

> >> >> What I don't get from this is where the BDF is being represented.
> >> > 
> >> > It isn't, since this is representing the host controller not any given
> >> > PCI devices which it contains.
> >> > 
> >> > I thought in general BDFs were probed (or even configured) by the
> >> > firmware and/or OS by walking over the CFG space and so aren't
> >> > necessarily described anywhere in the firmware tables.
> >> 
> >> They're effectively getting assigned by firmware, yes. This mainly
> >> affects bridges, which get stored (in their config space) the
> >> secondary and subordinate bus numbers (bounding the bus range
> >> they're responsible for when it comes to routing requests). If on
> >> ARM firmware doesn't assign bus numbers, is bridge setup then a
> >> job of the OS?
> > 
> > I'm not completely sure I think it depends on the particular firmware
> > (u-boot, EFI etc) but AIUI it can be the case that the OS does the
> > enumeration on at least some ARM platforms (quite how/when it knows to
> > do so I'm not sure).
> 
> In which case the Dom0 OS doing so would need to communicate
> its decisions to the hypervisor, as you suggest further down.

So more concretely something like:
        #define PHYSDEVOP_pci_host_bridge_add <XX>
        struct physdev_pci_host_bridge_add {
            /* IN */
            uint16_t seg;
            uint8_t bus;
            uint64_t address;
        };
        typedef struct physdev_pci_host_bridge_add physdev_pci_host_bridge_add_t;
        DEFINE_XEN_GUEST_HANDLE(physdev_pci_host_bridge_add_t);

Where seg+bus are enumerated/assigned by dom0 and address is some unique
property of the host bridge -- most likely its pci cfg space base
address (which is what physdev_pci_mmcfg_reserved also takes, I think?)
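
(Roughly, on the dom0 side, something like the following. This is just a
sketch: the op number is the new one proposed above, "bus" is assumed to
come from a pci_bus_add-style hook, and cfg_base stands in for however
the host bridge driver knows its CFG space base.)

        struct physdev_pci_host_bridge_add add = {
            .seg     = pci_domain_nr(bus),   /* as enumerated/assigned by dom0 */
            .bus     = bus->number,
            .address = cfg_base,             /* host bridge CFG space base */
        };
        int rc = HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);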

Do you think we would need start_bus + end_bus here? Xen could enumerate
this itself I think, and perhaps should even if dom0 tells us something?

> This
> basically replaces the bus scan (on segment 0) that Xen does on
> x86 (which topology information gets derived from).

Is the reason for the scan being of segment 0 only that it is the one
which lives at the legacy PCI CFG addresses (or those magic I/O ports)?

What about other host bridges in segment 0 which aren't at that address?

You could do the others based on MMCFG tables if you wanted, right?

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 11:50                     ` Manish Jaggi
@ 2015-02-23 15:15                       ` Julien Grall
  2015-02-23 17:12                         ` Manish Jaggi
  0 siblings, 1 reply; 70+ messages in thread
From: Julien Grall @ 2015-02-23 15:15 UTC (permalink / raw)
  To: Manish Jaggi, xen-devel, Ian Campbell, Prasun.kapoor,
	Jan Beulich, Kumar, Vijaya,
	Stefano Stabellini (Stefano.Stabellini@citrix.com)

On 23/02/15 11:50, Manish Jaggi wrote:
> 
> On 23/02/15 4:44 pm, Julien Grall wrote:
>>
>>
>> On 23/02/2015 10:59, Manish Jaggi wrote:
>>>
>>> On 20/02/15 8:09 pm, Ian Campbell wrote:
>>>> On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
>>>>>> Another option might be a new hypercall (assuming one doesn't already
>>>>>> exist) to register a PCI bus which would take e.g. the PCI CFG base
>>>>>> address and return a new u16 segment id to be used for all subsequent
>>>>>> PCI related calls. This would require the dom0 OS to hook its
>>>>>> pci_bus_add function, which might be doable (more doable than
>>>>>> handling
>>>>>> xen_segment_id DT properties I think).
>>>>> This seems ok, i will try it out.
>>>> I recommend you let this subthread (e.g. the conversation with Jan)
>>>> settle upon a preferred course of action before implementing any one
>>>> suggestion.
>>> Ian we have also to consider for NUMA / multi node where there are two
>>> or more its nodes.
>>> pci0{
>>>
>>> msi-parent = <&its0>;
>>> }
>>>
>>> pci1{
>>>
>>> msi-parent = <&its1>;
>>> }
>>>
>>> This requires parsing pci nodes in xen and create a mapping between pci
>>> nodes and its. Xe would need to be aware of PCI nodes in device tree
>>> prior to dom0 sending a hypercall. Adding a property to pci node in
>>> device tree should be a good approach.
>>
>> Why do you need it early? Wouldn't be sufficient to retrieve those
>> information when the hypercall pci_device_add is called?
>>
> The dom0/U device tree should have one 1 its node, xen should map to
> specific its when trapped.

The DOM0 device tree should expose the same layout as the hardware. By
exposing only one ITS you make your life more complicated.

PHYSDEVOP_pci_device_add should be called before any initialization is
done. Therefore the ITS should be configured for this PCI device after
Xen is aware of it.

IMHO, any ITS trap before this is wrong.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 15:02                                   ` Ian Campbell
@ 2015-02-23 15:27                                     ` Jan Beulich
  2015-02-23 15:46                                       ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-02-23 15:27 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com)

>>> On 23.02.15 at 16:02, <ian.campbell@citrix.com> wrote:
> On Mon, 2015-02-23 at 14:45 +0000, Jan Beulich wrote:
>> In which case the Dom0 OS doing so would need to communicate
>> its decisions to the hypervisor, as you suggest further down.
> 
> So more concretely something like:
>         #define PHYSDEVOP_pci_host_bridge_add <XX>
>         struct physdev_pci_host_bridge_add {
>             /* IN */
>             uint16_t seg;
>             uint8_t bus;
>             uint64_t address;
>         };
>         typedef struct physdev_pci_host_bridge_add physdev_pci_host_bridge_add_t;
>         DEFINE_XEN_GUEST_HANDLE(physdev_pci_host_bridge_add_t);
> 
> Where seg+bus are enumerated/assigned by dom0 and address is some unique
> property of the host bridge -- most likely its pci cfg space base
> address (which is what physdev_pci_mmcfg_reserved also takes, I think?)

Right.

> Do you think we would need start_bus + end_bus here? Xen could enumerate
> this itself I think, and perhaps should even if dom0 tells us something?

That depends - if what you get presented here by Dom0 is a PCI
device at <seg>:<bus>:00.0, and if all other setup was already
done on it, then you could read the secondary and subordinate
bus numbers from its config space. If that's not possible, then
Dom0 handing you these values would seem to be necessary.
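
(A sketch of the former case only, with read_cfg8() being a stand-in for
whatever config space accessor ends up being available, not an existing
function:)

        static void bridge_bus_range(uint16_t seg, uint8_t bus,
                                     uint8_t *sec, uint8_t *sub)
        {
            /* 0x19/0x1a are the standard secondary/subordinate bus number
             * offsets in a type 1 (bridge) config header, i.e. they bound
             * the bus range the bridge routes requests for. */
            *sec = read_cfg8(seg, bus, 0, 0, 0x19);
            *sub = read_cfg8(seg, bus, 0, 0, 0x1a);
        }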

As a result you may also need a hook from PCI device registration,
allowing the device to be associated with the right host bridge (and
refusing to add any for which there's none).

As an alternative, extending PHYSDEVOP_manage_pci_add_ext in
a suitable manner may be worth considering, provided (like on x86
and ia64) the host bridges get surfaced as distinct PCI devices.

>> This
>> basically replaces the bus scan (on segment 0) that Xen does on
>> x86 (which topology information gets derived from).
> 
> Is the reason for the scan being of segment 0 only is that it is the one
> which lives at the legacy PCI CFG addresses (or those magic I/O ports)? 

Right - ideally we would scan all segments, but we need Dom0 to
tell us which MMCFG regions are safe to access, and hence can't
do that scan at boot time. But we also won't get away without
scanning, as we need to set up the IOMMU(s) to at least cover
the devices used for booting the system.

> What about other host bridges in segment 0 which aren't at that address?

At which address? (All devices on segment zero are supposed to
be accessible via config space access method 1.)

> You could do the others based on MMCFG tables if you wanted, right?

Yes, with the above mentioned caveat.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 15:27                                     ` Jan Beulich
@ 2015-02-23 15:46                                       ` Ian Campbell
  2015-02-23 16:20                                         ` Jan Beulich
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-23 15:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com)

On Mon, 2015-02-23 at 15:27 +0000, Jan Beulich wrote:
> >>> On 23.02.15 at 16:02, <ian.campbell@citrix.com> wrote:
> > On Mon, 2015-02-23 at 14:45 +0000, Jan Beulich wrote:
> >> In which case the Dom0 OS doing so would need to communicate
> >> its decisions to the hypervisor, as you suggest further down.
> > 
> > So more concretely something like:
> >         #define PHYSDEVOP_pci_host_bridge_add <XX>
> >         struct physdev_pci_host_bridge_add {
> >             /* IN */
> >             uint16_t seg;
> >             uint8_t bus;
> >             uint64_t address;
> >         };
> >         typedef struct physdev_pci_host_bridge_add physdev_pci_host_bridge_add_t;
> >         DEFINE_XEN_GUEST_HANDLE(physdev_pci_host_bridge_add_t);
> > 
> > Where seg+bus are enumerated/assigned by dom0 and address is some unique
> > property of the host bridge -- most likely its pci cfg space base
> > address (which is what physdev_pci_mmcfg_reserved also takes, I think?)
> 
> Right.
> 
> > Do you think we would need start_bus + end_bus here? Xen could enumerate
> > this itself I think, and perhaps should even if dom0 tells us something?
> 
> That depends - if what you get presented here by Dom0 is a PCI
> device at <seg>:<bus>:00.0, and if all other setup was already
> done on it, then you could read the secondary and subordinate
> bus numbers from its config space. If that's not possible, then
> Dom0 handing you these values would seem to be necessary.
> 
> As a result you may also need a hook from PCI device registration,
> allowing to associate it with the right host bridge (and refusing to
> add any for which there's none).

Right.

My thinking was that PHYSDEVOP_pci_host_bridge_add would add an entry
into some mapping data structure from (segment,bus) to a handle for the
associated PCI host bridge driver in Xen.

PHYSDEVOP_manage_pci_add would then have to look up the host bridge
driver from the (segment,bus), I think, to construct the necessary
linkage for use later when we try to do things to the device, and it
should indeed fail if it can't find one.
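
Something along these lines is what I have in mind (types and names
invented for illustration, not a claim about the eventual
implementation):

        struct pci_host_bridge {
            struct list_head list;
            uint16_t segment;
            uint8_t  bus_start, bus_end;
            paddr_t  cfg_base;             /* as passed by dom0 */
            /* ... host bridge driver ops, etc. ... */
        };

        static LIST_HEAD(host_bridges);    /* filled by PHYSDEVOP_pci_host_bridge_add */

        static struct pci_host_bridge *find_host_bridge(uint16_t seg, uint8_t bus)
        {
            struct pci_host_bridge *hb;

            list_for_each_entry ( hb, &host_bridges, list )
                if ( hb->segment == seg &&
                     bus >= hb->bus_start && bus <= hb->bus_end )
                    return hb;

            return NULL;    /* PHYSDEVOP_manage_pci_add should then fail */
        }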

> As an alternative, extending PHYSDEVOP_manage_pci_add_ext in
> a suitable manner may be worth considering, provided (like on x86
> and ia64) the host bridges get surfaced as distinct PCI devices.
> 
> >> This
> >> basically replaces the bus scan (on segment 0) that Xen does on
> >> x86 (which topology information gets derived from).
> > 
> > Is the reason for the scan being of segment 0 only is that it is the one
> > which lives at the legacy PCI CFG addresses (or those magic I/O ports)? 
> 
> Right - ideally we would scan all segments, but we need Dom0 to
> tell us which MMCFG regions are safe to access,

Is this done via PHYSDEVOP_pci_mmcfg_reserved?

>  and hence can't
> do that scan at boot time. But we also won't get away without
> scanning, as we need to set up the IOMMU(s) to at least cover
> the devices used for booting the system.

Which hopefully are all segment 0 or aren't needed until after dom0
tells Xen about them I suppose.

> > What about other host bridges in segment 0 which aren't at that address?
> 
> At which address?

I meant this to be a back reference to "the legacy PCI CFG addresses (or
those magic I/O ports)".

>  (All devices on segment zero are supposed to
> be accessible via config space access method 1.)

Is that "the legacy ....  or magic ..." again?

> > You could do the others based on MMCFG tables if you wanted, right?
> 
> Yes, with the above mentioned caveat.
> 
> Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 15:46                                       ` Ian Campbell
@ 2015-02-23 16:20                                         ` Jan Beulich
  2015-02-26 10:09                                           ` Manish Jaggi
  0 siblings, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-02-23 16:20 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com)

>>> On 23.02.15 at 16:46, <ian.campbell@citrix.com> wrote:
> On Mon, 2015-02-23 at 15:27 +0000, Jan Beulich wrote:
>> >>> On 23.02.15 at 16:02, <ian.campbell@citrix.com> wrote:
>> > Is the reason for the scan being of segment 0 only is that it is the one
>> > which lives at the legacy PCI CFG addresses (or those magic I/O ports)? 
>> 
>> Right - ideally we would scan all segments, but we need Dom0 to
>> tell us which MMCFG regions are safe to access,
> 
> Is this done via PHYSDEVOP_pci_mmcfg_reserved?

Yes.

>>  and hence can't
>> do that scan at boot time. But we also won't get away without
>> scanning, as we need to set up the IOMMU(s) to at least cover
>> the devices used for booting the system.
> 
> Which hopefully are all segment 0 or aren't needed until after dom0
> tells Xen about them I suppose.

Right. With EFI one may be able to overcome this one day, but the
legacy BIOS doesn't even surface mechanisms (software interrupts)
to access devices outside of segment 0.

>>  (All devices on segment zero are supposed to
>> be accessible via config space access method 1.)
> 
> Is that "the legacy ....  or magic ..." again?

Yes (just that there are two of them).

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 15:15                       ` Julien Grall
@ 2015-02-23 17:12                         ` Manish Jaggi
  2015-02-23 21:39                           ` Julien Grall
  0 siblings, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-23 17:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: prasun.kapoor, Ian Campbell, Kumar, Vijaya, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich


On 23/02/15 8:45 pm, Julien Grall wrote:
> On 23/02/15 11:50, Manish Jaggi wrote:
>> On 23/02/15 4:44 pm, Julien Grall wrote:
>>>
>>> On 23/02/2015 10:59, Manish Jaggi wrote:
>>>> On 20/02/15 8:09 pm, Ian Campbell wrote:
>>>>> On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
>>>>>>> Another option might be a new hypercall (assuming one doesn't already
>>>>>>> exist) to register a PCI bus which would take e.g. the PCI CFG base
>>>>>>> address and return a new u16 segment id to be used for all subsequent
>>>>>>> PCI related calls. This would require the dom0 OS to hook its
>>>>>>> pci_bus_add function, which might be doable (more doable than
>>>>>>> handling
>>>>>>> xen_segment_id DT properties I think).
>>>>>> This seems ok, i will try it out.
>>>>> I recommend you let this subthread (e.g. the conversation with Jan)
>>>>> settle upon a preferred course of action before implementing any one
>>>>> suggestion.
>>>> Ian we have also to consider for NUMA / multi node where there are two
>>>> or more its nodes.
>>>> pci0{
>>>>
>>>> msi-parent = <&its0>;
>>>> }
>>>>
>>>> pci1{
>>>>
>>>> msi-parent = <&its1>;
>>>> }
>>>>
>>>> This requires parsing pci nodes in xen and create a mapping between pci
>>>> nodes and its. Xe would need to be aware of PCI nodes in device tree
>>>> prior to dom0 sending a hypercall. Adding a property to pci node in
>>>> device tree should be a good approach.
>>> Why do you need it early? Wouldn't be sufficient to retrieve those
>>> information when the hypercall pci_device_add is called?
>>>
>> The dom0/U device tree should have one 1 its node, xen should map to
>> specific its when trapped.
> The DOM0 device tree should expose the same layout as the hardware. By
> exposing only one ITS you make your life more complicate.
in what way?
>
> PHYSDEVOP_pci_device_add should be called before any initialization is
> done. Therefore ITS should be configured for this PCI after Xen is aware
> of the PCI.
That is for a device; I believe all devices on a host bridge are
serviced by a single ITS.
>
> IHMO, any ITS trap before this is wrong.
AFAIK a guest always sees a virtual ITS; could you please explain what
is wrong with trapping?
> Regards,
>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 17:12                         ` Manish Jaggi
@ 2015-02-23 21:39                           ` Julien Grall
  2015-02-24  0:23                             ` Manish Jaggi
  0 siblings, 1 reply; 70+ messages in thread
From: Julien Grall @ 2015-02-23 21:39 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: prasun.kapoor, Ian Campbell, Kumar, Vijaya, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On 23/02/2015 17:12, Manish Jaggi wrote:
>
> On 23/02/15 8:45 pm, Julien Grall wrote:
>> On 23/02/15 11:50, Manish Jaggi wrote:
>>> On 23/02/15 4:44 pm, Julien Grall wrote:
>>>>
>>>> On 23/02/2015 10:59, Manish Jaggi wrote:
>>>>> On 20/02/15 8:09 pm, Ian Campbell wrote:
>>>>>> On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
>>>>>>>> Another option might be a new hypercall (assuming one doesn't
>>>>>>>> already
>>>>>>>> exist) to register a PCI bus which would take e.g. the PCI CFG base
>>>>>>>> address and return a new u16 segment id to be used for all
>>>>>>>> subsequent
>>>>>>>> PCI related calls. This would require the dom0 OS to hook its
>>>>>>>> pci_bus_add function, which might be doable (more doable than
>>>>>>>> handling
>>>>>>>> xen_segment_id DT properties I think).
>>>>>>> This seems ok, i will try it out.
>>>>>> I recommend you let this subthread (e.g. the conversation with Jan)
>>>>>> settle upon a preferred course of action before implementing any one
>>>>>> suggestion.
>>>>> Ian we have also to consider for NUMA / multi node where there are two
>>>>> or more its nodes.
>>>>> pci0{
>>>>>
>>>>> msi-parent = <&its0>;
>>>>> }
>>>>>
>>>>> pci1{
>>>>>
>>>>> msi-parent = <&its1>;
>>>>> }
>>>>>
>>>>> This requires parsing pci nodes in xen and create a mapping between
>>>>> pci
>>>>> nodes and its. Xe would need to be aware of PCI nodes in device tree
>>>>> prior to dom0 sending a hypercall. Adding a property to pci node in
>>>>> device tree should be a good approach.
>>>> Why do you need it early? Wouldn't be sufficient to retrieve those
>>>> information when the hypercall pci_device_add is called?
>>>>
>>> The dom0/U device tree should have one 1 its node, xen should map to
>>> specific its when trapped.
>> The DOM0 device tree should expose the same layout as the hardware. By
>> exposing only one ITS you make your life more complicate.
> in what way?

Because you have to parse the whole device tree to remove the references
to the second ITS. It's pointless and can be difficult to do.

If you are able to emulate one ITS, you can do it for multiple ones.

>> PHYSDEVOP_pci_device_add should be called before any initialization is
>> done. Therefore ITS should be configured for this PCI after Xen is aware
>> of the PCI.
> That is for a device, I believe all devices on a host bridge are
> serviced by a single ITS

Why do you speak about the host bridge? Do you need to configure the
ITS at boot time for the host bridge?

Do you have any spec stating there is one ITS per host bridge?

>>
>> IHMO, any ITS trap before this is wrong.
> AFAIK guest always sees a virtual ITS, could you please explain what is
> wrong in trapping.

I never said the trapping is wrong in all cases... The "before" was
there because any trap before the PCI device has been added to Xen is,
IMHO, wrong.

Regards,	

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 21:39                           ` Julien Grall
@ 2015-02-24  0:23                             ` Manish Jaggi
  2015-02-24 13:43                               ` Julien Grall
  0 siblings, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-24  0:23 UTC (permalink / raw)
  To: Julien Grall
  Cc: prasun.kapoor, Ian Campbell, Kumar, Vijaya, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich


On 24/02/15 3:09 am, Julien Grall wrote:
> On 23/02/2015 17:12, Manish Jaggi wrote:
>>
>> On 23/02/15 8:45 pm, Julien Grall wrote:
>>> On 23/02/15 11:50, Manish Jaggi wrote:
>>>> On 23/02/15 4:44 pm, Julien Grall wrote:
>>>>>
>>>>> On 23/02/2015 10:59, Manish Jaggi wrote:
>>>>>> On 20/02/15 8:09 pm, Ian Campbell wrote:
>>>>>>> On Fri, 2015-02-20 at 19:44 +0530, Manish Jaggi wrote:
>>>>>>>>> Another option might be a new hypercall (assuming one doesn't
>>>>>>>>> already
>>>>>>>>> exist) to register a PCI bus which would take e.g. the PCI CFG 
>>>>>>>>> base
>>>>>>>>> address and return a new u16 segment id to be used for all
>>>>>>>>> subsequent
>>>>>>>>> PCI related calls. This would require the dom0 OS to hook its
>>>>>>>>> pci_bus_add function, which might be doable (more doable than
>>>>>>>>> handling
>>>>>>>>> xen_segment_id DT properties I think).
>>>>>>>> This seems ok, i will try it out.
>>>>>>> I recommend you let this subthread (e.g. the conversation with Jan)
>>>>>>> settle upon a preferred course of action before implementing any 
>>>>>>> one
>>>>>>> suggestion.
>>>>>> Ian we have also to consider for NUMA / multi node where there 
>>>>>> are two
>>>>>> or more its nodes.
>>>>>> pci0{
>>>>>>
>>>>>> msi-parent = <&its0>;
>>>>>> }
>>>>>>
>>>>>> pci1{
>>>>>>
>>>>>> msi-parent = <&its1>;
>>>>>> }
>>>>>>
>>>>>> This requires parsing pci nodes in xen and create a mapping between
>>>>>> pci
>>>>>> nodes and its. Xe would need to be aware of PCI nodes in device tree
>>>>>> prior to dom0 sending a hypercall. Adding a property to pci node in
>>>>>> device tree should be a good approach.
>>>>> Why do you need it early? Wouldn't be sufficient to retrieve those
>>>>> information when the hypercall pci_device_add is called?
>>>>>
>>>> The dom0/U device tree should have one 1 its node, xen should map to
>>>> specific its when trapped.
>>> The DOM0 device tree should expose the same layout as the hardware. By
>>> exposing only one ITS you make your life more complicate.
>> in what way?
>
> Because you have to parse all the device tree to remove the reference 
> to the second ITS. It's pointless and can be difficult to do it.
>
Could you please describe the case where it is difficult?
> If you are able to emulate on ITS, you can do it for multiple one.
keeping it simple and similar across dom0/domUs

Consider a case where a domU is assigned two PCI devices which are
attached to different nodes. (A node is an entity having its own cores
and host controllers.)

Currently for PCI pass-through Xen has a virtual PCI bus in the domU.
In our implementation we attach an MSI controller, which is a
gic-v3-its, to this bus in pci-front. Since there is a single bus you
cannot attach 2 MSI controllers. So Xen would have to map the commands
from the virtual ITS to the different physical ITSes based on the
deviceID / collection ID.
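
(For concreteness, something like the following, with invented names,
and assuming the deviceID carries the PCI requester ID as is usual for
GICv3 ITS setups:)

        static struct host_its *its_for_devid(uint32_t devid)
        {
            unsigned int bus = (devid >> 8) & 0xff;   /* RID is bus:dev.fn */
            /* host_bridge_for_bus() is hypothetical: whichever host bridge
             * (and hence node) owns this bus also owns the physical ITS. */
            struct pci_host_bridge *hb = host_bridge_for_bus(bus);

            return hb ? hb->its : NULL;
        }
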
>
>>> PHYSDEVOP_pci_device_add should be called before any initialization is
>>> done. Therefore ITS should be configured for this PCI after Xen is 
>>> aware
>>> of the PCI.
>> That is for a device, I believe all devices on a host bridge are
>> serviced by a single ITS
>
> Why do you speak about host bridge? Do you need to configure the ITS 
> at boot time for the host bridge?
>
> Do you have any spec stating there is one ITS per host bridge?
>
I was referring to the mapping between the ITS and the PCI host bridge,
not to a PCI end point / device, which is what is added by the said
hypercall. The ITS is per node. It is always after, but the ITS for the
PCI node is known to Xen before dom0 is started.
>>>
>>> IHMO, any ITS trap before this is wrong.
>> AFAIK guest always sees a virtual ITS, could you please explain what is
>> wrong in trapping.
>
> I never say the trapping is wrong in all case.... The "before" was 
> here for any trap before the PCI has been added to Xen is, IHMO, wrong.
>
There is no trap before.
> Regards,
>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-24  0:23                             ` Manish Jaggi
@ 2015-02-24 13:43                               ` Julien Grall
  2015-02-25  2:33                                 ` Manish Jaggi
  0 siblings, 1 reply; 70+ messages in thread
From: Julien Grall @ 2015-02-24 13:43 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: prasun.kapoor, Ian Campbell, Kumar, Vijaya, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On 24/02/15 00:23, Manish Jaggi wrote:
>> Because you have to parse all the device tree to remove the reference
>> to the second ITS. It's pointless and can be difficult to do it.
>>
> Could you please describe the case where it is difficult

You have to parse every node in the device tree and replace the
msi-parent properties with only one ITS.

>> If you are able to emulate on ITS, you can do it for multiple one.
> keeping it simple and similar across dom0/domUs

> Consider a case where a domU is assigned two PCI devices which are
> attached to different nodes. (Node is an entity having its own cores are
> host controllers).

The DOM0 view and guest view of the hardware are different.

In the case of DOM0, we want to expose the same hardware layout as the
host. So if there are 2 ITSes then we should expose the 2 ITSes.

In the case of the Guest, we (Xen) control the memory layout. Therefore
we can expose only one ITS.

>>>>
>>>> IHMO, any ITS trap before this is wrong.
>>> AFAIK guest always sees a virtual ITS, could you please explain what is
>>> wrong in trapping.
>>
>> I never say the trapping is wrong in all case.... The "before" was
>> here for any trap before the PCI has been added to Xen is, IHMO, wrong.
>>
> There is no trap before.

So I still don't understand why you need to parse the device tree nodes
for PCI devices at boot time...

If it doesn't trap before, you should not need to know about the PCI
device.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-24 13:43                               ` Julien Grall
@ 2015-02-25  2:33                                 ` Manish Jaggi
  2015-02-25 10:20                                   ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-02-25  2:33 UTC (permalink / raw)
  To: Julien Grall
  Cc: prasun.kapoor, Ian Campbell, Kumar, Vijaya, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich


On 24/02/15 7:13 pm, Julien Grall wrote:
> On 24/02/15 00:23, Manish Jaggi wrote:
>>> Because you have to parse all the device tree to remove the reference
>>> to the second ITS. It's pointless and can be difficult to do it.
>>>
>> Could you please describe the case where it is difficult
> You have to parse every node in the device tree and replace the
> msi-parent properties with only one ITS.
That's the idea.
>
>>> If you are able to emulate on ITS, you can do it for multiple one.
>> keeping it simple and similar across dom0/domUs
>> Consider a case where a domU is assigned two PCI devices which are
>> attached to different nodes. (Node is an entity having its own cores are
>> host controllers).
> The DOM0 view and guest view of the hardware are different.
>
> In the case of DOM0, we want to expose the same hardware layout as the
> host. So if there is 2 ITS then we should expose the 2 ITS.
AFAIK Xen has a microkernel design: timer/mmu/smmu/gic/its are handled
by Xen and a virtualized interface is provided to the guest. So if none
of the SMMUs in the host layout is presented to dom0, the same can be
valid for multiple ITSes.
>
> In the case of the Guest, we (Xen) controls the memory layout.
For Dom0 as well.
> Therefore
> we can expose only one ITS.
If we follow 2 ITSes in dom0 and 1 ITS in a domU, how do you expect the
Xen GIC ITS emulation driver to work? It would have to check whether the
request came from dom0 and handle it differently. I think this would be
*difficult*.
>
>>>>> IHMO, any ITS trap before this is wrong.
>>>> AFAIK guest always sees a virtual ITS, could you please explain what is
>>>> wrong in trapping.
>>> I never say the trapping is wrong in all case.... The "before" was
>>> here for any trap before the PCI has been added to Xen is, IHMO, wrong.
>>>
>> There is no trap before.
> So I still don't understand why you need to parse the device tree node
> for PCI device at boot time...
>
> If it doesn't trap before, you should not need to know the PCI.
>
> Regards,
>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-25  2:33                                 ` Manish Jaggi
@ 2015-02-25 10:20                                   ` Ian Campbell
  2015-02-26 10:49                                     ` Vijay Kilari
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-25 10:20 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: prasun.kapoor, Kumar, Vijaya, Julien Grall, xen-devel,
	Stefano Stabellini	(Stefano.Stabellini@citrix.com),
	Jan Beulich

On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote:
> On 24/02/15 7:13 pm, Julien Grall wrote:
> > On 24/02/15 00:23, Manish Jaggi wrote:
> >>> Because you have to parse all the device tree to remove the reference
> >>> to the second ITS. It's pointless and can be difficult to do it.
> >>>
> >> Could you please describe the case where it is difficult
> > You have to parse every node in the device tree and replace the
> > msi-parent properties with only one ITS.
> Thats the idea.
> >
> >>> If you are able to emulate on ITS, you can do it for multiple one.
> >> keeping it simple and similar across dom0/domUs
> >> Consider a case where a domU is assigned two PCI devices which are
> >> attached to different nodes. (Node is an entity having its own cores are
> >> host controllers).
> > The DOM0 view and guest view of the hardware are different.
> >
> > In the case of DOM0, we want to expose the same hardware layout as the
> > host. So if there is 2 ITS then we should expose the 2 ITS.
> AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are 
> handled by xen and a virtualized interface is provided to the guest. So 
> if none of SMMU in the layout of host is presented to dom0 the same can 
> be valid for multiple ITS.

SMMU is one of the things which Xen hides from dom0, for obvious
reasons.

Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no
reason for ITS to differ here.

Since dom0 needs to be able to cope with seeing all of the host I/O
devices (in the default no-passthrough case), it is possible, if not
likely, that it will need the same amount of ITS resources (i.e.
numbers of LPIs) as the host provides.

> > In the case of the Guest, we (Xen) controls the memory layout.
> For Dom0 as well.

Not true.

For dom0 the memory layout is determined by the host memory layout. The
MMIO regions are mapped through 1:1 and the RAM is a subset of the RAM
regions of the host physical address space (often in 1:1, but with
sufficient h/w support this need not be the case).

> > Therefore
> > we can expose only one ITS.
> If we follow 2 ITS in dom0 and 1 ITS in domU, how do u expect the Xen 
> GIC ITS emulation driver to work.
> It should check that request came from a dom0 handle it differently. I 
> think this would be *difficult*.

I don't think so. If the vITS is written to handle multiple instances
(i.e. in a modular way, as it should be) then it shouldn't matter
whether any given domain has 1 or many vITS. The fact that dom0 may have
one or more and domU only (currently) has one then becomes largely
uninteresting.

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23 16:20                                         ` Jan Beulich
@ 2015-02-26 10:09                                           ` Manish Jaggi
  2015-02-26 10:30                                             ` Jan Beulich
  2015-02-26 11:05                                             ` Ian Campbell
  0 siblings, 2 replies; 70+ messages in thread
From: Manish Jaggi @ 2015-02-26 10:09 UTC (permalink / raw)
  To: Jan Beulich, Ian Campbell
  Cc: Prasun.kapoor, Vijaya Kumar, Julien Grall,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	xen-devel


On Monday 23 February 2015 09:50 PM, Jan Beulich wrote:
>>>> On 23.02.15 at 16:46, <ian.campbell@citrix.com> wrote:
>> On Mon, 2015-02-23 at 15:27 +0000, Jan Beulich wrote:
>>>>>> On 23.02.15 at 16:02, <ian.campbell@citrix.com> wrote:
>>>> Is the reason for the scan being of segment 0 only is that it is the one
>>>> which lives at the legacy PCI CFG addresses (or those magic I/O ports)?
>>> Right - ideally we would scan all segments, but we need Dom0 to
>>> tell us which MMCFG regions are safe to access,
>> Is this done via PHYSDEVOP_pci_mmcfg_reserved?
> Yes.
>
>>>   and hence can't
>>> do that scan at boot time. But we also won't get away without
>>> scanning, as we need to set up the IOMMU(s) to at least cover
>>> the devices used for booting the system.
>> Which hopefully are all segment 0 or aren't needed until after dom0
>> tells Xen about them I suppose.
> Right. With EFI one may be able to overcome this one day, but the
> legacy BIOS doesn't even surface mechanisms (software interrupts)
> to access devices outside of segment 0.
>
>>>   (All devices on segment zero are supposed to
>>> be accessible via config space access method 1.)
>> Is that "the legacy ....  or magic ..." again?
> Yes (just that there are two of them).
Ian/Jan,
Have you reached a conclusion?
> Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-26 10:09                                           ` Manish Jaggi
@ 2015-02-26 10:30                                             ` Jan Beulich
  2015-02-26 11:05                                             ` Ian Campbell
  1 sibling, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2015-02-26 10:30 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor, Ian Campbell, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com)

>>> On 26.02.15 at 11:09, <mjaggi@caviumnetworks.com> wrote:
> On Monday 23 February 2015 09:50 PM, Jan Beulich wrote:
>>>>> On 23.02.15 at 16:46, <ian.campbell@citrix.com> wrote:
>>> On Mon, 2015-02-23 at 15:27 +0000, Jan Beulich wrote:
>>>>>>> On 23.02.15 at 16:02, <ian.campbell@citrix.com> wrote:
>>>>> Is the reason for the scan being of segment 0 only is that it is the one
>>>>> which lives at the legacy PCI CFG addresses (or those magic I/O ports)?
>>>> Right - ideally we would scan all segments, but we need Dom0 to
>>>> tell us which MMCFG regions are safe to access,
>>> Is this done via PHYSDEVOP_pci_mmcfg_reserved?
>> Yes.
>>
>>>>   and hence can't
>>>> do that scan at boot time. But we also won't get away without
>>>> scanning, as we need to set up the IOMMU(s) to at least cover
>>>> the devices used for booting the system.
>>> Which hopefully are all segment 0 or aren't needed until after dom0
>>> tells Xen about them I suppose.
>> Right. With EFI one may be able to overcome this one day, but the
>> legacy BIOS doesn't even surface mechanisms (software interrupts)
>> to access devices outside of segment 0.
>>
>>>>   (All devices on segment zero are supposed to
>>>> be accessible via config space access method 1.)
>>> Is that "the legacy ....  or magic ..." again?
>> Yes (just that there are two of them).
> Ian/Jan,
> Have you reached a conclusion?

Regarding what? The context above - at least to me - doesn't make
clear what you refer to.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-25 10:20                                   ` Ian Campbell
@ 2015-02-26 10:49                                     ` Vijay Kilari
  2015-02-26 11:12                                       ` Ian Campbell
  2015-02-26 14:46                                       ` Pranavkumar Sawargaonkar
  0 siblings, 2 replies; 70+ messages in thread
From: Vijay Kilari @ 2015-02-26 10:49 UTC (permalink / raw)
  To: Ian Campbell
  Cc: prasun.kapoor, Manish Jaggi, Kumar, Vijaya, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Wed, Feb 25, 2015 at 3:50 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote:
>> On 24/02/15 7:13 pm, Julien Grall wrote:
>> > On 24/02/15 00:23, Manish Jaggi wrote:
>> >>> Because you have to parse all the device tree to remove the reference
>> >>> to the second ITS. It's pointless and can be difficult to do it.
>> >>>
>> >> Could you please describe the case where it is difficult
>> > You have to parse every node in the device tree and replace the
>> > msi-parent properties with only one ITS.
>> Thats the idea.
>> >
>> >>> If you are able to emulate on ITS, you can do it for multiple one.
>> >> keeping it simple and similar across dom0/domUs
>> >> Consider a case where a domU is assigned two PCI devices which are
>> >> attached to different nodes. (Node is an entity having its own cores are
>> >> host controllers).
>> > The DOM0 view and guest view of the hardware are different.
>> >
>> > In the case of DOM0, we want to expose the same hardware layout as the
>> > host. So if there is 2 ITS then we should expose the 2 ITS.
>> AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are
>> handled by xen and a virtualized interface is provided to the guest. So
>> if none of SMMU in the layout of host is presented to dom0 the same can
>> be valid for multiple ITS.
>
> SMMU is one of the things which Xen hides from dom0, for obvious
> reasons.
>
> Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no
> reason for ITS to differ here.
>
> Since dom0 needs to be able to cope with being able to see all of the
> host I/O devices (in the default no-passthrough case), it is
> possible, if not likely, that it will need the same amount of ITS
> resources (i.e. numbers of LPIs) as the host provides.
>
>> > In the case of the Guest, we (Xen) controls the memory layout.
>> For Dom0 as well.
>
> Not true.
>
> For dom0 the memory layout is determined by the host memory layout. The
> MMIO regions are mapped through 1:1 and the RAM is a subset of the RAM
> regions of the host physical address space (often in 1:1, but with
> sufficient h/w support this need not be the case).
>
>> > Therefore
>> > we can expose only one ITS.
>> If we follow 2 ITS in dom0 and 1 ITS in domU, how do u expect the Xen
>> GIC ITS emulation driver to work.
>> It should check that request came from a dom0 handle it differently. I
>> think this would be *difficult*.
>
> I don't think so. If the vITS is written to handle multiple instances
> (i.e. in a modular way, as it should be) then it shouldn't matter
> whether any given domain has 1 or many vITS. The fact that dom0 may have
> one or more and domU only (currently) has one then becomes largely
> uninteresting.

I have a few queries:

1) If Dom0 has 'n' ITS nodes, how does Xen know which virtual ITS
   command queue is mapped to which physical ITS command queue?
   In the case of Linux, the ITS node is added as an MSI chip to PCI
   using of_pci_msi_chip_add(), and from the pci_dev structure we can
   know which ITS to use.

   But in the case of Xen, when an ITS command is trapped we only have
   the dev_id info from the ITS command.

2) If DomU is always given one virtual ITS node, and DomU is assigned
   two different PCI devices connected to different physical ITSes,
   then the Xen vITS driver should know how to map a PCI device to its
   physical ITS.

For the two issues above, Xen should have a mapping from PCI segment to
the physical ITS node to use, which the vITS driver can query in order
to send commands to the correct physical ITS.

Vijay

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-26 10:09                                           ` Manish Jaggi
  2015-02-26 10:30                                             ` Jan Beulich
@ 2015-02-26 11:05                                             ` Ian Campbell
  2015-02-27 14:33                                               ` Stefano Stabellini
  1 sibling, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-26 11:05 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor, Vijaya Kumar, Julien Grall, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote:
> Have you reached a conclusion?

My current thinking on how PCI for Xen on ARM should look is thus:

xen/arch/arm/pci.c:

        New file, containing core PCI infrastructure for ARM. Includes:
        
        pci_hostbridge_register(), which registers a host bridge:
        
                Registration includes:
                        DT node pointer
                        
                        CFG space address
                
                        pci_hostbridge_ops function table, which
                        contains e.g. cfg space read/write ops, perhaps
                        other stuff).
        
        Function for setting the (segment,bus) for a given host bridge.
        Lets say pci_hostbridge_setup(), the host bridge must have been
        previously registered. Looks up the host bridge via CFG space
        address and maps that to (segment,bus).
        
        Functions for looking up host bridges by various keys as needed
        (cfg base address, DT node, etc)
        
        pci_init() function, called from somewhere appropriate in
        setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see
        gic_init() for the shape of this)
        
        Any other common helper functions for managing PCI devices, e.g.
        for implementing PHYSDEVOP_*, which cannot be made properly
        common (i.e. shared with x86).

xen/drivers/pci/host-*.c (or pci/host/*.c):

        New files, one per supported PCI controller IP block. Each
        should use the normal DT_DEVICE infrastructure for probing,
        i.e.:
        DT_DEVICE_START(foo, "FOO", DEVICE_PCIHOST)
        
        Probe function should call pci_hostbridge_register() for each
        host bridge which the controller exposes.
        
xen/arch/arm/physdev.c:

        Implements do_physdev_op handling PHYSDEVOP_*. Includes:
        
        New hypercall subop PHYSDEVOP_pci_host_bridge_add:
        
                As per <1424703761.27930.140.camel@citrix.com> which
                calls pci_hostbridge_setup() to map the (segment,bus) to
                a specific pci_hostbridge_ops (i.e. must have previously
                been registered with pci_hostbridge_register(), else
                error).
        
        PHYSDEVOP_pci_device_add/remove: Implement existing hypercall
        interface used by x86 for ARM.
        
                This requires that PHYSDEVOP_pci_host_bridge_add has
                been called for the (segment,bus) which it refers to,
                otherwise error.
                
                Looks up the host bridge and does whatever setup is
                required plus e.g. calling of pci_add_device().

No doubt various other existing interfaces will need wiring up, e.g.
pci_conf_{read,write}* should lookup the host bridge ops struct and call
the associated method.
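
To make the above a bit more concrete, here is a rough sketch of the
core registration pieces. Everything below is illustrative only: the
structure layouts, field names and the argument struct for the proposed
PHYSDEVOP_pci_host_bridge_add subop are guesses at what the description
could translate into, not existing code.

/* Sketch only: hypothetical asm-arm/pci.h content. */
#include <xen/list.h>
#include <xen/types.h>
#include <xen/device_tree.h>

struct pci_hostbridge;

struct pci_hostbridge_ops {
    /* CFG space accessors for one host bridge. */
    int (*cfg_read)(struct pci_hostbridge *hb, unsigned int bus,
                    unsigned int devfn, int reg, int size, uint32_t *val);
    int (*cfg_write)(struct pci_hostbridge *hb, unsigned int bus,
                     unsigned int devfn, int reg, int size, uint32_t val);
};

struct pci_hostbridge {
    struct list_head list;
    const struct dt_device_node *node;  /* DT node of the controller */
    paddr_t cfg_base;                   /* CFG space address */
    uint16_t segment;                   /* set later by pci_hostbridge_setup() */
    uint8_t bus_start;
    const struct pci_hostbridge_ops *ops;
};

/* Called by a host-*.c probe function, once per bridge it exposes. */
int pci_hostbridge_register(struct pci_hostbridge *hb);

/*
 * Called from the PHYSDEVOP_pci_host_bridge_add handler: look the bridge
 * up by CFG space address and bind it to (segment, bus).
 */
int pci_hostbridge_setup(paddr_t cfg_base, uint16_t segment, uint8_t bus);

/* One possible argument layout for the new hypercall subop. */
struct physdev_pci_host_bridge_add {
    /* IN */
    uint16_t seg;
    uint8_t  bus;
    uint64_t cfg_base;
    uint64_t cfg_size;
};

A host-*.c driver would then fill in one struct pci_hostbridge per
bridge it finds and hand it to pci_hostbridge_register() from its probe
function.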

I'm sure the above must be incomplete, but I hope the general shape
makes sense?

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-26 10:49                                     ` Vijay Kilari
@ 2015-02-26 11:12                                       ` Ian Campbell
  2015-02-26 13:58                                         ` Julien Grall
  2015-02-26 14:46                                       ` Pranavkumar Sawargaonkar
  1 sibling, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-26 11:12 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: prasun.kapoor, Manish Jaggi, Kumar, Vijaya, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Thu, 2015-02-26 at 16:19 +0530, Vijay Kilari wrote:
> On Wed, Feb 25, 2015 at 3:50 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote:
> >> On 24/02/15 7:13 pm, Julien Grall wrote:
> >> > On 24/02/15 00:23, Manish Jaggi wrote:
> >> >>> Because you have to parse all the device tree to remove the reference
> >> >>> to the second ITS. It's pointless and can be difficult to do it.
> >> >>>
> >> >> Could you please describe the case where it is difficult
> >> > You have to parse every node in the device tree and replace the
> >> > msi-parent properties with only one ITS.
> >> Thats the idea.
> >> >
> >> >>> If you are able to emulate on ITS, you can do it for multiple one.
> >> >> keeping it simple and similar across dom0/domUs
> >> >> Consider a case where a domU is assigned two PCI devices which are
> >> >> attached to different nodes. (Node is an entity having its own cores are
> >> >> host controllers).
> >> > The DOM0 view and guest view of the hardware are different.
> >> >
> >> > In the case of DOM0, we want to expose the same hardware layout as the
> >> > host. So if there is 2 ITS then we should expose the 2 ITS.
> >> AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are
> >> handled by xen and a virtualized interface is provided to the guest. So
> >> if none of SMMU in the layout of host is presented to dom0 the same can
> >> be valid for multiple ITS.
> >
> > SMMU is one of the things which Xen hides from dom0, for obvious
> > reasons.
> >
> > Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no
> > reason for ITS to differ here.
> >
> > Since dom0 needs to be able to cope with being able to see all of the
> > host I/O devices (in the default no-passthrough case), it is
> > possible, if not likely, that it will need the same amount of ITS
> > resources (i.e. numbers of LPIs) as the host provides.
> >
> >> > In the case of the Guest, we (Xen) controls the memory layout.
> >> For Dom0 as well.
> >
> > Not true.
> >
> > For dom0 the memory layout is determined by the host memory layout. The
> > MMIO regions are mapped through 1:1 and the RAM is a subset of the RAM
> > regions of the host physical address space (often in 1:1, but with
> > sufficient h/w support this need not be the case).
> >
> >> > Therefore
> >> > we can expose only one ITS.
> >> If we follow 2 ITS in dom0 and 1 ITS in domU, how do u expect the Xen
> >> GIC ITS emulation driver to work.
> >> It should check that request came from a dom0 handle it differently. I
> >> think this would be *difficult*.
> >
> > I don't think so. If the vITS is written to handle multiple instances
> > (i.e. in a modular way, as it should be) then it shouldn't matter
> > whether any given domain has 1 or many vITS. The fact that dom0 may have
> > one or more and domU only (currently) has one then becomes largely
> > uninteresting.
> 
> I have few queries
> 
> 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS
> command Q is
>     mapped to which Physical ITS command Q.
>     In case of linux, the ITS node is added as msi chip to pci using
> of_pci_msi_chip_add()
>     and from pci_dev structure we can know which ITS to use.
> 
>     But in case of Xen, when ITS command is trapped we have only
> dev_id info from ITS command.

With the proper PCI infrastructure in place we can map the vdev_id to a
pdev_id, and from there to our own struct pci_dev 

The mapping from pdev_id to pci_dev is based on the
PHYSDEVOP_pci_host_bridge_add and PHYSDEVOP_pci_device_add calls I
described just now in my mail to Manish in this thread (specifically
pci_device_add creates and registers struct pci_dev I think).
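
As a purely illustrative sketch (the structures, the per-domain
d->arch.vits_devices list and struct host_its are invented for the
example, not existing Xen code), the tracking described above might end
up looking something like this:

#include <xen/list.h>
#include <xen/sched.h>
#include <xen/types.h>

struct host_its;            /* hypothetical handle for one physical ITS */

/* Hypothetical per-domain record of a passed-through PCI device. */
struct vdev_map_entry {
    struct list_head list;
    uint32_t vdev_id;       /* device ID the guest puts in ITS commands */
    uint32_t pdev_id;       /* physical requester ID (from seg/bus/devfn) */
    struct pci_dev *pdev;   /* registered via PHYSDEVOP_pci_device_add */
    struct host_its *pits;  /* physical ITS serving this device's segment */
};

/*
 * When a guest ITS command is trapped we only have the vdev_id, so
 * resolve it to the physical device and to the physical ITS whose
 * command queue the emulated command must be forwarded to.
 */
static struct vdev_map_entry *vits_resolve(struct domain *d, uint32_t vdev_id)
{
    struct vdev_map_entry *e;

    list_for_each_entry ( e, &d->arch.vits_devices, list )
        if ( e->vdev_id == vdev_id )
            return e;

    return NULL;
}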

> 
> 2) If DomU is always given one virtual ITS node. If DomU is assinged
> with two different
>     PCI devices connected to different physical ITS, then Xen vITS
> driver should know how to map
>     PCI device to physical ITS

Correct, I think that all falls out from the proper tracking of the
vdev_id to pdev_id and from vits to pits for a given domain and the
management/tracking of the struct pci_dev.

Ian.

> For the two issues above, Xen should have mapping to pci segment and
> physical ITS node to use
> which can be queried by vITS driver and send command on to correct physical ITS
> 
> Vijay

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-26 11:12                                       ` Ian Campbell
@ 2015-02-26 13:58                                         ` Julien Grall
  0 siblings, 0 replies; 70+ messages in thread
From: Julien Grall @ 2015-02-26 13:58 UTC (permalink / raw)
  To: Ian Campbell, Vijay Kilari
  Cc: prasun.kapoor, Manish Jaggi, Kumar, Vijaya, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

Hi Ian,

On 26/02/15 11:12, Ian Campbell wrote:
>> I have few queries
>>
>> 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS
>> command Q is
>>     mapped to which Physical ITS command Q.
>>     In case of linux, the ITS node is added as msi chip to pci using
>> of_pci_msi_chip_add()
>>     and from pci_dev structure we can know which ITS to use.
>>
>>     But in case of Xen, when ITS command is trapped we have only
>> dev_id info from ITS command.
> 
> With the proper PCI infrastructure in place we can map the vdev_id to a
> pdev_id, and from there to our own struct pci_dev 
> 
> The mapping from pdev_id to pci_dev is based on the
> PHYSDEVOP_pci_host_bridge_add and PHYSDEVOP_pci_device_add calls I
> described just now in my mail to Manish in this thread (specifically
> pci_device_add creates and registers struct pci_dev I think).

We may need a hypercall to map the dev_id to a vdev_id.

IIRC, Vijay and Manish were already planning to add one.
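
For illustration only (no such subop exists at the time of writing and
the eventual interface may look quite different), the argument structure
for such a hypercall could be as simple as:

/* Hypothetical subop: tell Xen under which virtual device ID a given
 * physical device (seg/bus/devfn) should appear for a domain. */
struct physdev_map_vdev_id {
    /* IN */
    domid_t  domid;
    uint16_t seg;
    uint8_t  bus;
    uint8_t  devfn;
    uint32_t vdev_id;
};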

> 
>>
>> 2) If DomU is always given one virtual ITS node. If DomU is assinged
>> with two different
>>     PCI devices connected to different physical ITS, then Xen vITS
>> driver should know how to map
>>     PCI device to physical ITS
> 
> Correct, I think that all falls out from the proper tracking of the
> vdev_id to pdev_id and from vits to pits for a given domain and the
> management/tracking of the struct pci_dev.

I think this is the right way to go. Though I haven't read the ITS spec
closely.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-26 10:49                                     ` Vijay Kilari
  2015-02-26 11:12                                       ` Ian Campbell
@ 2015-02-26 14:46                                       ` Pranavkumar Sawargaonkar
  2015-02-26 15:17                                         ` Julien Grall
  1 sibling, 1 reply; 70+ messages in thread
From: Pranavkumar Sawargaonkar @ 2015-02-26 14:46 UTC (permalink / raw)
  To: Vijay Kilari
  Cc: prasun.kapoor, Ian Campbell, Manish Jaggi, Kumar, Vijaya,
	Julien Grall, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

Hi

On Thu, Feb 26, 2015 at 4:19 PM, Vijay Kilari <vijay.kilari@gmail.com> wrote:
> On Wed, Feb 25, 2015 at 3:50 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
>> On Wed, 2015-02-25 at 08:03 +0530, Manish Jaggi wrote:
>>> On 24/02/15 7:13 pm, Julien Grall wrote:
>>> > On 24/02/15 00:23, Manish Jaggi wrote:
>>> >>> Because you have to parse all the device tree to remove the reference
>>> >>> to the second ITS. It's pointless and can be difficult to do it.
>>> >>>
>>> >> Could you please describe the case where it is difficult
>>> > You have to parse every node in the device tree and replace the
>>> > msi-parent properties with only one ITS.
>>> Thats the idea.
>>> >
>>> >>> If you are able to emulate on ITS, you can do it for multiple one.
>>> >> keeping it simple and similar across dom0/domUs
>>> >> Consider a case where a domU is assigned two PCI devices which are
>>> >> attached to different nodes. (Node is an entity having its own cores are
>>> >> host controllers).
>>> > The DOM0 view and guest view of the hardware are different.
>>> >
>>> > In the case of DOM0, we want to expose the same hardware layout as the
>>> > host. So if there is 2 ITS then we should expose the 2 ITS.
>>> AFAIK Xen has a microkernel design and timer/mmu/smmu/gic/its are
>>> handled by xen and a virtualized interface is provided to the guest. So
>>> if none of SMMU in the layout of host is presented to dom0 the same can
>>> be valid for multiple ITS.
>>
>> SMMU is one of the things which Xen hides from dom0, for obvious
>> reasons.
>>
>> Interrupts are exposed to dom0 in a 1:1 manner. AFAICT there is no
>> reason for ITS to differ here.
>>
>> Since dom0 needs to be able to cope with being able to see all of the
>> host I/O devices (in the default no-passthrough case), it is
>> possible, if not likely, that it will need the same amount of ITS
>> resources (i.e. numbers of LPIs) as the host provides.
>>
>>> > In the case of the Guest, we (Xen) controls the memory layout.
>>> For Dom0 as well.
>>
>> Not true.
>>
>> For dom0 the memory layout is determined by the host memory layout. The
>> MMIO regions are mapped through 1:1 and the RAM is a subset of the RAM
>> regions of the host physical address space (often in 1:1, but with
>> sufficient h/w support this need not be the case).
>>
>>> > Therefore
>>> > we can expose only one ITS.
>>> If we follow 2 ITS in dom0 and 1 ITS in domU, how do u expect the Xen
>>> GIC ITS emulation driver to work.
>>> It should check that request came from a dom0 handle it differently. I
>>> think this would be *difficult*.
>>
>> I don't think so. If the vITS is written to handle multiple instances
>> (i.e. in a modular way, as it should be) then it shouldn't matter
>> whether any given domain has 1 or many vITS. The fact that dom0 may have
>> one or more and domU only (currently) has one then becomes largely
>> uninteresting.
>
> I have few queries
>
> 1) If Dom0 has 'n' ITS nodes, then how does Xen know which virtual ITS
> command Q is
>     mapped to which Physical ITS command Q.
>     In case of linux, the ITS node is added as msi chip to pci using
> of_pci_msi_chip_add()
>     and from pci_dev structure we can know which ITS to use.
>
>     But in case of Xen, when ITS command is trapped we have only
> dev_id info from ITS command.
>
> 2) If DomU is always given one virtual ITS node. If DomU is assinged
> with two different
>     PCI devices connected to different physical ITS, then Xen vITS
> driver should know how to map
>     PCI device to physical ITS
>
> For the two issues above, Xen should have mapping to pci segment and
> physical ITS node to use
> which can be queried by vITS driver and send command on to correct physical ITS
>

Also, if we show only one vITS (or only one virtual v2m frame) instead
of two vITS, then the actual hardware interrupt number and the virtual
interrupt number which the guest sees will become different.
This will hamper direct IRQ routing to the guest.

-
Pranav



> Vijay

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-26 14:46                                       ` Pranavkumar Sawargaonkar
@ 2015-02-26 15:17                                         ` Julien Grall
  2015-02-27 10:11                                           ` Pranavkumar Sawargaonkar
  0 siblings, 1 reply; 70+ messages in thread
From: Julien Grall @ 2015-02-26 15:17 UTC (permalink / raw)
  To: Pranavkumar Sawargaonkar, Vijay Kilari
  Cc: prasun.kapoor, Ian Campbell, Manish Jaggi, Kumar, Vijaya,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
> Hi

Hi Pranavkumar,

> Also if we just show only one vITS (or only one Virtual v2m frame)
> instead of two vITS
> then actual hardware interrupt number and virtual interrupt number
> which guest will see will become different
> This will hamper direct irq routing to guest.

The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.

I have a patch which allows virq != pirq:

https://patches.linaro.org/43012/

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-26 15:17                                         ` Julien Grall
@ 2015-02-27 10:11                                           ` Pranavkumar Sawargaonkar
  2015-02-27 10:38                                             ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Pranavkumar Sawargaonkar @ 2015-02-27 10:11 UTC (permalink / raw)
  To: Julien Grall
  Cc: prasun.kapoor, Ian Campbell, Vijay Kilari, Manish Jaggi, Kumar,
	Vijaya, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

Hi Julien,

On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall <julien.grall@linaro.org> wrote:
> On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
>> Hi
>
> Hi Pranavkumar,
>
>> Also if we just show only one vITS (or only one Virtual v2m frame)
>> instead of two vITS
>> then actual hardware interrupt number and virtual interrupt number
>> which guest will see will become different
>> This will hamper direct irq routing to guest.
>
> The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.

Yes, but in the case of GICv2m (I am not sure about ITS) the device has
to write the interrupt ID (which is the pIRQ) to the MSI_SETSPI_NS
register to generate an interrupt.
If you write a vIRQ which is different from the pIRQ (associated with
the actual GICv2m frame) then it will not trigger any interrupt.

Now there is a case which I am not sure can be solved with one
vITS/vGICv2m:

. Suppose we have two GICv2m frames, one with its MSI_SETSPI_NS
register at address 0x1000 and the other with its MSI_SETSPI_NS
register at 0x2000.
. Assume the first frame has physical SPIs 0x64 - 0x72 associated and
the second has 0x80 - 0x88 associated.
. Now there are two PCIe hosts, the first using the first GICv2m frame
as its MSI parent and the other using the second frame.
. A device on the first host uses the MSI_SETSPI_NS address (0x1000)
along with a data value (i.e. an interrupt number, say 0x64), and a
device on the second host uses 0x2000 and data 0x80.

Now if we show one vGICv2m frame in the guest for both devices, what
address will I program in each device's config space for MSI, and what
will the data value be?
Secondly, the device's writes to these addresses are transparent to the
CPU, so how can we trap them when the device wants to trigger an
interrupt?
Please correct me if I misunderstood anything.

Thanks,
Pranav



>
> I have a patch which allow virq != pirq:
>
> https://patches.linaro.org/43012/
>
> Regards,
>
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 10:11                                           ` Pranavkumar Sawargaonkar
@ 2015-02-27 10:38                                             ` Ian Campbell
  2015-02-27 13:22                                               ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-27 10:38 UTC (permalink / raw)
  To: Pranavkumar Sawargaonkar
  Cc: prasun.kapoor, Vijay Kilari, Manish Jaggi, Kumar, Vijaya,
	Julien Grall, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Fri, 2015-02-27 at 15:41 +0530, Pranavkumar Sawargaonkar wrote:
> Hi Julien,
> 
> On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall <julien.grall@linaro.org> wrote:
> > On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
> >> Hi
> >
> > Hi Pranavkumar,
> >
> >> Also if we just show only one vITS (or only one Virtual v2m frame)
> >> instead of two vITS
> >> then actual hardware interrupt number and virtual interrupt number
> >> which guest will see will become different
> >> This will hamper direct irq routing to guest.
> >
> > The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.
> 
> Yes, but in case of GICv2m( I am not sure about ITS) in register
> MSI_SETSPI_NS device has to write the interrupt ID (which is pirq) to
> generate an interrupt.
> If you write virq which is different that pirq (associated with the
> actual GICv2m frame ) then it will not trigger any interrupt.
> 
> Now there is case which I am not sure how it can be solvable with one
> vITS/vGICv2m  -
> 
> . Suppose we have two GICv2m frames and say oneis  having an address
> 0x1000 for MSI_SETSPI_NS register and other 0x2000 for it's
> MSI_SETSPI_NS register
> . Assume first frame has SPI's (physical) 0x64 - 0x72 associated and
> second has 0x80-0x88 associated.
> . Now there are two PCIe hosts, first using first GICv2m frame as a
> MSI parent and another using second frame.
> . Device on first host uses MSI_SETSPI_NS (0x1000) address along with
> a data (i.e. intr number say 0x64) and device on second host uses
> 0x2000 and data 0x80
> 
> Now if we show one vGICv2m frame in guest for both the devices then
> what address I will program in each device's config space for MSI and
> also what will the data value.
> Secondly device's write for these addresses will be transparent to cpu
> so how can we trap them while device wants to trigger any interrupt ?
>
> Please correct me if I misunderstood anything.

Is what you are suggesting a v2m specific issue?

I thought the whole point of the ITS stuff in GICv3 was that one could
program such virt-phys mappings into the hardware ITS and it would do
the translation (the T in ITS) such that the host got the pIRQ it was
expecting when the guest wrote the virtualised vIRQ information to the
device.

Caveat: If I've read the ITS bits of that doc at any point it was long
ago and I've forgotten everything I knew about it... And I've never read
anything about v2m at all ;-)

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 10:38                                             ` Ian Campbell
@ 2015-02-27 13:22                                               ` Ian Campbell
  2015-02-27 13:59                                                 ` Vijay Kilari
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-27 13:22 UTC (permalink / raw)
  To: Pranavkumar Sawargaonkar
  Cc: prasun.kapoor, Vijay Kilari, Manish Jaggi, Kumar, Vijaya,
	Julien Grall, xen-devel,
	Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Fri, 2015-02-27 at 10:38 +0000, Ian Campbell wrote:
> On Fri, 2015-02-27 at 15:41 +0530, Pranavkumar Sawargaonkar wrote:
> > Hi Julien,
> > 
> > On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall <julien.grall@linaro.org> wrote:
> > > On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
> > >> Hi
> > >
> > > Hi Pranavkumar,
> > >
> > >> Also if we just show only one vITS (or only one Virtual v2m frame)
> > >> instead of two vITS
> > >> then actual hardware interrupt number and virtual interrupt number
> > >> which guest will see will become different
> > >> This will hamper direct irq routing to guest.
> > >
> > > The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.
> > 
> > Yes, but in case of GICv2m( I am not sure about ITS) in register
> > MSI_SETSPI_NS device has to write the interrupt ID (which is pirq) to
> > generate an interrupt.
> > If you write virq which is different that pirq (associated with the
> > actual GICv2m frame ) then it will not trigger any interrupt.
> > 
> > Now there is case which I am not sure how it can be solvable with one
> > vITS/vGICv2m  -
> > 
> > . Suppose we have two GICv2m frames and say oneis  having an address
> > 0x1000 for MSI_SETSPI_NS register and other 0x2000 for it's
> > MSI_SETSPI_NS register
> > . Assume first frame has SPI's (physical) 0x64 - 0x72 associated and
> > second has 0x80-0x88 associated.
> > . Now there are two PCIe hosts, first using first GICv2m frame as a
> > MSI parent and another using second frame.
> > . Device on first host uses MSI_SETSPI_NS (0x1000) address along with
> > a data (i.e. intr number say 0x64) and device on second host uses
> > 0x2000 and data 0x80
> > 
> > Now if we show one vGICv2m frame in guest for both the devices then
> > what address I will program in each device's config space for MSI and
> > also what will the data value.
> > Secondly device's write for these addresses will be transparent to cpu
> > so how can we trap them while device wants to trigger any interrupt ?
> >
> > Please correct me if I misunderstood anything.
> 
> Is what you are suggesting a v2m specific issue?
> 
> I thought the whole point of the ITS stuff in GICv3 was that one could
> program such virt-phys mappings into the hardware ITS and it would do
> the translation (the T in ITS) such that the host got the pIRQ it was
> expecting when the guest wrote the virtualised vIRQ information to the
> device.

Or perhaps I'm confusing interrupt translation here with the SMMU
translation of the device's DMA writes to the MSI doorbell address?

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 13:22                                               ` Ian Campbell
@ 2015-02-27 13:59                                                 ` Vijay Kilari
  0 siblings, 0 replies; 70+ messages in thread
From: Vijay Kilari @ 2015-02-27 13:59 UTC (permalink / raw)
  To: Ian Campbell
  Cc: prasun.kapoor, Manish Jaggi, Kumar, Vijaya, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich, Pranavkumar Sawargaonkar

On Fri, Feb 27, 2015 at 6:52 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Fri, 2015-02-27 at 10:38 +0000, Ian Campbell wrote:
>> On Fri, 2015-02-27 at 15:41 +0530, Pranavkumar Sawargaonkar wrote:
>> > Hi Julien,
>> >
>> > On Thu, Feb 26, 2015 at 8:47 PM, Julien Grall <julien.grall@linaro.org> wrote:
>> > > On 26/02/15 14:46, Pranavkumar Sawargaonkar wrote:
>> > >> Hi
>> > >
>> > > Hi Pranavkumar,
>> > >
>> > >> Also if we just show only one vITS (or only one Virtual v2m frame)
>> > >> instead of two vITS
>> > >> then actual hardware interrupt number and virtual interrupt number
>> > >> which guest will see will become different
>> > >> This will hamper direct irq routing to guest.
>> > >
>> > > The IRQ injection should not consider a 1:1 mapping between pIRQ and vIRQ.
>> >
>> > Yes, but in case of GICv2m( I am not sure about ITS) in register
>> > MSI_SETSPI_NS device has to write the interrupt ID (which is pirq) to
>> > generate an interrupt.
>> > If you write virq which is different that pirq (associated with the
>> > actual GICv2m frame ) then it will not trigger any interrupt.

The vITS driver can assign a pIRQ for each device as required and
program the ITT accordingly. The pIRQ-to-vIRQ mapping is then managed
and the vIRQ is injected into the respective domain.
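
A conceptual sketch of that flow (every helper and structure below is
hypothetical, and real ITS commands carry more state than shown): when
the guest writes a MAPVI-style command to its vITS command queue, Xen
rewrites the virtual IDs into physical ones before emitting an
equivalent command on the physical ITS queue.

#include <xen/errno.h>
#include <xen/sched.h>

/* Hypothetical helpers, shown for shape only. */
uint32_t its_alloc_lpi(struct host_its *its);
void vgic_map_lpi(struct domain *d, uint32_t virq, uint32_t pirq);
int its_send_mapvi(struct host_its *its, uint32_t pdev_id,
                   uint32_t event_id, uint32_t pirq);

/* Handle one trapped guest "map device/event to LPI" command. */
static int vits_handle_mapvi(struct domain *d, uint32_t vdev_id,
                             uint32_t event_id, uint32_t virq)
{
    /* vits_resolve()/vdev_map_entry as in the earlier sketch. */
    struct vdev_map_entry *e = vits_resolve(d, vdev_id);
    uint32_t pirq;

    if ( !e )
        return -ENODEV;

    /* Pick a physical LPI and remember the virq <-> pirq pair so the
     * physical interrupt can later be injected into this domain as virq. */
    pirq = its_alloc_lpi(e->pits);
    vgic_map_lpi(d, virq, pirq);

    /* Emit the equivalent command on the physical ITS command queue. */
    return its_send_mapvi(e->pits, e->pdev_id, event_id, pirq);
}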

>> >
>> > Now there is case which I am not sure how it can be solvable with one
>> > vITS/vGICv2m  -
>> >
>> > . Suppose we have two GICv2m frames and say oneis  having an address
>> > 0x1000 for MSI_SETSPI_NS register and other 0x2000 for it's
>> > MSI_SETSPI_NS register
>> > . Assume first frame has SPI's (physical) 0x64 - 0x72 associated and
>> > second has 0x80-0x88 associated.
>> > . Now there are two PCIe hosts, first using first GICv2m frame as a
>> > MSI parent and another using second frame.
>> > . Device on first host uses MSI_SETSPI_NS (0x1000) address along with
>> > a data (i.e. intr number say 0x64) and device on second host uses
>> > 0x2000 and data 0x80
>> >
>> > Now if we show one vGICv2m frame in guest for both the devices then
>> > what address I will program in each device's config space for MSI and
>> > also what will the data value.
>> > Secondly device's write for these addresses will be transparent to cpu
>> > so how can we trap them while device wants to trigger any interrupt ?
>> >
>> > Please correct me if I misunderstood anything.
>>
>> Is what you are suggesting a v2m specific issue?
>>
>> I thought the whole point of the ITS stuff in GICv3 was that one could
>> program such virt-phys mappings into the hardware ITS and it would do
>> the translation (the T in ITS) such that the host got the pIRQ it was
>> expecting when the guest wrote the virtualised vIRQ information to the
>> device.
>
> Or perhaps I'm confusing interrupt translation here with the SMMU
> translation of the device's DMA writes to the MSI doorbell address?

True. The MSI physical address (the GITS_TRANSLATER register address)
is programmed by the domain when the device is initialized, by writing
to its configuration address space.

The MSI-X write is routed through the SMMU, and hence the vITS driver
maps this physical GITS_TRANSLATER address space into every domain, so
that the SMMU can translate the MSI-X write generated by the device and
thereby raise the MSI-X interrupt.

So we don't trap when an interrupt is triggered by the device; it is
translated quietly by the SMMU and the interrupt is routed to the CPU.
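
A rough sketch of the mapping step being described (the helper below is
a made-up stand-in for whatever p2m/IOMMU interface ends up being used):
the point is simply that the doorbell address the guest programs into
the device resolves to the physical GITS_TRANSLATER page, so the
device's MSI write needs no trap at all.

#include <xen/sched.h>
#include <xen/types.h>

/* Stand-in for the real p2m/IOMMU mapping helper (hypothetical). */
int map_one_page(struct domain *d, unsigned long gfn, unsigned long mfn);

/* Hypothetical: done once at device assignment / domain build time. */
static int vits_map_doorbell(struct domain *d,
                             paddr_t guest_doorbell,  /* address guest programs */
                             paddr_t phys_translater) /* host GITS_TRANSLATER */
{
    /*
     * On Xen/ARM the SMMU shares the domain's stage-2 (p2m) tables, so a
     * single mapping covers both CPU and device accesses.
     */
    return map_one_page(d, guest_doorbell >> PAGE_SHIFT,
                        phys_translater >> PAGE_SHIFT);
}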

Regards
Vijay

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-26 11:05                                             ` Ian Campbell
@ 2015-02-27 14:33                                               ` Stefano Stabellini
  2015-02-27 14:42                                                 ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Stefano Stabellini @ 2015-02-27 14:33 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Thu, 26 Feb 2015, Ian Campbell wrote:
> On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote:
> > Have you reached a conclusion?
> 
> My current thinking on how PCI for Xen on ARM should look is thus:
> 
> xen/arch/arm/pci.c:
> 
>         New file, containing core PCI infrastructure for ARM. Includes:
>         
>         pci_hostbridge_register(), which registers a host bridge:
>         
>                 Registration includes:
>                         DT node pointer
>                         
>                         CFG space address
>                 
>                         pci_hostbridge_ops function table, which
>                         contains e.g. cfg space read/write ops, perhaps
>                         other stuff).
>         
>         Function for setting the (segment,bus) for a given host bridge.
>         Lets say pci_hostbridge_setup(), the host bridge must have been
>         previously registered. Looks up the host bridge via CFG space
>         address and maps that to (segment,bus).
>         
>         Functions for looking up host bridges by various keys as needed
>         (cfg base address, DT node, etc)
>         
>         pci_init() function, called from somewhere appropriate in
>         setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see
>         gic_init() for the shape of this)
>         
>         Any other common helper functions for managing PCI devices, e.g.
>         for implementing PHYSDEVOP_*, which cannot be made properly
>         common (i.e. shared with x86).
> 
> xen/drivers/pci/host-*.c (or pci/host/*.c):
> 
>         New files, one per supported PCI controller IP block. Each
>         should use the normal DT_DEVICE infrastructure for probing,
>         i.e.:
>         DT_DEVICE_START(foo, "FOO", DEVICE_PCIHOST)
>         
>         Probe function should call pci_hostbridge_register() for each
>         host bridge which the controller exposes.
>         
> xen/arch/arm/physdev.c:
> 
>         Implements do_physdev_op handling PHYSDEVOP_*. Includes:
>         
>         New hypercall subop PHYSDEVOP_pci_host_bridge_add:
>         
>                 As per 1424703761.27930.140.camel@citrix.com> which
>                 calls pci_hostbridge_setup() to map the (segment,bus) to
>                 a specific pci_hostbridge_ops (i.e. must have previously
>                 been registered with pci_hostbridge_register(), else
>                 error).

I think that the new hypercall is unnecessary. We know the MMCFG address
ranges belonging to a given host bridge from DT and
PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for
a specific MMCFG. We don't need anything else: we can simply match the
host bridge based on the MMCFG address that dom0 tells us via
PHYSDEVOP_pci_mmcfg_reserved with the addresses on DT.

But we do need to support PHYSDEVOP_pci_mmcfg_reserved on ARM.
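
For reference, the hypercall argument being discussed looks roughly
like this (paraphrased from memory, so treat the details as
approximate; xen/include/public/physdev.h has the authoritative
definition):

#define PHYSDEVOP_pci_mmcfg_reserved    24
struct physdev_pci_mmcfg_reserved {
    uint64_t address;    /* MMCFG (ECAM) region base physical address */
    uint16_t segment;    /* PCI segment group number */
    uint8_t start_bus;   /* first bus number decoded by the region */
    uint8_t end_bus;     /* last bus number decoded by the region */
    uint32_t flags;      /* e.g. XEN_PCI_MMCFG_RESERVED */
};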


>         PHYSDEVOP_pci_device_add/remove: Implement existing hypercall
>         interface used by x86 for ARM.
>         
>                 This requires that PHYSDEVOP_pci_host_bridge_add has
>                 been called for the (segment,bus) which it refers to,
>                 otherwise error.
>                 
>                 Looks up the host bridge and does whatever setup is
>                 required plus e.g. calling of pci_add_device().
> 
> No doubt various other existing interfaces will need wiring up, e.g.
> pci_conf_{read,write}* should lookup the host bridge ops struct and call
> the associated method.
> 
> I'm sure the above must be incomplete, but I hope the general shape
> makes sense?
 
I think it makes sense and it is along the lines of what I was thinking
too.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 14:33                                               ` Stefano Stabellini
@ 2015-02-27 14:42                                                 ` Ian Campbell
  2015-02-27 14:54                                                   ` Stefano Stabellini
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-02-27 14:42 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Fri, 2015-02-27 at 14:33 +0000, Stefano Stabellini wrote:
> On Thu, 26 Feb 2015, Ian Campbell wrote:
> > On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote:
> > > Have you reached a conclusion?
> > 
> > My current thinking on how PCI for Xen on ARM should look is thus:
> > 
> > xen/arch/arm/pci.c:
> > 
> >         New file, containing core PCI infrastructure for ARM. Includes:
> >         
> >         pci_hostbridge_register(), which registers a host bridge:
> >         
> >                 Registration includes:
> >                         DT node pointer
> >                         
> >                         CFG space address
> >                 
> >                         pci_hostbridge_ops function table, which
> >                         contains e.g. cfg space read/write ops, perhaps
> >                         other stuff).
> >         
> >         Function for setting the (segment,bus) for a given host bridge.
> >         Lets say pci_hostbridge_setup(), the host bridge must have been
> >         previously registered. Looks up the host bridge via CFG space
> >         address and maps that to (segment,bus).
> >         
> >         Functions for looking up host bridges by various keys as needed
> >         (cfg base address, DT node, etc)
> >         
> >         pci_init() function, called from somewhere appropriate in
> >         setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see
> >         gic_init() for the shape of this)
> >         
> >         Any other common helper functions for managing PCI devices, e.g.
> >         for implementing PHYSDEVOP_*, which cannot be made properly
> >         common (i.e. shared with x86).
> > 
> > xen/drivers/pci/host-*.c (or pci/host/*.c):
> > 
> >         New files, one per supported PCI controller IP block. Each
> >         should use the normal DT_DEVICE infrastructure for probing,
> >         i.e.:
> >         DT_DEVICE_START(foo, "FOO", DEVICE_PCIHOST)
> >         
> >         Probe function should call pci_hostbridge_register() for each
> >         host bridge which the controller exposes.
> >         
> > xen/arch/arm/physdev.c:
> > 
> >         Implements do_physdev_op handling PHYSDEVOP_*. Includes:
> >         
> >         New hypercall subop PHYSDEVOP_pci_host_bridge_add:
> >         
> >                 As per 1424703761.27930.140.camel@citrix.com> which
> >                 calls pci_hostbridge_setup() to map the (segment,bus) to
> >                 a specific pci_hostbridge_ops (i.e. must have previously
> >                 been registered with pci_hostbridge_register(), else
> >                 error).
> 
> I think that the new hypercall is unnecessary. We know the MMCFG address
> ranges belonging to a given host bridge from DT and
> PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for
> a specific MMCFG.

My understanding from discussion with Jan was that this is not what this
hypercall does, or at least that this would be an abuse of the existing
interface. See: <54E75D8702000078000621DD@mail.emea.novell.com>

Anyway, what happens when there is no MMCFG table to drive dom0's
calls to pci_mmcfg_reserved? Or when a given host-bridge doesn't have
special flags and so isn't mentioned there?

I think a dedicated hypercall is better.

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 14:42                                                 ` Ian Campbell
@ 2015-02-27 14:54                                                   ` Stefano Stabellini
  2015-02-27 15:24                                                     ` Ian Campbell
  0 siblings, 1 reply; 70+ messages in thread
From: Stefano Stabellini @ 2015-02-27 14:54 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich, Stefano Stabellini

On Fri, 27 Feb 2015, Ian Campbell wrote:
> On Fri, 2015-02-27 at 14:33 +0000, Stefano Stabellini wrote:
> > On Thu, 26 Feb 2015, Ian Campbell wrote:
> > > On Thu, 2015-02-26 at 15:39 +0530, Manish Jaggi wrote:
> > > > Have you reached a conclusion?
> > > 
> > > My current thinking on how PCI for Xen on ARM should look is thus:
> > > 
> > > xen/arch/arm/pci.c:
> > > 
> > >         New file, containing core PCI infrastructure for ARM. Includes:
> > >         
> > >         pci_hostbridge_register(), which registers a host bridge:
> > >         
> > >                 Registration includes:
> > >                         DT node pointer
> > >                         
> > >                         CFG space address
> > >                 
> > >                         pci_hostbridge_ops function table, which
> > >                         contains e.g. cfg space read/write ops, perhaps
> > >                         other stuff).
> > >         
> > >         Function for setting the (segment,bus) for a given host bridge.
> > >         Lets say pci_hostbridge_setup(), the host bridge must have been
> > >         previously registered. Looks up the host bridge via CFG space
> > >         address and maps that to (segment,bus).
> > >         
> > >         Functions for looking up host bridges by various keys as needed
> > >         (cfg base address, DT node, etc)
> > >         
> > >         pci_init() function, called from somewhere appropriate in
> > >         setup.c which calls device_init(node, DEVICE_PCIHOST, NULL) (see
> > >         gic_init() for the shape of this)
> > >         
> > >         Any other common helper functions for managing PCI devices, e.g.
> > >         for implementing PHYSDEVOP_*, which cannot be made properly
> > >         common (i.e. shared with x86).
> > > 
> > > xen/drivers/pci/host-*.c (or pci/host/*.c):
> > > 
> > >         New files, one per supported PCI controller IP block. Each
> > >         should use the normal DT_DEVICE infrastructure for probing,
> > >         i.e.:
> > >         DT_DEVICE_START(foo, "FOO", DEVICE_PCIHOST)
> > >         
> > >         Probe function should call pci_hostbridge_register() for each
> > >         host bridge which the controller exposes.
> > >         
> > > xen/arch/arm/physdev.c:
> > > 
> > >         Implements do_physdev_op handling PHYSDEVOP_*. Includes:
> > >         
> > >         New hypercall subop PHYSDEVOP_pci_host_bridge_add:
> > >         
> > >                 As per 1424703761.27930.140.camel@citrix.com> which
> > >                 calls pci_hostbridge_setup() to map the (segment,bus) to
> > >                 a specific pci_hostbridge_ops (i.e. must have previously
> > >                 been registered with pci_hostbridge_register(), else
> > >                 error).
> > 
> > I think that the new hypercall is unnecessary. We know the MMCFG address
> > ranges belonging to a given host bridge from DT and
> > PHYSDEVOP_pci_mmcfg_reserved gives us segment, start_bus and end_bus for
> > a specific MMCFG.
> 
> My understanding from discussion with Jan was that this is not what this
> hypercall does, or at least that this would be an abuse of the existing
> interface. See: <54E75D8702000078000621DD@mail.emea.novell.com>
 
If all the fields in struct physdev_pci_mmcfg_reserved, including flags,
are IN parameters, like Jan wrote in his email, how is it an abuse of the
interface?  It seems to me that we would be using it exactly as
intended.

As the author of this interface, of course I'll trust Jan's judgement on
this.


> Anyway, what happens for when there is no MMCFG table to drive dom0's
> calls to pci_mmcfg_reserved?

MMCFG is a Linux config option, not to be confused with
PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
is relevant.


> Or a given host-bridge doesn't have special
> flags and so isn't mentioned there.

flags could be zero?


> I think a dedicated hypercall is better.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 14:54                                                   ` Stefano Stabellini
@ 2015-02-27 15:24                                                     ` Ian Campbell
  2015-02-27 15:29                                                       ` Ian Campbell
  2015-02-27 16:35                                                       ` Jan Beulich
  0 siblings, 2 replies; 70+ messages in thread
From: Ian Campbell @ 2015-02-27 15:24 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Fri, 2015-02-27 at 14:54 +0000, Stefano Stabellini wrote:
> > My understanding from discussion with Jan was that this is not what this
> > hypercall does, or at least that this would be an abuse of the existing
> > interface. See: <54E75D8702000078000621DD@mail.emea.novell.com>
>  
> If all the fields in struct physdev_pci_mmcfg_reserved, including flags,
> are IN parameters, like Jan wrote in his email, how is it an abuse of the
> interface?

AIUI they are IN parameters in the sense that they provide the "Key",
while flags is the "Value" in this hypercall.

> It seems to me that we would be using it exactly as intended.
> 
> As the author of this interface, of course I'll trust Jan's judgement on
> this.
> 
> 
> > Anyway, what happens for when there is no MMCFG table to drive dom0's
> > calls to pci_mmcfg_reserved?
> 
> MMCFG is a Linux config option, not to be confused with
> PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
> think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
> is relevant.

My (possibly flawed) understanding was that pci_mmcfg_reserved was
intended to propagate the result of dom0 parsing some firmware table or
other to the hypervisor.

In Linux dom0 we make that call while walking pci_mmcfg_list which,
looking at arch/x86/pci/mmconfig-shared.c, is populated by
pci_parse_mcfg walking over a "struct acpi_table_mcfg" (there also
appear to be a bunch of processor-family derived entries, which I guess
are "quirks" of some sort).

> > Or a given host-bridge doesn't have special
> > flags and so isn't mentioned there.
> 
> flags could be zero?

What I meant was: isn't mentioned in the firmware table and so there is
no corresponding call at all.

Ian.

> 
> 
> > I think a dedicated hypercall is better.
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 15:24                                                     ` Ian Campbell
@ 2015-02-27 15:29                                                       ` Ian Campbell
  2015-02-27 16:35                                                       ` Jan Beulich
  1 sibling, 0 replies; 70+ messages in thread
From: Ian Campbell @ 2015-02-27 15:29 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Jan Beulich

On Fri, 2015-02-27 at 15:24 +0000, Ian Campbell wrote:

> In Linux dom0 we call it walking pci_mmcfg_list, which looking at
> arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
> over a "struct acpi_table_mcfg"

BTW according to
http://lists.linaro.org/pipermail/fw-summit/2015-January/000070.html:

MCFG	Signature Reserved (signature == "MCFG")
	== Memory-mapped ConFiGuration space ==
	If the platform supports PCI/PCIe, an MCFG table is required.

Which for ACPI systems at least we can also parse in Xen. The contents
are described in the "PCI Firmware Specification, Revision 3.0", which
I don't have a copy of handy.
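
For what it's worth, each entry of the MCFG table describes one ECAM
window and, going by Linux's ACPICA headers (so treat the exact layout
as approximate), looks like:

/* One MCFG allocation entry: the ECAM window for a (segment, bus range). */
struct acpi_mcfg_allocation {
    uint64_t address;           /* ECAM base address */
    uint16_t pci_segment;       /* PCI segment group number */
    uint8_t  start_bus_number;  /* first bus covered */
    uint8_t  end_bus_number;    /* last bus covered */
    uint32_t reserved;
};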

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 15:24                                                     ` Ian Campbell
  2015-02-27 15:29                                                       ` Ian Campbell
@ 2015-02-27 16:35                                                       ` Jan Beulich
  2015-02-27 16:50                                                         ` Ian Campbell
  1 sibling, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-02-27 16:35 UTC (permalink / raw)
  To: Ian Campbell, Stefano Stabellini
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com)

>>> On 27.02.15 at 16:24, <ian.campbell@citrix.com> wrote:
> On Fri, 2015-02-27 at 14:54 +0000, Stefano Stabellini wrote:
>> MMCFG is a Linux config option, not to be confused with
>> PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
>> think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
>> is relevant.
> 
> My (possibly flawed) understanding was that pci_mmcfg_reserved was
> intended to propagate the result of dom0 parsing some firmware table or
> other to the hypevisor.

That's not flawed at all.

> In Linux dom0 we call it walking pci_mmcfg_list, which looking at
> arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
> over a "struct acpi_table_mcfg" (there also appears to be a bunch of
> processor family derived entries, which I guess are "quirks" of some
> sort).

Right - this parses ACPI tables (plus applies some knowledge about
certain specific systems/chipsets/CPUs) and verifies that the space
needed for the MMCFG region is properly reserved either in E820 or
in the ACPI specified resources (only if so Linux decides to use
MMCFG and consequently also tells Xen that it may use it).

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 16:35                                                       ` Jan Beulich
@ 2015-02-27 16:50                                                         ` Ian Campbell
  2015-02-27 17:15                                                           ` Stefano Stabellini
  2015-03-17  5:26                                                           ` Manish Jaggi
  0 siblings, 2 replies; 70+ messages in thread
From: Ian Campbell @ 2015-02-27 16:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, Stefano Stabellini (Stefano.Stabellini@citrix.com),
	Stefano Stabellini

On Fri, 2015-02-27 at 16:35 +0000, Jan Beulich wrote:
> >>> On 27.02.15 at 16:24, <ian.campbell@citrix.com> wrote:
> > On Fri, 2015-02-27 at 14:54 +0000, Stefano Stabellini wrote:
> >> MMCFG is a Linux config option, not to be confused with
> >> PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
> >> think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
> >> is relevant.
> > 
> > My (possibly flawed) understanding was that pci_mmcfg_reserved was
> > intended to propagate the result of dom0 parsing some firmware table or
> > other to the hypevisor.
> 
> That's not flawed at all.

I think that's a first in this thread ;-)

> > In Linux dom0 we call it walking pci_mmcfg_list, which looking at
> > arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
> > over a "struct acpi_table_mcfg" (there also appears to be a bunch of
> > processor family derived entries, which I guess are "quirks" of some
> > sort).
> 
> Right - this parses ACPI tables (plus applies some knowledge about
> certain specific systems/chipsets/CPUs) and verifies that the space
> needed for the MMCFG region is properly reserved either in E820 or
> in the ACPI specified resources (only if so Linux decides to use
> MMCFG and consequently also tells Xen that it may use it).

Thanks.

So I think what I wrote in <1424948710.14641.25.camel@citrix.com>
applies as is to Device Tree based ARM devices, including the need for
the PHYSDEVOP_pci_host_bridge_add call.

On ACPI based devices we will have the MCFG table, and things follow
much as for x86:

      * Xen should parse MCFG to discover the PCI host-bridges
      * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
        the same way as Xen/x86 does.

The SBSA, an ARM standard for "servers", mandates various things which
we can rely on here because ACPI on ARM requires an SBSA compliant
system. So things like odd quirks in PCI controllers or magic setup are
spec'd out of our zone of caring (into the firmware I suppose), hence
there is nothing like the DT_DEVICE_START stuff to register specific
drivers etc.

The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM
systems (any more than it is on x86). We can decide whether to omit it
from dom0 or ignore it from Xen later on.

(Manish, this is FYI, I don't expect you to implement ACPI support!)

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 16:50                                                         ` Ian Campbell
@ 2015-02-27 17:15                                                           ` Stefano Stabellini
  2015-03-02 11:48                                                             ` Ian Campbell
  2015-03-17  5:26                                                           ` Manish Jaggi
  1 sibling, 1 reply; 70+ messages in thread
From: Stefano Stabellini @ 2015-02-27 17:15 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com),
	Jan Beulich, Stefano Stabellini

On Fri, 27 Feb 2015, Ian Campbell wrote:
> On Fri, 2015-02-27 at 16:35 +0000, Jan Beulich wrote:
> > >>> On 27.02.15 at 16:24, <ian.campbell@citrix.com> wrote:
> > > On Fri, 2015-02-27 at 14:54 +0000, Stefano Stabellini wrote:
> > >> MMCFG is a Linux config option, not to be confused with
> > >> PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
> > >> think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
> > >> is relevant.
> > > 
> > > My (possibly flawed) understanding was that pci_mmcfg_reserved was
> > > intended to propagate the result of dom0 parsing some firmware table or
> > > other to the hypervisor.
> > 
> > That's not flawed at all.
> 
> I think that's a first in this thread ;-)
> 
> > > In Linux dom0 we call it walking pci_mmcfg_list, which looking at
> > > arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
> > > over a "struct acpi_table_mcfg" (there also appears to be a bunch of
> > > processor family derived entries, which I guess are "quirks" of some
> > > sort).
> > 
> > Right - this parses ACPI tables (plus applies some knowledge about
> > certain specific systems/chipsets/CPUs) and verifies that the space
> > needed for the MMCFG region is properly reserved either in E820 or
> > in the ACPI specified resources (only if so Linux decides to use
> > MMCFG and consequently also tells Xen that it may use it).
> 
> Thanks.
> 
> So I think what I wrote in <1424948710.14641.25.camel@citrix.com>
> applies as is to Device Tree based ARM devices, including the need for
> the PHYSDEVOP_pci_host_bridge_add call.

Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was
intended for passing down firmware information to Xen, as the
information that we need is exactly the same, I think it would be
acceptable to use the same hypercall on ARM too.

I am not hard set on this and the new hypercall is also a viable option.
However, if we do introduce a new hypercall as Ian suggested, do we need
to take into account the possibility that a host bridge might have
multiple cfg memory ranges?


> On ACPI based devices we will have the MCFG table, and things follow
> much as for x86:
> 
>       * Xen should parse MCFG to discover the PCI host-bridges
>       * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
>         the same way as Xen/x86 does.
> 
> The SBSA, an ARM standard for "servers", mandates various things which
> we can rely on here because ACPI on ARM requires an SBSA compliant
> system. So things like odd quirks in PCI controllers or magic setup are
> spec'd out of our zone of caring (into the firmware I suppose), hence
> there is nothing like the DT_DEVICE_START stuff to register specific
> drivers etc.
> 
> The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM
> systems (any more than it is on x86). We can decide whether to omit it
> from dom0 or ignore it from Xen later on.
> 
> (Manish, this is FYI, I don't expect you to implement ACPI support!)

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 17:15                                                           ` Stefano Stabellini
@ 2015-03-02 11:48                                                             ` Ian Campbell
  2015-03-03  9:19                                                               ` Manish Jaggi
  0 siblings, 1 reply; 70+ messages in thread
From: Ian Campbell @ 2015-03-02 11:48 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com),
	Jan Beulich

On Fri, 2015-02-27 at 17:15 +0000, Stefano Stabellini wrote:
> On Fri, 27 Feb 2015, Ian Campbell wrote:
> > On Fri, 2015-02-27 at 16:35 +0000, Jan Beulich wrote:
> > > >>> On 27.02.15 at 16:24, <ian.campbell@citrix.com> wrote:
> > > > On Fri, 2015-02-27 at 14:54 +0000, Stefano Stabellini wrote:
> > > >> MMCFG is a Linux config option, not to be confused with
> > > >> PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
> > > >> think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
> > > >> is relevant.
> > > > 
> > > > My (possibly flawed) understanding was that pci_mmcfg_reserved was
> > > > intended to propagate the result of dom0 parsing some firmware table or
> > > > other to the hypervisor.
> > > 
> > > That's not flawed at all.
> > 
> > I think that's a first in this thread ;-)
> > 
> > > > In Linux dom0 we call it walking pci_mmcfg_list, which looking at
> > > > arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
> > > > over a "struct acpi_table_mcfg" (there also appears to be a bunch of
> > > > processor family derived entries, which I guess are "quirks" of some
> > > > sort).
> > > 
> > > Right - this parses ACPI tables (plus applies some knowledge about
> > > certain specific systems/chipsets/CPUs) and verifies that the space
> > > needed for the MMCFG region is properly reserved either in E820 or
> > > in the ACPI specified resources (only if so Linux decides to use
> > > MMCFG and consequently also tells Xen that it may use it).
> > 
> > Thanks.
> > 
> > So I think what I wrote in <1424948710.14641.25.camel@citrix.com>
> > applies as is to Device Tree based ARM devices, including the need for
> > the PHYSDEVOP_pci_host_bridge_add call.
> 
> Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was
> intended for passing down firmware information to Xen, as the
> information that we need is exactly the same, I think it would be
> acceptable to use the same hypercall on ARM too.

I strongly disagree, they have very different semantics and overloading
the existing interface would be both wrong and confusing.

It'll also make things harder in the ACPI case where we want to use the
existing hypercall for its original purpose (to propagate MCFG
information to Xen).

> I am not hard set on this and the new hypercall is also a viable option.
> However, if we do introduce a new hypercall as Ian suggested, do we need
> to take into account the possibility that a host bridge might have
> multiple cfg memory ranges?

I don't believe so, a host bridge has a 1:1 relationship with CFG space.

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-02 11:48                                                             ` Ian Campbell
@ 2015-03-03  9:19                                                               ` Manish Jaggi
  0 siblings, 0 replies; 70+ messages in thread
From: Manish Jaggi @ 2015-03-03  9:19 UTC (permalink / raw)
  To: Ian Campbell
  Cc: prasun.kapoor, Stefano Stabellini, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com),
	Jan Beulich


On Monday 02 March 2015 05:18 PM, Ian Campbell wrote:
> On Fri, 2015-02-27 at 17:15 +0000, Stefano Stabellini wrote:
>> On Fri, 27 Feb 2015, Ian Campbell wrote:
>>> On Fri, 2015-02-27 at 16:35 +0000, Jan Beulich wrote:
>>>>>>> On 27.02.15 at 16:24, <ian.campbell@citrix.com> wrote:
>>>>> On Fri, 2015-02-27 at 14:54 +0000, Stefano Stabellini wrote:
>>>>>> MMCFG is a Linux config option, not to be confused with
>>>>>> PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
>>>>>> think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
>>>>>> is relevant.
>>>>> My (possibly flawed) understanding was that pci_mmcfg_reserved was
>>>>> intended to propagate the result of dom0 parsing some firmware table or
>>>>> other to the hypervisor.
>>>> That's not flawed at all.
>>> I think that's a first in this thread ;-)
>>>
>>>>> In Linux dom0 we call it walking pci_mmcfg_list, which looking at
>>>>> arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
>>>>> over a "struct acpi_table_mcfg" (there also appears to be a bunch of
>>>>> processor family derived entries, which I guess are "quirks" of some
>>>>> sort).
>>>> Right - this parses ACPI tables (plus applies some knowledge about
>>>> certain specific systems/chipsets/CPUs) and verifies that the space
>>>> needed for the MMCFG region is properly reserved either in E820 or
>>>> in the ACPI specified resources (only if so Linux decides to use
>>>> MMCFG and consequently also tells Xen that it may use it).
>>> Thanks.
>>>
>>> So I think what I wrote in <1424948710.14641.25.camel@citrix.com>
>>> applies as is to Device Tree based ARM devices, including the need for
>>> the PHYSDEVOP_pci_host_bridge_add call.
>> Although I understand now that PHYSDEVOP_pci_mmcfg_reserved was
>> intended for passing down firmware information to Xen, as the
>> information that we need is exactly the same, I think it would be
>> acceptable to use the same hypercall on ARM too.
> I strongly disagree, they have very different semantics and overloading
> the existing interface would be both wrong and confusing.
>
> It'll also make things harder in the ACPI case where we want to use the
> existing hypercall for its original purpose (to propagate MCFG
> information to Xen).
>
>> I am not hard set on this and the new hypercall is also a viable option.
>> However, if we do introduce a new hypercall as Ian suggested, do we need
>> to take into account the possibility that a host bridge might have
>> multiple cfg memory ranges?
> I don't believe so, a host bridge has a 1:1 relationship with CFG space.
>
> Ian.
>
I agree with this flow. It fits into what we have implemented at our end.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-23  8:43                         ` Jan Beulich
  2015-02-23 12:45                           ` Ian Campbell
@ 2015-03-11 18:26                           ` Stefano Stabellini
  2015-03-12  9:16                             ` Jan Beulich
  2015-03-12  9:30                             ` Ian Campbell
  1 sibling, 2 replies; 70+ messages in thread
From: Stefano Stabellini @ 2015-03-11 18:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, Manish Jaggi, Vijaya Kumar,
	Julien Grall, xen-devel,
	StefanoStabellini(Stefano.Stabellini@citrix.com)

On Mon, 23 Feb 2015, Jan Beulich wrote:
> >>> On 20.02.15 at 18:33, <ian.campbell@citrix.com> wrote:
> > On Fri, 2015-02-20 at 15:15 +0000, Jan Beulich wrote:
> >> > That's the issue we are trying to resolve, with device tree there is no
> >> > explicit segment ID, so we have an essentially unindexed set of PCI
> >> > buses in both Xen and dom0.
> >> 
> >> How that? What if two bus numbers are equal? There ought to be
> >> some kind of topology information. Or if all buses are distinct, then
> >> you don't need a segment number.
> > 
> > It's very possible that I simply don't have the PCI terminology straight
> > in my head, leading to me talking nonsense.
> > 
> > I'll explain how I'm using it and perhaps you can put me straight...
> > 
> > My understanding was that a PCI segment equates to a PCI host
> > controller, i.e. a specific instance of some PCI host IP on an SoC.
> 
> No - there can be multiple roots (i.e. host bridges) on a single
> segment. Segments are - afaict - purely a scalability extension
> allowing to overcome the 256 bus limit.

Actually this turns out to be wrong. On the PCI MCFG spec it is clearly
stated:

"The MCFG table format allows for more than one memory mapped base
address entry provided each entry (memory mapped configuration space
base address allocation structure) corresponds to a unique PCI Segment
Group consisting of 256 PCI buses. Multiple entries corresponding to a
single PCI Segment Group is not allowed."
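
(Concretely, each such entry - struct acpi_mcfg_allocation in the
Linux/ACPICA headers, reproduced here from memory - carries:

    struct acpi_mcfg_allocation {
        u64 address;           /* base of the memory mapped config window */
        u16 pci_segment;       /* PCI segment group number */
        u8  start_bus_number;  /* first bus decoded by this window */
        u8  end_bus_number;    /* last bus decoded by this window */
        u32 reserved;
    };

so one entry describes one segment group and its one config window.)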

I think that it is reasonable to expect device tree systems to respect this
too. 

On ACPI systems the MCFG contains all the info we need; however, on
device tree systems the segment number is missing from the pcie node, so
we still need to find a way to agree with Dom0 on which host bridge
corresponds to which segment.

In other words I think that we still need PHYSDEVOP_pci_host_bridge_add
(http://marc.info/?l=xen-devel&m=142470392016381) or equivalent, but
we can drop the bus field from the struct.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-11 18:26                           ` Stefano Stabellini
@ 2015-03-12  9:16                             ` Jan Beulich
  2015-03-12 10:33                               ` Stefano Stabellini
  2015-03-12  9:30                             ` Ian Campbell
  1 sibling, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-03-12  9:16 UTC (permalink / raw)
  To: Stefano.Stabellini, Stefano Stabellini
  Cc: Prasun.kapoor, Ian Campbell, ManishJaggi, Vijaya Kumar,
	Julien Grall, xen-devel

>>> On 11.03.15 at 19:26, <stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 23 Feb 2015, Jan Beulich wrote:
>> >>> On 20.02.15 at 18:33, <ian.campbell@citrix.com> wrote:
>> > On Fri, 2015-02-20 at 15:15 +0000, Jan Beulich wrote:
>> >> > That's the issue we are trying to resolve, with device tree there is no
>> >> > explicit segment ID, so we have an essentially unindexed set of PCI
>> >> > buses in both Xen and dom0.
>> >> 
>> >> How that? What if two bus numbers are equal? There ought to be
>> >> some kind of topology information. Or if all buses are distinct, then
>> >> you don't need a segment number.
>> > 
>> > It's very possible that I simply don't have the PCI terminology straight
>> > in my head, leading to me talking nonsense.
>> > 
>> > I'll explain how I'm using it and perhaps you can put me straight...
>> > 
>> > My understanding was that a PCI segment equates to a PCI host
>> > controller, i.e. a specific instance of some PCI host IP on an SoC.
>> 
>> No - there can be multiple roots (i.e. host bridges) on a single
>> segment. Segments are - afaict - purely a scalability extension
>> allowing to overcome the 256 bus limit.
> 
> Actually this turns out to be wrong. On the PCI MCFG spec it is clearly
> stated:
> 
> "The MCFG table format allows for more than one memory mapped base
> address entry provided each entry (memory mapped configuration space
> base address allocation structure) corresponds to a unique PCI Segment
> Group consisting of 256 PCI buses. Multiple entries corresponding to a
> single PCI Segment Group is not allowed."

For one, what you quote is in no contradiction to what I said. All it
specifies is that there shouldn't be multiple MCFG table entries
specifying the same segment. Whether on any such segment there
is a single host bridge or multiple of them is of no interest here.

And then the present x86 Linux code suggests that there might be
systems actually violating this (the fact that each entry names not
only a segment, but also a bus range also kind of suggests this
despite the wording above); see commit 068258bc15 and its
neighbors - even if it talks about the address ranges coming from
other than ACPI tables, firmware wanting to express them by ACPI
tables would have to violate that rule.

> I think that it is reasonable to expect device tree systems to respect this
> too. 

Not really - as soon as we leave ACPI land, we're free to arrange
things however they suit us best (of course in agreement with other
components involved, like Dom0 in this case), and for that case the
cited Linux commit is a proper reference that it can be (and has been)
done differently by system designers.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-11 18:26                           ` Stefano Stabellini
  2015-03-12  9:16                             ` Jan Beulich
@ 2015-03-12  9:30                             ` Ian Campbell
  1 sibling, 0 replies; 70+ messages in thread
From: Ian Campbell @ 2015-03-12  9:30 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Prasun.kapoor, Manish Jaggi, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com),
	Jan Beulich

On Wed, 2015-03-11 at 18:26 +0000, Stefano Stabellini wrote:
> In other words I think that we still need PHYSDEVOP_pci_host_bridge_add
> (http://marc.info/?l=xen-devel&m=142470392016381) or equivalent, but
> we can drop the bus field from the struct.

I think it makes sense for the struct to contain a similar set of
entries to the MCFG ones, which would give us flexibility in the future
if a) our interpretation of the specs is wrong or b) new specs come
along which say something different (or Linux changes what it does
internally).

IOW I think segment+bus start+bus end is probably the way to go, even if
we think bus will be unused today (which equates to it always being 0).
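
I.e. something of roughly this shape (a sketch only, field names purely
illustrative):

    struct physdev_pci_host_bridge_add {
        /* IN */
        uint16_t seg;        /* segment number dom0 assigns to this bridge */
        uint8_t  bus_start;  /* first bus behind the host bridge */
        uint8_t  bus_end;    /* last bus behind the host bridge */
        uint64_t cfg_base;   /* physical base of its config space */
        uint64_t cfg_size;   /* size of the config window */
    };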

Ian.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-12  9:16                             ` Jan Beulich
@ 2015-03-12 10:33                               ` Stefano Stabellini
  2015-03-12 11:28                                 ` Jan Beulich
  0 siblings, 1 reply; 70+ messages in thread
From: Stefano Stabellini @ 2015-03-12 10:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Julien Grall, Ian Campbell, ManishJaggi,
	Vijaya Kumar, Stefano Stabellini, xen-devel, Stefano.Stabellini

On Thu, 12 Mar 2015, Jan Beulich wrote:
> >>> On 11.03.15 at 19:26, <stefano.stabellini@eu.citrix.com> wrote:
> > On Mon, 23 Feb 2015, Jan Beulich wrote:
> >> >>> On 20.02.15 at 18:33, <ian.campbell@citrix.com> wrote:
> >> > On Fri, 2015-02-20 at 15:15 +0000, Jan Beulich wrote:
> >> >> > That's the issue we are trying to resolve, with device tree there is no
> >> >> > explicit segment ID, so we have an essentially unindexed set of PCI
> >> >> > buses in both Xen and dom0.
> >> >> 
> >> >> How that? What if two bus numbers are equal? There ought to be
> >> >> some kind of topology information. Or if all buses are distinct, then
> >> >> you don't need a segment number.
> >> > 
> >> > It's very possible that I simply don't have the PCI terminology straight
> >> > in my head, leading to me talking nonsense.
> >> > 
> >> > I'll explain how I'm using it and perhaps you can put me straight...
> >> > 
> >> > My understanding was that a PCI segment equates to a PCI host
> >> > controller, i.e. a specific instance of some PCI host IP on an SoC.
> >> 
> >> No - there can be multiple roots (i.e. host bridges) on a single
> >> segment. Segments are - afaict - purely a scalability extension
> >> allowing to overcome the 256 bus limit.
> > 
> > Actually this turns out to be wrong. On the PCI MCFG spec it is clearly
> > stated:
> > 
> > "The MCFG table format allows for more than one memory mapped base
> > address entry provided each entry (memory mapped configuration space
> > base address allocation structure) corresponds to a unique PCI Segment
> > Group consisting of 256 PCI buses. Multiple entries corresponding to a
> > single PCI Segment Group is not allowed."
> 
> For one, what you quote is in no contradiction to what I said. All it
> specifies is that there shouldn't be multiple MCFG table entries
> specifying the same segment. Whether on any such segment there
> is a single host bridge or multiple of them is of no interest here.

I thought that we had already established that one host bridge
corresponds to one PCI config memory region, see the last sentence in
http://marc.info/?l=xen-devel&m=142529695117142.  Did I misunderstand
it?  If a host bridge has a 1:1 relationship with CFG space, then each
MCFG entry would correspond to one host bridge and one segment.

But it looks like things are more complicated than that.


> And then the present x86 Linux code suggests that there might be
> systems actually violating this (the fact that each entry names not
> only a segment, but also a bus range also kind of suggests this
> despite the wording above); see commit 068258bc15 and its
> neighbors - even if it talks about the address ranges coming from
> other than ACPI tables, firmware wanting to express them by ACPI
> tables would have to violate that rule.

Interesting.


> > I think that it is reasonable to expect device tree systems to respect this
> > too. 
> 
> Not really - as soon as we leave ACPI land, we're free to arrange
> things however they suit us best (of course in agreement with other
> components involved, like Dom0 in this case), and for that case the
> cited Linux commit is a proper reference that it can be (and has been)
> done differently by system designers.

OK. It looks like everything should work OK with the hypercall proposed
by Ian anyway.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-12 10:33                               ` Stefano Stabellini
@ 2015-03-12 11:28                                 ` Jan Beulich
  0 siblings, 0 replies; 70+ messages in thread
From: Jan Beulich @ 2015-03-12 11:28 UTC (permalink / raw)
  To: Stefano.Stabellini, Stefano Stabellini
  Cc: Prasun.kapoor, Ian Campbell, ManishJaggi, Vijaya Kumar,
	Julien Grall, xen-devel

>>> On 12.03.15 at 11:33, <stefano.stabellini@eu.citrix.com> wrote:
> On Thu, 12 Mar 2015, Jan Beulich wrote:
>> >>> On 11.03.15 at 19:26, <stefano.stabellini@eu.citrix.com> wrote:
>> > On Mon, 23 Feb 2015, Jan Beulich wrote:
>> >> No - there can be multiple roots (i.e. host bridges) on a single
>> >> segment. Segments are - afaict - purely a scalability extension
>> >> allowing to overcome the 256 bus limit.
>> > 
>> > Actually this turns out to be wrong. On the PCI MCFG spec it is clearly
>> > stated:
>> > 
>> > "The MCFG table format allows for more than one memory mapped base
>> > address entry provided each entry (memory mapped configuration space
>> > base address allocation structure) corresponds to a unique PCI Segment
>> > Group consisting of 256 PCI buses. Multiple entries corresponding to a
>> > single PCI Segment Group is not allowed."
>> 
>> For one, what you quote is in no contradiction to what I said. All it
>> specifies is that there shouldn't be multiple MCFG table entries
>> specifying the same segment. Whether on any such segment there
>> is a single host bridge or multiple of them is of no interest here.
> 
> I thought that we had already established that one host bridge
> corresponds to one PCI config memory region, see the last sentence in
> http://marc.info/?l=xen-devel&m=142529695117142.  Did I misunderstand
> it?  If a host bridge has a 1:1 relationship with CFG space, then each
> MCFG entry would correspond to one host bridge and one segment.

No, that sentence doesn't imply what you appear to think. Within
the same segment (and, for ACPI's sake, within the same MCFG
region) you could have multiple host bridges. And then what calls
itself a host bridge (via class code) may or may not be one - often
there are many devices calling themselves such even on a single
bus (and my prior sentence specifically means to exclude those).
And finally there are systems with their PCI roots expressed only
in ACPI, without any specific PCI device serving as the host bridge.
There it is most obvious that firmware assigns both segment and
bus numbers to its liking.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-02-27 16:50                                                         ` Ian Campbell
  2015-02-27 17:15                                                           ` Stefano Stabellini
@ 2015-03-17  5:26                                                           ` Manish Jaggi
  2015-03-17  7:28                                                             ` Jan Beulich
  2015-03-17 13:17                                                             ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 70+ messages in thread
From: Manish Jaggi @ 2015-03-17  5:26 UTC (permalink / raw)
  To: Ian Campbell, Jan Beulich
  Cc: Prasun.kapoor, Stefano Stabellini, Vijaya Kumar, Julien Grall,
	xen-devel, StefanoStabellini(Stefano.Stabellini@citrix.com)


On Friday 27 February 2015 10:20 PM, Ian Campbell wrote:
> On Fri, 2015-02-27 at 16:35 +0000, Jan Beulich wrote:
>>>>> On 27.02.15 at 16:24, <ian.campbell@citrix.com> wrote:
>>> On Fri, 2015-02-27 at 14:54 +0000, Stefano Stabellini wrote:
>>>> MMCFG is a Linux config option, not to be confused with
>>>> PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
>>>> think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
>>>> is relevant.
>>> My (possibly flawed) understanding was that pci_mmcfg_reserved was
>>> intended to propagate the result of dom0 parsing some firmware table or
>>> other to the hypervisor.
>> That's not flawed at all.
> I think that's a first in this thread ;-)
>
>>> In Linux dom0 we call it walking pci_mmcfg_list, which looking at
>>> arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
>>> over a "struct acpi_table_mcfg" (there also appears to be a bunch of
>>> processor family derived entries, which I guess are "quirks" of some
>>> sort).
>> Right - this parses ACPI tables (plus applies some knowledge about
>> certain specific systems/chipsets/CPUs) and verifies that the space
>> needed for the MMCFG region is properly reserved either in E820 or
>> in the ACPI specified resources (only if so Linux decides to use
>> MMCFG and consequently also tells Xen that it may use it).
> Thanks.
>
> So I think what I wrote in <1424948710.14641.25.camel@citrix.com>
> applies as is to Device Tree based ARM devices, including the need for
> the PHYSDEVOP_pci_host_bridge_add call.
>
> On ACPI based devices we will have the MCFG table, and things follow
> much as for x86:
>
>        * Xen should parse MCFG to discover the PCI host-bridges
>        * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
>          the same way as Xen/x86 does.
>
> The SBSA, an ARM standard for "servers", mandates various things which
> we can rely on here because ACPI on ARM requires an SBSA compliant
> system. So things like odd quirks in PCI controllers or magic setup are
> spec'd out of our zone of caring (into the firmware I suppose), hence
> there is nothing like the DT_DEVICE_START stuff to register specific
> drivers etc.
>
> The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM
> systems (any more than it is on x86). We can decide whether to omit it
> from dom0 or ignore it from Xen later on.
>
> (Manish, this is FYI, I don't expect you to implement ACPI support!)

In drivers/xen/pci.c, on a BUS_NOTIFY_ADD_DEVICE notification dom0 issues a
hypercall to inform xen that a new pci device has been added.
If we were to inform xen about a new pci bus that is added, there are 2 ways:
a) Issue the hypercall from drivers/pci/probe.c
b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
PHYSDEVOP_pci_device_add hypercall to xen; if xen does not find that
segment number (s_bdf), it will return an error
(SEG_NO_NOT_FOUND). After that the linux xen code could issue the
PHYSDEVOP_pci_host_bridge_add hypercall.

I think (b) can be done with minimal code changes. What do you think?
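
Roughly what I have in mind for (b) - just a sketch, SEG_NO_NOT_FOUND and
xen_pci_host_bridge_add() are made-up names:

    static int xen_pci_notifier(struct notifier_block *nb,
                                unsigned long action, void *data)
    {
        struct pci_dev *pdev = to_pci_dev(data);
        int r;

        if (action != BUS_NOTIFY_ADD_DEVICE)
            return NOTIFY_DONE;

        r = xen_add_device(pdev);              /* PHYSDEVOP_pci_device_add */
        if (r == -SEG_NO_NOT_FOUND) {
            /* Xen has not seen this segment yet: register the host
             * bridge first, then retry the device-add. */
            r = xen_pci_host_bridge_add(pdev); /* PHYSDEVOP_pci_host_bridge_add */
            if (!r)
                r = xen_add_device(pdev);
        }
        return r ? NOTIFY_BAD : NOTIFY_OK;
    }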

> Ian.
>
>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-17  5:26                                                           ` Manish Jaggi
@ 2015-03-17  7:28                                                             ` Jan Beulich
  2015-03-17 12:06                                                               ` Manish Jaggi
  2015-03-17 13:17                                                             ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-03-17  7:28 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor, Ian Campbell, Stefano Stabellini, Vijaya Kumar,
	Julien Grall, xen-devel,
	StefanoStabellini(Stefano.Stabellini@citrix.com)

>>> On 17.03.15 at 06:26, <mjaggi@caviumnetworks.com> wrote:
> In drivers/xen/pci.c on notification BUS_NOTIFY_ADD_DEVICE dom0 issues a 
> hypercall to inform xen that a new pci device has been added.
> If we were to inform xen about a new pci bus that is added there are  2 ways
> a) Issue the hypercall from drivers/pci/probe.c
> b) When a new device is found (BUS_NOTIFY_ADD_DEVICE) issue 
> PHYSDEVOP_pci_device_add hypercall to xen, if xen does not find that 
> segment number (s_bdf), it will return an error
> SEG_NO_NOT_FOUND. After that the linux xen code could issue the 
> PHYSDEVOP_pci_host_bridge_add hypercall.
> 
> I think (b) can be done with minimal code changes. What do you think ?

I'm pretty sure (a) would even be refused by the maintainers, unless
there already is a notification being sent. As to (b) - kernel code could
keep track of which segment/bus pairs it informed Xen about, and
hence wouldn't even need to wait for an error to be returned from
the device-add request (which in your proposal would need to be re-
issued after the host-bridge-add).
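
Something as simple as this would do on the kernel side (sketch only,
names invented):

    #include <linux/list.h>

    /* One record per segment/bus pair already reported to Xen. */
    struct xen_seg_bus {
        struct list_head list;
        u16 seg;
        u8  bus;
    };
    static LIST_HEAD(reported_buses);

    static bool bridge_already_reported(u16 seg, u8 bus)
    {
        struct xen_seg_bus *e;

        list_for_each_entry(e, &reported_buses, list)
            if (e->seg == seg && e->bus == bus)
                return true;
        return false;
    }

    /* Before each PHYSDEVOP_pci_device_add: if (seg, bus) is not yet in
     * the list, issue PHYSDEVOP_pci_host_bridge_add first and record it. */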

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-17  7:28                                                             ` Jan Beulich
@ 2015-03-17 12:06                                                               ` Manish Jaggi
  2015-03-17 12:31                                                                 ` Jan Beulich
  0 siblings, 1 reply; 70+ messages in thread
From: Manish Jaggi @ 2015-03-17 12:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, Stefano Stabellini, Vijaya Kumar,
	Julien Grall, xen-devel,
	StefanoStabellini(Stefano.Stabellini@citrix.com)


On Tuesday 17 March 2015 12:58 PM, Jan Beulich wrote:
>>>> On 17.03.15 at 06:26, <mjaggi@caviumnetworks.com> wrote:
>> In drivers/xen/pci.c on notification BUS_NOTIFY_ADD_DEVICE dom0 issues a
>> hypercall to inform xen that a new pci device has been added.
>> If we were to inform xen about a new pci bus that is added there are  2 ways
>> a) Issue the hypercall from drivers/pci/probe.c
>> b) When a new device is found (BUS_NOTIFY_ADD_DEVICE) issue
>> PHYSDEVOP_pci_device_add hypercall to xen, if xen does not find that
>> segment number (s_bdf), it will return an error
>> SEG_NO_NOT_FOUND. After that the linux xen code could issue the
>> PHYSDEVOP_pci_host_bridge_add hypercall.
>>
>> I think (b) can be done with minimal code changes. What do you think ?
> I'm pretty sure (a) would even be refused by the maintainers, unless
> there already is a notification being sent. As to (b) - kernel code could
> keep track of which segment/bus pairs it informed Xen about, and
> hence wouldn't even need to wait for an error to be returned from
> the device-add request (which in your proposal would need to be re-
> issued after the host-bridge-add).
I have a query on the CFG space address to be passed as a hypercall parameter.
The of_pci_get_host_bridge_resource only parses the ranges property and
not reg.
The reg property has the CFG space address, which is usually stored in
private pci host controller driver structures.

So a pci_dev's parent pci_bus would not have that info.
One way is to add a method to struct pci_ops, but I am not sure it would be
accepted.
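
To illustrate (simplified sketch; the "reg" index depends on the binding):

    #include <linux/of_address.h>
    #include <linux/platform_device.h>

    /* How a host controller driver typically finds its CFG window: it
     * parses "reg" itself and keeps the result in its private data. */
    static int get_cfg_window(struct platform_device *pdev,
                              struct resource *cfg_res)
    {
        struct device_node *np = pdev->dev.of_node;

        if (of_address_to_resource(np, 0, cfg_res))
            return -ENODEV;
        /* cfg_res->start is the CFG space base.  Nothing equivalent is
         * attached to the resulting pci_bus: the of_pci helpers that fill
         * in host bridge resources only parse "ranges". */
        return 0;
    }
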
> Jan
>
>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-17 12:06                                                               ` Manish Jaggi
@ 2015-03-17 12:31                                                                 ` Jan Beulich
  2015-03-18  4:05                                                                   ` Manish Jaggi
  0 siblings, 1 reply; 70+ messages in thread
From: Jan Beulich @ 2015-03-17 12:31 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor, Ian Campbell, Stefano Stabellini, Vijaya Kumar,
	Julien Grall, xen-devel,
	StefanoStabellini(Stefano.Stabellini@citrix.com)

>>> On 17.03.15 at 13:06, <mjaggi@caviumnetworks.com> wrote:

> On Tuesday 17 March 2015 12:58 PM, Jan Beulich wrote:
>>>>> On 17.03.15 at 06:26, <mjaggi@caviumnetworks.com> wrote:
>>> In drivers/xen/pci.c on notification BUS_NOTIFY_ADD_DEVICE dom0 issues a
>>> hypercall to inform xen that a new pci device has been added.
>>> If we were to inform xen about a new pci bus that is added there are  2 ways
>>> a) Issue the hypercall from drivers/pci/probe.c
>>> b) When a new device is found (BUS_NOTIFY_ADD_DEVICE) issue
>>> PHYSDEVOP_pci_device_add hypercall to xen, if xen does not find that
>>> segment number (s_bdf), it will return an error
>>> SEG_NO_NOT_FOUND. After that the linux xen code could issue the
>>> PHYSDEVOP_pci_host_bridge_add hypercall.
>>>
>>> I think (b) can be done with minimal code changes. What do you think ?
>> I'm pretty sure (a) would even be refused by the maintainers, unless
>> there already is a notification being sent. As to (b) - kernel code could
>> keep track of which segment/bus pairs it informed Xen about, and
>> hence wouldn't even need to wait for an error to be returned from
>> the device-add request (which in your proposal would need to be re-
>> issued after the host-bridge-add).
> Have a query on the CFG space address to be passed as hypercall parameter.
> The of_pci_get_host_bridge_resource only parses the ranges property and 
> not reg.
> reg property has the CFG space address, which is usually stored in 
> private pci host controller driver structures.
> 
> so pci_dev 's parent pci_bus would not have that info.
> One way is to add a method in struct pci_ops but not sure it will be 
> accepted or not.

I'm afraid I don't understand what you're trying to tell me.

Jan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-17  5:26                                                           ` Manish Jaggi
  2015-03-17  7:28                                                             ` Jan Beulich
@ 2015-03-17 13:17                                                             ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-17 13:17 UTC (permalink / raw)
  To: Manish Jaggi
  Cc: Prasun.kapoor, Ian Campbell, Stefano Stabellini, Vijaya Kumar,
	Julien Grall, xen-devel,
	StefanoStabellini(Stefano.Stabellini@citrix.com),
	Jan Beulich

On Tue, Mar 17, 2015 at 10:56:48AM +0530, Manish Jaggi wrote:
> 
> On Friday 27 February 2015 10:20 PM, Ian Campbell wrote:
> >On Fri, 2015-02-27 at 16:35 +0000, Jan Beulich wrote:
> >>>>>On 27.02.15 at 16:24, <ian.campbell@citrix.com> wrote:
> >>>On Fri, 2015-02-27 at 14:54 +0000, Stefano Stabellini wrote:
> >>>>MMCFG is a Linux config option, not to be confused with
> >>>>PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.  I don't
> >>>>think that the way Linux (or FreeBSD) call PHYSDEVOP_pci_mmcfg_reserved
> >>>>is relevant.
> >>>My (possibly flawed) understanding was that pci_mmcfg_reserved was
> >>>intended to propagate the result of dom0 parsing some firmware table or
> >>>other to the hypervisor.
> >>That's not flawed at all.
> >I think that's a first in this thread ;-)
> >
> >>>In Linux dom0 we call it walking pci_mmcfg_list, which looking at
> >>>arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by walking
> >>>over a "struct acpi_table_mcfg" (there also appears to be a bunch of
> >>>processor family derived entries, which I guess are "quirks" of some
> >>>sort).
> >>Right - this parses ACPI tables (plus applies some knowledge about
> >>certain specific systems/chipsets/CPUs) and verifies that the space
> >>needed for the MMCFG region is properly reserved either in E820 or
> >>in the ACPI specified resources (only if so Linux decides to use
> >>MMCFG and consequently also tells Xen that it may use it).
> >Thanks.
> >
> >So I think what I wrote in <1424948710.14641.25.camel@citrix.com>
> >applies as is to Device Tree based ARM devices, including the need for
> >the PHYSDEVOP_pci_host_bridge_add call.
> >
> >On ACPI based devices we will have the MCFG table, and things follow
> >much as for x86:
> >
> >       * Xen should parse MCFG to discover the PCI host-bridges
> >       * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
> >         the same way as Xen/x86 does.
> >
> >The SBSA, an ARM standard for "servers", mandates various things which
> >we can rely on here because ACPI on ARM requires an SBSA compliant
> >system. So things like odd quirks in PCI controllers or magic setup are
> >spec'd out of our zone of caring (into the firmware I suppose), hence
> >there is nothing like the DT_DEVICE_START stuff to register specific
> >drivers etc.
> >
> >The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI ARM
> >systems (any more than it is on x86). We can decide whether to omit it
> >from dom0 or ignore it from Xen later on.
> >
> >(Manish, this is FYI, I don't expect you to implement ACPI support!)
> 
> In drivers/xen/pci.c on notification BUS_NOTIFY_ADD_DEVICE dom0 issues a
> hypercall to inform xen that a new pci device has been added.
> If we were to inform xen about a new pci bus that is added there are  2 ways
> a) Issue the hypercall from drivers/pci/probe.c
> b) When a new device is found (BUS_NOTIFY_ADD_DEVICE) issue
> PHYSDEVOP_pci_device_add hypercall to xen, if xen does not find that
> segment number (s_bdf), it will return an error
> SEG_NO_NOT_FOUND. After that the linux xen code could issue the
> PHYSDEVOP_pci_host_bridge_add hypercall.

Couldn't the code figure out from 'struct pci_dev' whether the device
is a bridge or a PCI device? And then do the proper hypercall?

An interesting thing you _might_ hit (that I did) is that if you use
'bus=reassign', which re-assigns the bus numbers during scan, Xen
gets very, very confused. As in, the bus devices that Xen sees vs the
ones Linux sees are different.

Whether you will encounter this depends on whether the bridge
devices and pci devices end up having a different bus number
from what Xen scanned, and from what Linux has determined.

(As in, Linux has found a bridge device with more PCI devices - so
it reprograms the bridge, which moves all of the other PCI devices
"below" it by X number.)

The reason I am bringing it up: it sounds like Xen will have no clue
about some devices - and be told about them by Linux - if for some reason
they have the same bus number as some that Xen already scanned - gah!
> 
> I think (b) can be done with minimal code changes. What do you think ?

Less code == better.
> 
> >Ian.
> >

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: RFC: [PATCH 1/3] Enhance platform support for PCI
  2015-03-17 12:31                                                                 ` Jan Beulich
@ 2015-03-18  4:05                                                                   ` Manish Jaggi
  0 siblings, 0 replies; 70+ messages in thread
From: Manish Jaggi @ 2015-03-18  4:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Prasun.kapoor, Ian Campbell, Stefano Stabellini, Vijaya Kumar,
	Julien Grall, xen-devel,
	StefanoStabellini(Stefano.Stabellini@citrix.com)


On Tuesday 17 March 2015 06:01 PM, Jan Beulich wrote:
>>>> On 17.03.15 at 13:06, <mjaggi@caviumnetworks.com> wrote:
>> On Tuesday 17 March 2015 12:58 PM, Jan Beulich wrote:
>>>>>> On 17.03.15 at 06:26, <mjaggi@caviumnetworks.com> wrote:
>>>> In drivers/xen/pci.c on notification BUS_NOTIFY_ADD_DEVICE dom0 issues a
>>>> hypercall to inform xen that a new pci device has been added.
>>>> If we were to inform xen about a new pci bus that is added there are  2 ways
>>>> a) Issue the hypercall from drivers/pci/probe.c
>>>> b) When a new device is found (BUS_NOTIFY_ADD_DEVICE) issue
>>>> PHYSDEVOP_pci_device_add hypercall to xen, if xen does not find that
>>>> segment number (s_bdf), it will return an error
>>>> SEG_NO_NOT_FOUND. After that the linux xen code could issue the
>>>> PHYSDEVOP_pci_host_bridge_add hypercall.
>>>>
>>>> I think (b) can be done with minimal code changes. What do you think ?
>>> I'm pretty sure (a) would even be refused by the maintainers, unless
>>> there already is a notification being sent. As to (b) - kernel code could
>>> keep track of which segment/bus pairs it informed Xen about, and
>>> hence wouldn't even need to wait for an error to be returned from
>>> the device-add request (which in your proposal would need to be re-
>>> issued after the host-bridge-add).
>> Have a query on the CFG space address to be passed as hypercall parameter.
>> The of_pci_get_host_bridge_resource only parses the ranges property and
>> not reg.
>> reg property has the CFG space address, which is usually stored in
>> private pci host controller driver structures.
>>
>> so pci_dev 's parent pci_bus would not have that info.
>> One way is to add a method in struct pci_ops but not sure it will be
>> accepted or not.
> I'm afraid I don't understand what you're trying to tell me.
Hi Jan,
I missed this during the initial discussion and found out while coding that
the CFG space address of a pci host is stored in the reg property,
and the of_pci code does not store reg in the resources; only ranges are
stored. So the pci_bus which is the root bus created in the probe function
of the pcie controller driver will have the ranges values in its resources,
but the reg property value (the CFG space address) only in the driver's
private data.
So from drivers/xen/pci.c we can find out the root bus (pci_bus) from
the pci_dev (via BUS_NOTIFY) but cannot get the CFG space address.

Now there are 2 ways:
a) Add a pci_ops method to return the CFG space address
b) Let the pci host controller driver invoke a function
xen_invoke_hypercall() providing the bus number and cfg space address.
   xen_invoke_hypercall would be implemented in drivers/xen/pci.c and
would issue the PHYSDEVOP_pci_host_bridge_add hypercall.
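
For (b), the helper could look roughly like this (just a sketch; the
struct layout follows the proposal earlier in the thread and all names
are tentative):

    /* drivers/xen/pci.c - sketch for option (b).  The host controller
     * driver calls this once for the root bus it creates, passing the
     * bus range it was assigned and the CFG window taken from "reg". */
    int xen_invoke_hypercall(u16 seg, u8 bus_start, u8 bus_end,
                             u64 cfg_base, u64 cfg_size)
    {
        struct physdev_pci_host_bridge_add add = {
            .seg       = seg,
            .bus_start = bus_start,
            .bus_end   = bus_end,
            .cfg_base  = cfg_base,
            .cfg_size  = cfg_size,
        };

        return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
    }
    EXPORT_SYMBOL_GPL(xen_invoke_hypercall);
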
> Jan
>
>

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2015-03-18  4:05 UTC | newest]

Thread overview: 70+ messages
-- links below jump to the message on this page --
2015-02-20 11:34 RFC: [PATCH 1/3] Enhance platform support for PCI Manish Jaggi
2015-02-20 12:03 ` Julien Grall
2015-02-20 12:10   ` Manish Jaggi
2015-02-20 12:20     ` Julien Grall
2015-02-20 12:34       ` Manish Jaggi
2015-02-20 13:01         ` Manish Jaggi
2015-02-20 13:45           ` Ian Campbell
2015-02-20 14:11             ` Jan Beulich
2015-02-20 14:26               ` Ian Campbell
2015-02-20 14:39                 ` Jan Beulich
2015-02-20 15:01                   ` Ian Campbell
2015-02-20 15:13                     ` Manish Jaggi
2015-02-20 15:15                       ` Julien Grall
2015-02-20 15:15                     ` Jan Beulich
2015-02-20 17:33                       ` Ian Campbell
2015-02-23  8:43                         ` Jan Beulich
2015-02-23 12:45                           ` Ian Campbell
2015-02-23 14:07                             ` Jan Beulich
2015-02-23 14:33                               ` Ian Campbell
2015-02-23 14:45                                 ` Jan Beulich
2015-02-23 15:02                                   ` Ian Campbell
2015-02-23 15:27                                     ` Jan Beulich
2015-02-23 15:46                                       ` Ian Campbell
2015-02-23 16:20                                         ` Jan Beulich
2015-02-26 10:09                                           ` Manish Jaggi
2015-02-26 10:30                                             ` Jan Beulich
2015-02-26 11:05                                             ` Ian Campbell
2015-02-27 14:33                                               ` Stefano Stabellini
2015-02-27 14:42                                                 ` Ian Campbell
2015-02-27 14:54                                                   ` Stefano Stabellini
2015-02-27 15:24                                                     ` Ian Campbell
2015-02-27 15:29                                                       ` Ian Campbell
2015-02-27 16:35                                                       ` Jan Beulich
2015-02-27 16:50                                                         ` Ian Campbell
2015-02-27 17:15                                                           ` Stefano Stabellini
2015-03-02 11:48                                                             ` Ian Campbell
2015-03-03  9:19                                                               ` Manish Jaggi
2015-03-17  5:26                                                           ` Manish Jaggi
2015-03-17  7:28                                                             ` Jan Beulich
2015-03-17 12:06                                                               ` Manish Jaggi
2015-03-17 12:31                                                                 ` Jan Beulich
2015-03-18  4:05                                                                   ` Manish Jaggi
2015-03-17 13:17                                                             ` Konrad Rzeszutek Wilk
2015-03-11 18:26                           ` Stefano Stabellini
2015-03-12  9:16                             ` Jan Beulich
2015-03-12 10:33                               ` Stefano Stabellini
2015-03-12 11:28                                 ` Jan Beulich
2015-03-12  9:30                             ` Ian Campbell
2015-02-20 14:14             ` Manish Jaggi
2015-02-20 14:39               ` Ian Campbell
2015-02-23 10:59                 ` Manish Jaggi
2015-02-23 11:14                   ` Julien Grall
2015-02-23 11:50                     ` Manish Jaggi
2015-02-23 15:15                       ` Julien Grall
2015-02-23 17:12                         ` Manish Jaggi
2015-02-23 21:39                           ` Julien Grall
2015-02-24  0:23                             ` Manish Jaggi
2015-02-24 13:43                               ` Julien Grall
2015-02-25  2:33                                 ` Manish Jaggi
2015-02-25 10:20                                   ` Ian Campbell
2015-02-26 10:49                                     ` Vijay Kilari
2015-02-26 11:12                                       ` Ian Campbell
2015-02-26 13:58                                         ` Julien Grall
2015-02-26 14:46                                       ` Pranavkumar Sawargaonkar
2015-02-26 15:17                                         ` Julien Grall
2015-02-27 10:11                                           ` Pranavkumar Sawargaonkar
2015-02-27 10:38                                             ` Ian Campbell
2015-02-27 13:22                                               ` Ian Campbell
2015-02-27 13:59                                                 ` Vijay Kilari
2015-02-20 13:37       ` Ian Campbell
