linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
@ 2019-06-13  7:54 Benjamin Herrenschmidt
  2019-06-13 19:02 ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-13  7:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel

The current arm64 PCI code for ACPI platforms will unconditionally
reassign all resources.

This is not only suboptimal, it's also wrong for a number of cases, for
example, this could invalidate a UEFI framebuffer address, or a runtime
firmware could be using some of the devices in their original location.

There is an ACPI method defined today for P2P bridges (_DSM #5) that
can indicate that a bridge resources set by firmware. There is current
discussions to extend that method to cover host bridges, and define
a value of "0" as meaning that the resources must be preserved.

This patch adds the resource assignment policy to struct
pci_host_bridge and sets it based on the presence of that method and if
present the value returned, and honors it on arm64.

No other architectures are currently affected, and the default is kept
to "reassign everything" on arm64 for now via an #ifdef, though we do
plan to get rid of that in a separate patch.

The setting in pci_host_bridge "looks" generic because I intend in
subsquent work to consolidate the resource allocation policy accross
architectures and I intend for that setting to be the canonical
location used by the generic code to decide what to do.

This is based on some earlier work by
Ard Biesheuvel <ard.biesheuvel@linaro.org>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

 arch/arm64/kernel/pci.c  | 12 ++++++++++--
 drivers/acpi/pci_root.c  | 42 ++++++++++++++++++++++++++++++++++++++++
 include/linux/pci-acpi.h |  7 ++++---
 include/linux/pci.h      | 10 ++++++++++
 4 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index bb85e2f4603f..b209a506f390 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	struct acpi_pci_generic_root_info *ri;
 	struct pci_bus *bus, *child;
 	struct acpi_pci_root_ops *root_ops;
+	struct pci_host_bridge *hb;
 
 	ri = kzalloc(sizeof(*ri), GFP_KERNEL);
 	if (!ri)
@@ -193,8 +194,15 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	if (!bus)
 		return NULL;
 
-	pci_bus_size_bridges(bus);
-	pci_bus_assign_resources(bus);
+	hb = pci_find_host_bridge(bus);
+
+	/* If the policy is normal or probe only, claim existing resources */
+	if (hb->rsrc_policy != pci_rsrc_reassign)
+		pci_bus_claim_resources(bus);
+
+	/* If the policy is not probe only, assign what's left unassigned */
+	if (hb->rsrc_policy != pci_rsrc_probe_only)
+		pci_assign_unassigned_root_bus_resources(bus);
 
 	list_for_each_entry(child, &bus->children, node)
 		pcie_bus_configure_settings(child);
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 39f5d172e84f..410f7f2b2587 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -881,6 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
 	int node = acpi_get_node(device->handle);
 	struct pci_bus *bus;
 	struct pci_host_bridge *host_bridge;
+	union acpi_object *obj;
 
 	info->root = root;
 	info->bridge = device;
@@ -917,6 +918,47 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
 	if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
 		host_bridge->native_ltr = 0;
 
+	/*
+	 * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot
+	 * Configuration', on the host bridge. This tells us whether the
+	 * firmware wants us to preserve or reassign the configuration of
+	 * the PCI resource tree for this root bridge.
+	 *
+	 * There are three possible outcomes here:
+	 *
+	 *  - _DSM #5 is absent. This is the default. Currently it will be
+	 *    architecture specific in order to maintain existing behaviours
+	 *    but the plan is to move arm64 into the fold: x86 and ia64 will
+	 *    claim the existing config, and reassign if needed. arm64 will
+	 *    always reassign.
+	 *
+	 *  - _DSM #5 exists and is 1. This is the FW telling us to ignore
+	 *    the configuration it performed. This is currently only supported
+	 *    on arm64.
+	 *
+	 *  - _DSM #5 exists and is 0. This should be the same as the default
+	 *    (_DSM #5 absent). However there are some assumptions flying around
+	 *    that this means we must keep the FW configuration intact. So we
+	 *    treat that as "probe only" for the time being. This is currently
+	 *    only supported on arm64.
+	 */
+	obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
+	                        IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
+	if (obj && obj->type == ACPI_TYPE_INTEGER) {
+		if (obj->integer.value == 1)
+			host_bridge->rsrc_policy = pci_rsrc_reassign;
+		else
+			host_bridge->rsrc_policy = pci_rsrc_probe_only;
+	} else {
+		/* Default is arch specific ... for now */
+#ifdef CONFIG_ARM64
+		host_bridge->rsrc_policy = pci_rsrc_reassign;
+#else
+		host_bridge->rsrc_policy = pci_rsrc_normal;
+#endif
+	}
+	ACPI_FREE(obj);
+
 	pci_scan_child_bus(bus);
 	pci_set_host_bridge_release(host_bridge, acpi_pci_root_release_info,
 				    info);
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 8082b612f561..62b7fdcc661c 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
 #endif
 
 extern const guid_t pci_acpi_dsm_guid;
-#define DEVICE_LABEL_DSM	0x07
-#define RESET_DELAY_DSM		0x08
-#define FUNCTION_DELAY_DSM	0x09
+#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
+#define DEVICE_LABEL_DSM		0x07
+#define RESET_DELAY_DSM			0x08
+#define FUNCTION_DELAY_DSM		0x09
 
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index dd436da7eccc..7ff5cedb30cf 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -486,6 +486,12 @@ static inline int pci_channel_offline(struct pci_dev *pdev)
 	return (pdev->error_state != pci_channel_io_normal);
 }
 
+enum pci_host_rsrc_policy {
+	pci_rsrc_normal,	/* Probe and (re)assign what's missing/broken */
+	pci_rsrc_probe_only,	/* Probe only */
+	pci_rsrc_reassign,	/* Reassign resources */
+};
+
 struct pci_host_bridge {
 	struct device	dev;
 	struct pci_bus	*bus;		/* Root bus */
@@ -506,6 +512,10 @@ struct pci_host_bridge {
 	unsigned int	native_shpc_hotplug:1;	/* OS may use SHPC hotplug */
 	unsigned int	native_pme:1;		/* OS may use PCIe PME */
 	unsigned int	native_ltr:1;		/* OS may use PCIe LTR */
+
+	/* Resource assignment/allocation policy */
+	enum pci_host_rsrc_policy rsrc_policy;
+
 	/* Resource alignment requirements */
 	resource_size_t (*align_resource)(struct pci_dev *dev,
 			const struct resource *res,



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-13  7:54 [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt
@ 2019-06-13 19:02 ` Bjorn Helgaas
  2019-06-13 21:59   ` Benjamin Herrenschmidt
  2019-06-13 23:07   ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 15+ messages in thread
From: Bjorn Helgaas @ 2019-06-13 19:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel

On Thu, Jun 13, 2019 at 05:54:56PM +1000, Benjamin Herrenschmidt wrote:
> The current arm64 PCI code for ACPI platforms will unconditionally
> reassign all resources.
> 
> This is not only suboptimal, it's also wrong for a number of cases, for
> example, this could invalidate a UEFI framebuffer address, or a runtime
> firmware could be using some of the devices in their original location.
> 
> There is an ACPI method defined today for P2P bridges (_DSM #5) that
> can indicate that a bridge resources set by firmware. There is current
> discussions to extend that method to cover host bridges, and define
> a value of "0" as meaning that the resources must be preserved.

The ECR does extend the r3.2 spec so the _DSM can apply to host
bridges.  Apart from that change, the ECR clarifies the language
without changing the sense.  The meaning of "0" doesn't change.

> This patch adds the resource assignment policy to struct
> pci_host_bridge and sets it based on the presence of that method and if
> present the value returned, and honors it on arm64.
> 
> No other architectures are currently affected, and the default is kept
> to "reassign everything" on arm64 for now via an #ifdef, though we do
> plan to get rid of that in a separate patch.
> 
> The setting in pci_host_bridge "looks" generic because I intend in
> subsquent work to consolidate the resource allocation policy accross
> architectures and I intend for that setting to be the canonical
> location used by the generic code to decide what to do.
> 
> This is based on some earlier work by
> Ard Biesheuvel <ard.biesheuvel@linaro.org>
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> 
>  arch/arm64/kernel/pci.c  | 12 ++++++++++--
>  drivers/acpi/pci_root.c  | 42 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci-acpi.h |  7 ++++---
>  include/linux/pci.h      | 10 ++++++++++
>  4 files changed, 66 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
> index bb85e2f4603f..b209a506f390 100644
> --- a/arch/arm64/kernel/pci.c
> +++ b/arch/arm64/kernel/pci.c
> @@ -168,6 +168,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
>  	struct acpi_pci_generic_root_info *ri;
>  	struct pci_bus *bus, *child;
>  	struct acpi_pci_root_ops *root_ops;
> +	struct pci_host_bridge *hb;

The only other use of "struct pci_host_bridge *" in this file uses
"bridge" as the variable, so I'd follow suit.

>  	ri = kzalloc(sizeof(*ri), GFP_KERNEL);
>  	if (!ri)
> @@ -193,8 +194,15 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
>  	if (!bus)
>  		return NULL;
>  
> -	pci_bus_size_bridges(bus);
> -	pci_bus_assign_resources(bus);
> +	hb = pci_find_host_bridge(bus);
> +
> +	/* If the policy is normal or probe only, claim existing resources */
> +	if (hb->rsrc_policy != pci_rsrc_reassign)
> +		pci_bus_claim_resources(bus);
> +
> +	/* If the policy is not probe only, assign what's left unassigned */
> +	if (hb->rsrc_policy != pci_rsrc_probe_only)
> +		pci_assign_unassigned_root_bus_resources(bus);
>  
>  	list_for_each_entry(child, &bus->children, node)
>  		pcie_bus_configure_settings(child);
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 39f5d172e84f..410f7f2b2587 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -881,6 +881,7 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
>  	int node = acpi_get_node(device->handle);
>  	struct pci_bus *bus;
>  	struct pci_host_bridge *host_bridge;
> +	union acpi_object *obj;
>  
>  	info->root = root;
>  	info->bridge = device;
> @@ -917,6 +918,47 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
>  	if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
>  		host_bridge->native_ltr = 0;
>  
> +	/*
> +	 * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot
> +	 * Configuration', on the host bridge. This tells us whether the
> +	 * firmware wants us to preserve or reassign the configuration of
> +	 * the PCI resource tree for this root bridge.
> +	 *
> +	 * There are three possible outcomes here:
> +	 *
> +	 *  - _DSM #5 is absent. This is the default. Currently it will be
> +	 *    architecture specific in order to maintain existing behaviours
> +	 *    but the plan is to move arm64 into the fold: x86 and ia64 will
> +	 *    claim the existing config, and reassign if needed. arm64 will
> +	 *    always reassign.

The spec (PCI FW r3.2) says a _DSM that returns 0 means "OS must not
ignore config done by firmware".  The ECR in the works changes the
wording to something like "OS must preserve config done by firmware",
which is equivalent but clearer.

The r3.2 spec goes on to suggest that a missing _DSM means the same
thing ("OS must not ignore firmware config").  I think that part is
crap, and the ECR removes that wording.

I don't accept the _DSM #5 section in the PCI FW spec as being
normative about what a missing _DSM #5 means.  This section didn't
even exist until r3.2, and all it says is "this situation is the same
as the legacy situation where this _DSM is not provided".  That's just
hand-waving; it's not a requirement.

I'm not aware of any spec that says the OS can't change PCI resources
(if there is such a spec, please cite it).

So my opinion is that a missing _DSM means nothing, and the default
situation is that the OS can change PCI resources as necessary.

The ECR input from Windows was that in the absence of a _DSM #5, they
keep the boot configuration unless a FW bug causes an overlap, a
hot-add requires rebalancing, or the system includes external (e.g.,
Thunderbolt) devices.  That's what I think Linux should do, too: keep
the config from firmware unless we have a reason to change it.

> +	 *  - _DSM #5 exists and is 1. This is the FW telling us to ignore
> +	 *    the configuration it performed. This is currently only supported
> +	 *    on arm64.

The r3.2 spec actually says "the OS *may* ignore config done by
firmware".  There's no *requirement* that the OS change anything.

IMHO *this* is the same as the case where there's no _DSM at all.

> +	 *  - _DSM #5 exists and is 0. This should be the same as the default
> +	 *    (_DSM #5 absent). However there are some assumptions flying around
> +	 *    that this means we must keep the FW configuration intact. So we
> +	 *    treat that as "probe only" for the time being. This is currently
> +	 *    only supported on arm64.

PCI FW r3.2 says 0 means "the OS must not ignore config done by
firmware."  That means we must keep the FW configuration intact.

> +	obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
> +	                        IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> +	if (obj && obj->type == ACPI_TYPE_INTEGER) {
> +		if (obj->integer.value == 1)
> +			host_bridge->rsrc_policy = pci_rsrc_reassign;
> +		else
> +			host_bridge->rsrc_policy = pci_rsrc_probe_only;
> +	} else {
> +		/* Default is arch specific ... for now */
> +#ifdef CONFIG_ARM64
> +		host_bridge->rsrc_policy = pci_rsrc_reassign;
> +#else
> +		host_bridge->rsrc_policy = pci_rsrc_normal;
> +#endif
> +	}

I think this needs to be a single bit, not a 3-choice thing.  I don't
think it's possible to clearly explain how pci_rsrc_normal is
different from pci_rsrc_reassign.  We either need to preserve the
config or we don't.

A middle ground of "we don't need to preserve the config and in fact
we *must* reassign resources" is pointless because we don't know *why*
we have to reassign things, and we don't know what sort of change
would be correct.

> +	ACPI_FREE(obj);
> +
>  	pci_scan_child_bus(bus);
>  	pci_set_host_bridge_release(host_bridge, acpi_pci_root_release_info,
>  				    info);
> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> index 8082b612f561..62b7fdcc661c 100644
> --- a/include/linux/pci-acpi.h
> +++ b/include/linux/pci-acpi.h
> @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
>  #endif
>  
>  extern const guid_t pci_acpi_dsm_guid;
> -#define DEVICE_LABEL_DSM	0x07
> -#define RESET_DELAY_DSM		0x08
> -#define FUNCTION_DELAY_DSM	0x09
> +#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
> +#define DEVICE_LABEL_DSM		0x07
> +#define RESET_DELAY_DSM			0x08
> +#define FUNCTION_DELAY_DSM		0x09
>  
>  #else	/* CONFIG_ACPI */
>  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index dd436da7eccc..7ff5cedb30cf 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -486,6 +486,12 @@ static inline int pci_channel_offline(struct pci_dev *pdev)
>  	return (pdev->error_state != pci_channel_io_normal);
>  }
>  
> +enum pci_host_rsrc_policy {
> +	pci_rsrc_normal,	/* Probe and (re)assign what's missing/broken */
> +	pci_rsrc_probe_only,	/* Probe only */
> +	pci_rsrc_reassign,	/* Reassign resources */
> +};
> +
>  struct pci_host_bridge {
>  	struct device	dev;
>  	struct pci_bus	*bus;		/* Root bus */
> @@ -506,6 +512,10 @@ struct pci_host_bridge {
>  	unsigned int	native_shpc_hotplug:1;	/* OS may use SHPC hotplug */
>  	unsigned int	native_pme:1;		/* OS may use PCIe PME */
>  	unsigned int	native_ltr:1;		/* OS may use PCIe LTR */
> +
> +	/* Resource assignment/allocation policy */
> +	enum pci_host_rsrc_policy rsrc_policy;
> +
>  	/* Resource alignment requirements */
>  	resource_size_t (*align_resource)(struct pci_dev *dev,
>  			const struct resource *res,
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-13 19:02 ` Bjorn Helgaas
@ 2019-06-13 21:59   ` Benjamin Herrenschmidt
  2019-06-13 23:07   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-13 21:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel

On Thu, 2019-06-13 at 14:02 -0500, Bjorn Helgaas wrote:
> > +	/*
> > +	 * Invoke the PCI device specific method (_DSM) #5 'Ignore PCI Boot
> > +	 * Configuration', on the host bridge. This tells us whether the
> > +	 * firmware wants us to preserve or reassign the configuration of
> > +	 * the PCI resource tree for this root bridge.
> > +	 *
> > +	 * There are three possible outcomes here:
> > +	 *
> > +	 *  - _DSM #5 is absent. This is the default. Currently it will be
> > +	 *    architecture specific in order to maintain existing behaviours
> > +	 *    but the plan is to move arm64 into the fold: x86 and ia64 will
> > +	 *    claim the existing config, and reassign if needed. arm64 will
> > +	 *    always reassign.
> 
> The spec (PCI FW r3.2) says a _DSM that returns 0 means "OS must not
> ignore config done by firmware".  The ECR in the works changes the
> wording to something like "OS must preserve config done by firmware",
> which is equivalent but clearer.
>
> The r3.2 spec goes on to suggest that a missing _DSM means the same
> thing ("OS must not ignore firmware config").  I think that part is
> crap, and the ECR removes that wording.
> 
> I don't accept the _DSM #5 section in the PCI FW spec as being
> normative about what a missing _DSM #5 means.  This section didn't
> even exist until r3.2, and all it says is "this situation is the same
> as the legacy situation where this _DSM is not provided".  That's just
> hand-waving; it's not a requirement.
> 
> I'm not aware of any spec that says the OS can't change PCI resources
> (if there is such a spec, please cite it).
> 
> So my opinion is that a missing _DSM means nothing, and the default
> situation is that the OS can change PCI resources as necessary.

Sure, but what do you think of my wording in the patch ? This is pretty
much what I've done: Keep the current (default) behaviour, which on
x86/ia64 is to claim what's there if it looks sensible and change thing
as deemed fit (reassigned what's not sensible, extent HP bridge
resources etc...).

This patch doesn't change the arm64 behaviour in that case which is to
currently reallocate everything but Lorenzo seems to agree that this
isn't great and we should change towards a more x86-like default, which
I'm working on, but I want to keep that change separate from this
patch, if anything for bisectability of potential regressions.

> The ECR input from Windows was that in the absence of a _DSM #5, they
> keep the boot configuration unless a FW bug causes an overlap, a
> hot-add requires rebalancing, or the system includes external (e.g.,
> Thunderbolt) devices.  That's what I think Linux should do, too: keep
> the config from firmware unless we have a reason to change it.

Right, and this is what x86 does, and this is what we'll eventually do
on arm64 as above.

> > +	 *  - _DSM #5 exists and is 1. This is the FW telling us to ignore
> > +	 *    the configuration it performed. This is currently only supported
> > +	 *    on arm64.
> 
> The r3.2 spec actually says "the OS *may* ignore config done by
> firmware".  There's no *requirement* that the OS change anything.
> 
> IMHO *this* is the same as the case where there's no _DSM at all.

May or may not be. Seattle platforms here tells us explicitely that we
should ignore the configuration, unless I'm mistaken. Lorenzo do you
know better here ?

> > +	 *  - _DSM #5 exists and is 0. This should be the same as the default
> > +	 *    (_DSM #5 absent). However there are some assumptions flying around
> > +	 *    that this means we must keep the FW configuration intact. So we
> > +	 *    treat that as "probe only" for the time being. This is currently
> > +	 *    only supported on arm64.
> 
> PCI FW r3.2 says 0 means "the OS must not ignore config done by
> firmware."  That means we must keep the FW configuration intact.

So "must not ignore" doesn't mean "keep intact" in my book. In fact, to
be honest, the wording in the spec pretty much means the same for all 3
cases, ie, you "may ignore" or "must not ignore" is bullshit. I can
perfectly "not ignore but decide it's bad and change it anyway". This
is all a terrible wording.

Like all firmware specs, it's designed for people to misinterpret it
and get things wrong :-)

What we know is that there is one platform that implements _DSM #5
today and returns 1 with the intention, unless I misread what I was
told, to tell us to explicitly ignore what FW did.

> > +	obj = acpi_evaluate_dsm(ACPI_HANDLE(bus->bridge), &pci_acpi_dsm_guid, 1,
> > +	                        IGNORE_PCI_BOOT_CONFIG_DSM, NULL);
> > +	if (obj && obj->type == ACPI_TYPE_INTEGER) {
> > +		if (obj->integer.value == 1)
> > +			host_bridge->rsrc_policy = pci_rsrc_reassign;
> > +		else
> > +			host_bridge->rsrc_policy = pci_rsrc_probe_only;
> > +	} else {
> > +		/* Default is arch specific ... for now */
> > +#ifdef CONFIG_ARM64
> > +		host_bridge->rsrc_policy = pci_rsrc_reassign;
> > +#else
> > +		host_bridge->rsrc_policy = pci_rsrc_normal;
> > +#endif
> > +	}
> 
> I think this needs to be a single bit, not a 3-choice thing. 

Well, no :-)

The enum in host_bridge needs to be 3 choices because it's meant to
represent the 3 policies that Linux has today sprinkled accross
architectures in rather inconsistent ways, but I'm not going to break
that in one patch and there are reasons for all 3 to exist. We can
discuss that separately if you want.

I plan to start cleaning up code to use that field to represent the
policies properly accross the board as I also consolidate code accross
archs/platforms.

However, what can possibly be debated here is how _DSM #5 maps to those
3. You think it should map to only two of them, I tend to disagree with
your arguments above but short of sitting on the spec commitee and
clarifying the wording, it's a matter of how has the pointiest hat and
the rounder crystal ball here.

>  I don't
> think it's possible to clearly explain how pci_rsrc_normal is
> different from pci_rsrc_reassign.  We either need to preserve the
> config or we don't.

No. "reassign" is "must reassign" or "must ignore FW settings". It may
not be what we want to do with ACPI _DSM #5 = 1, but it's definitely
what all current ARM32, powerpc4xx, and in general most embedded
platforms want today. In fact we have more platforms in the tree who
want just that.

Now you may want to make the argument that those platforms shouldn't
want that, but changing it would probably break a good half of them in
one go, that's what they've always done and I'm not here to mess around
with this.

Once I've consolidated more code, it might become easier on a per
platform basis to chose to switch to the "use what's there and reassign
what's broken or missing" model (ie, "pci_rsrc_normal"), but that will
have to be done on a case by case basis with appropriate testing etc..

> A middle ground of "we don't need to preserve the config and in fact
> we *must* reassign resources" is pointless because we don't know *why*
> we have to reassign things, and we don't know what sort of change
> would be correct.

For the ACPI case, maybe ... for all the other cases this is pretty
moot. Reassign all is what they do today, and changing it will probably
break them.

Cheers,
Ben.


> > +	ACPI_FREE(obj);
> > +
> >  	pci_scan_child_bus(bus);
> >  	pci_set_host_bridge_release(host_bridge, acpi_pci_root_release_info,
> >  				    info);
> > diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> > index 8082b612f561..62b7fdcc661c 100644
> > --- a/include/linux/pci-acpi.h
> > +++ b/include/linux/pci-acpi.h
> > @@ -107,9 +107,10 @@ static inline void acpiphp_check_host_bridge(struct acpi_device *adev) { }
> >  #endif
> >  
> >  extern const guid_t pci_acpi_dsm_guid;
> > -#define DEVICE_LABEL_DSM	0x07
> > -#define RESET_DELAY_DSM		0x08
> > -#define FUNCTION_DELAY_DSM	0x09
> > +#define IGNORE_PCI_BOOT_CONFIG_DSM	0x05
> > +#define DEVICE_LABEL_DSM		0x07
> > +#define RESET_DELAY_DSM			0x08
> > +#define FUNCTION_DELAY_DSM		0x09
> >  
> >  #else	/* CONFIG_ACPI */
> >  static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index dd436da7eccc..7ff5cedb30cf 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -486,6 +486,12 @@ static inline int pci_channel_offline(struct pci_dev *pdev)
> >  	return (pdev->error_state != pci_channel_io_normal);
> >  }
> >  
> > +enum pci_host_rsrc_policy {
> > +	pci_rsrc_normal,	/* Probe and (re)assign what's missing/broken */
> > +	pci_rsrc_probe_only,	/* Probe only */
> > +	pci_rsrc_reassign,	/* Reassign resources */
> > +};
> > +
> >  struct pci_host_bridge {
> >  	struct device	dev;
> >  	struct pci_bus	*bus;		/* Root bus */
> > @@ -506,6 +512,10 @@ struct pci_host_bridge {
> >  	unsigned int	native_shpc_hotplug:1;	/* OS may use SHPC hotplug */
> >  	unsigned int	native_pme:1;		/* OS may use PCIe PME */
> >  	unsigned int	native_ltr:1;		/* OS may use PCIe LTR */
> > +
> > +	/* Resource assignment/allocation policy */
> > +	enum pci_host_rsrc_policy rsrc_policy;
> > +
> >  	/* Resource alignment requirements */
> >  	resource_size_t (*align_resource)(struct pci_dev *dev,
> >  			const struct resource *res,
> > 
> > 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-13 19:02 ` Bjorn Helgaas
  2019-06-13 21:59   ` Benjamin Herrenschmidt
@ 2019-06-13 23:07   ` Benjamin Herrenschmidt
  2019-06-14  7:42     ` Ard Biesheuvel
  1 sibling, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-13 23:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel

On Thu, 2019-06-13 at 14:02 -0500, Bjorn Helgaas wrote:
> 
> PCI FW r3.2 says 0 means "the OS must not ignore config done by
> firmware."  That means we must keep the FW configuration intact.

So I tried implementing what you seem to want and it doesn't make sense
imho.

I think the wording of the spec is terrible and doesn't convey the
intent here.

What is it that _DSM #5 is about ? Is it about telling us that the FW
config shall not be trusted ? That seem to be its original intent based
on the existing wording and the fact that "1" says "may ignore".

Or was it always intended to be some kind of inverted logic with 0
meaning that we *must* preserve what FW did ?

But preserving what FW did was always the default for x86 and
Windows... It's just that we happen to do something wrong today on
Linux/ARM64 which is to always reassign everything.

The way Linux resource assignment works accross platforms has generally
been based on one of these 3 methods:

 - The standard x86 method, which is to claim what's there when it
doesn't look completely busted and has been assigned, then assign
what's left. This allows for FW doing partial assignment, and allows to
work around a number of BIOS bugs.

 - The "probe only" method. This was created independently on powerpc
and some other archs afaik. At least for powerpc, the reason for that
is some interesting virtualization cases where we just cannot touch or
change or move anything. The effect is to not reassign even what we
dont like, and not call pci_assign_unassign_resources().

 - The "reassign everything" method. This is used by almost all
embedded patforms accross archs. All arm32, all arm64 today (but we
agree that's wrong), all embedded powerpc etc... This is basically
meant for us not trusting whatever random uboot or other embedded FW,
if any, and do a full from-scratch assignment. There are issues in how
that is implemented accross the various platforms/archs, some for
example still honor existing bus numbers and some don't, but I doubt
it's intentional etc... but that method is there to stay.

Now, the questions are two fold

  - How do we map _DSM #5 to these, at least on arm64, maybe in the
long run we can also look at affecting x86, but that's less urgent.

  - How do I ensure the above fixes my Amazon platform ? :-)

There's one obvious thing here. If we don't want to break existing
things, then the absence of _DSM #5 must not change our existing
behaviour. I think we can all agree on this.

We might explore changing arm64 default behaviour as a second step
since we all agree it's not doing what it should, but we also know that
it will probably break some things so we need to be careful, understand
the issues and work around them. This isn't the scope of the initial
_DSM #5 patch.

That leaves us with the _DSM #5 present cases.

Now we have two values. What do they mean ? As I already said before,
the wording with "must not ignore" and "may ignore" is completely
useless and doesn't tell us a thing about the intention here. We don't
know why the FW folks may have put a given value here, and what they
expect us to do about it.

What we do know is that Seattle returns 1 and needs reassignment, and
we do know that the Amazon platforms return 0 and will want us to not
touch the existing setup.

However, does 1 means "business as usual" or does it mean "reassign
everything" ?

Does 0 means "probe only" ? Or do we still do an assignment pass for
things that the FW may have left unassigned ?

Today in Linux, the "probe only" logic tends to not call
pci_assign_unassigned_resources for example.

From a pure reading of the spec, there's an argument to be made that
both 0 and 1 values can lead to the same code that reads what's there
and reassign what's missing.

So this is a mess, a usual when it comes to specs written by
committees, but at this stage I'm at a loss as to what you want me to
do.

Cheers,
Ben.
 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-13 23:07   ` Benjamin Herrenschmidt
@ 2019-06-14  7:42     ` Ard Biesheuvel
  2019-06-14  8:36       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Ard Biesheuvel @ 2019-06-14  7:42 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bjorn Helgaas, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel

On Fri, 14 Jun 2019 at 01:07, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Thu, 2019-06-13 at 14:02 -0500, Bjorn Helgaas wrote:
> >
> > PCI FW r3.2 says 0 means "the OS must not ignore config done by
> > firmware."  That means we must keep the FW configuration intact.
>
> So I tried implementing what you seem to want and it doesn't make sense
> imho.
>
> I think the wording of the spec is terrible and doesn't convey the
> intent here.
>
> What is it that _DSM #5 is about ? Is it about telling us that the FW
> config shall not be trusted ? That seem to be its original intent based
> on the existing wording and the fact that "1" says "may ignore".
>
> Or was it always intended to be some kind of inverted logic with 0
> meaning that we *must* preserve what FW did ?
>

The original purpose was for firmware running on 64-bit hosts to not
create a PCI resource assignment that was incompatible with the OS
booting in 32-bit mode.

So the expectation was that a 32-bit OS would reuse whatever config
the firmware created, and the 64-bit version would be enlightened to
understand that it could reassign resource assignments to make better
use of the available resource windows, but this required a mechanism
to describe which resources should be left alone by the OS.

So I don't think anybody cares about that use case anymore, and I have
no idea how widespread its use was when people did.

> But preserving what FW did was always the default for x86 and
> Windows... It's just that we happen to do something wrong today on
> Linux/ARM64 which is to always reassign everything.
>
> The way Linux resource assignment works accross platforms has generally
> been based on one of these 3 methods:
>
>  - The standard x86 method, which is to claim what's there when it
> doesn't look completely busted and has been assigned, then assign
> what's left. This allows for FW doing partial assignment, and allows to
> work around a number of BIOS bugs.
>
>  - The "probe only" method. This was created independently on powerpc
> and some other archs afaik. At least for powerpc, the reason for that
> is some interesting virtualization cases where we just cannot touch or
> change or move anything. The effect is to not reassign even what we
> dont like, and not call pci_assign_unassign_resources().
>
>  - The "reassign everything" method. This is used by almost all
> embedded patforms accross archs. All arm32, all arm64 today (but we
> agree that's wrong), all embedded powerpc etc... This is basically
> meant for us not trusting whatever random uboot or other embedded FW,
> if any, and do a full from-scratch assignment. There are issues in how
> that is implemented accross the various platforms/archs, some for
> example still honor existing bus numbers and some don't, but I doubt
> it's intentional etc... but that method is there to stay.
>
> Now, the questions are two fold
>
>   - How do we map _DSM #5 to these, at least on arm64, maybe in the
> long run we can also look at affecting x86, but that's less urgent.
>
>   - How do I ensure the above fixes my Amazon platform ? :-)
>

It would help if you could explain what exactly is wrong with your
Amazon platform :-)

> There's one obvious thing here. If we don't want to break existing
> things, then the absence of _DSM #5 must not change our existing
> behaviour. I think we can all agree on this.
>
> We might explore changing arm64 default behaviour as a second step
> since we all agree it's not doing what it should, but we also know that
> it will probably break some things so we need to be careful, understand
> the issues and work around them. This isn't the scope of the initial
> _DSM #5 patch.
>
> That leaves us with the _DSM #5 present cases.
>
> Now we have two values. What do they mean ? As I already said before,
> the wording with "must not ignore" and "may ignore" is completely
> useless and doesn't tell us a thing about the intention here. We don't
> know why the FW folks may have put a given value here, and what they
> expect us to do about it.
>
> What we do know is that Seattle returns 1 and needs reassignment, and
> we do know that the Amazon platforms return 0 and will want us to not
> touch the existing setup.
>
> However, does 1 means "business as usual" or does it mean "reassign
> everything" ?
>
> Does 0 means "probe only" ? Or do we still do an assignment pass for
> things that the FW may have left unassigned ?
>
> Today in Linux, the "probe only" logic tends to not call
> pci_assign_unassigned_resources for example.
>
> From a pure reading of the spec, there's an argument to be made that
> both 0 and 1 values can lead to the same code that reads what's there
> and reassign what's missing.
>
> So this is a mess, a usual when it comes to specs written by
> committees, but at this stage I'm at a loss as to what you want me to
> do.
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14  7:42     ` Ard Biesheuvel
@ 2019-06-14  8:36       ` Benjamin Herrenschmidt
  2019-06-14  9:57         ` Lorenzo Pieralisi
  2019-06-14 13:09         ` Bjorn Helgaas
  0 siblings, 2 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-14  8:36 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Bjorn Helgaas, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel

On Fri, 2019-06-14 at 09:42 +0200, Ard Biesheuvel wrote:
> The original purpose was for firmware running on 64-bit hosts to not
> create a PCI resource assignment that was incompatible with the OS
> booting in 32-bit mode.
> 
> So the expectation was that a 32-bit OS would reuse whatever config
> the firmware created, and the 64-bit version would be enlightened to
> understand that it could reassign resource assignments to make better
> use of the available resource windows, but this required a mechanism
> to describe which resources should be left alone by the OS.
> 
> So I don't think anybody cares about that use case anymore, and I have
> no idea how widespread its use was when people did.

Ok. At least my thinkpad happily assigns stuff in 64-bit space. AFAIK
even 32-bit distros can deal with it with PAE no ?

> >   - The "probe only" method. This was created independently on powerpc
> > and some other archs afaik. At least for powerpc, the reason for that
> > is some interesting virtualization cases where we just cannot touch or
> > change or move anything. The effect is to not reassign even what we
> > dont like, and not call pci_assign_unassign_resources().
> > 
> >   - The "reassign everything" method. This is used by almost all
> > embedded patforms accross archs. All arm32, all arm64 today (but we
> > agree that's wrong), all embedded powerpc etc... This is basically
> > meant for us not trusting whatever random uboot or other embedded FW,
> > if any, and do a full from-scratch assignment. There are issues in how
> > that is implemented accross the various platforms/archs, some for
> > example still honor existing bus numbers and some don't, but I doubt
> > it's intentional etc... but that method is there to stay.
> > 
> > Now, the questions are two fold
> > 
> >    - How do we map _DSM #5 to these, at least on arm64, maybe in the
> > long run we can also look at affecting x86, but that's less urgent.
> > 
> >    - How do I ensure the above fixes my Amazon platform ? :-)
> > 
> 
> It would help if you could explain what exactly is wrong with your
> Amazon platform :-)

Linux can't change the switch configuration. I may have mentioned
earlier it has to do with platform sec policies. But that's not totally
relevant, we shoudn't be changing resources anyway since in theory
runtime FW might rely on where some system devices were assigned at
boot. EFI fb is another example of that.

The biggest issue for me right now is that the spec says pretty much at
_DSM #5 = 0 is equivalent to _DSM #5 absent, and Bjorn seems keen on
having it this way, but for arm64, we specifically want to distinguish
those 2 cases.

We want to honor _DSM #5 = 0, and at least initially, leave the rest
alone.

Now, we *also* want to look at switching the rest to the "normal" (for
ACPI platforms at least) mechanism of using what FW setup and fixing up
if necessary, but that's not what the code does today, we know just
switching to pci_bus_claim_resources() will break some platforms, and
we need more testing and possibly quirks to get there, so it's material
for a separate patch.

But in the meantime, I need to differenciate.

Also using "probe_only" for _DSM #5 = 0 isn't a good idea, at least as
implemented today in the rest of the kernel, probe_only also means we
shouldn't assign what was left unassigned. However _DSM #5 allows this.

So we'll need to find some more subtle way to convey these.

Bjorn: At this point, because of all the above, I'm keen on going back
to my original patch (slightly modified Ard's patch), possibly
rewording a thing or two and addressing Lorenzo comment.

We can look at a better and more generic implementation of _DSM #5
including for child nodes after I have consolidated more of the
resource management code.

Looking at the spec (and followup discussions for specs updates), I'm
quite keen on treating _DSM #5 = 1 as "wipe out the resource for that
endpoint/bridge and realloc something better. There are reasons for
that, but we can probably discuss that later.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14  8:36       ` Benjamin Herrenschmidt
@ 2019-06-14  9:57         ` Lorenzo Pieralisi
  2019-06-14 10:36           ` Benjamin Herrenschmidt
  2019-06-14 13:09         ` Bjorn Helgaas
  1 sibling, 1 reply; 15+ messages in thread
From: Lorenzo Pieralisi @ 2019-06-14  9:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Bjorn Helgaas, Sinan Kaya, linux-pci, linux-arm-kernel

On Fri, Jun 14, 2019 at 06:36:32PM +1000, Benjamin Herrenschmidt wrote:

[...]

> The biggest issue for me right now is that the spec says pretty much at
> _DSM #5 = 0 is equivalent to _DSM #5 absent, and Bjorn seems keen on
> having it this way, but for arm64, we specifically want to distinguish
> those 2 cases.
> 
> We want to honor _DSM #5 = 0, and at least initially, leave the rest
> alone.
> 
> Now, we *also* want to look at switching the rest to the "normal" (for
> ACPI platforms at least) mechanism of using what FW setup and fixing up
> if necessary, but that's not what the code does today, we know just
> switching to pci_bus_claim_resources() will break some platforms, and
> we need more testing and possibly quirks to get there, so it's material
> for a separate patch.
> 
> But in the meantime, I need to differenciate.
> 
> Also using "probe_only" for _DSM #5 = 0 isn't a good idea, at least as
> implemented today in the rest of the kernel, probe_only also means we
> shouldn't assign what was left unassigned. However _DSM #5 allows this.

I am not sure about this. PCI_PROBE_ONLY cannot stop an OS from
reassigning BARs that are clearly misconfigured, it does not make
any sense. It can't stop an OS from writing those BARs anyway,
since they must be sized, why firmware would prevent an OS from
reassigning BARs that are programmed with values that can be
deemed 100% bogus ? Or put it differently, why must an OS preserve
those values willy-nilly ?

For me, PCI_PROBE_ONLY and _DSM == 0 on a host bridge must be considered
equivalent.

I agree with Bjorn on his reading of _DSM #5 and I think that
the original patch that claims on _DSM #5 == 0 is a good
starting point. I would like to make it a default even without
_DSM #5 == 0 so that claim and reassign on claim failure works
irrespective of _DSM #5, it is now or never, I think we can give
it a shot, with an incremental patch.

Lorenzo

> So we'll need to find some more subtle way to convey these.
> 
> Bjorn: At this point, because of all the above, I'm keen on going back
> to my original patch (slightly modified Ard's patch), possibly
> rewording a thing or two and addressing Lorenzo comment.
> 
> We can look at a better and more generic implementation of _DSM #5
> including for child nodes after I have consolidated more of the
> resource management code.
> 
> Looking at the spec (and followup discussions for specs updates), I'm
> quite keen on treating _DSM #5 = 1 as "wipe out the resource for that
> endpoint/bridge and realloc something better. There are reasons for
> that, but we can probably discuss that later.
> 
> Cheers,
> Ben.
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14  9:57         ` Lorenzo Pieralisi
@ 2019-06-14 10:36           ` Benjamin Herrenschmidt
  2019-06-14 10:43             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-14 10:36 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Ard Biesheuvel, Bjorn Helgaas, Sinan Kaya, linux-pci, linux-arm-kernel

On Fri, 2019-06-14 at 10:57 +0100, Lorenzo Pieralisi wrote:
> > Also using "probe_only" for _DSM #5 = 0 isn't a good idea, at least as
> > implemented today in the rest of the kernel, probe_only also means we
> > shouldn't assign what was left unassigned. However _DSM #5 allows this.
> 
> I am not sure about this. PCI_PROBE_ONLY cannot stop an OS from
> reassigning BARs that are clearly misconfigured, it does not make
> any sense.

PCI_PROBE_ONLY is a linux thing which, as implemented today, implies no
assignment at all. I believe it originates as a merge of variants of
the same thing, at least one of them being one I created for powerpc
back in the days due to our proprietary hypervisor not letting you
touch any of the PCI config space.

If a device looks broken, disable it, don't use it, but don't reassign
either. At least that the semantics we have today. And as such they
don't match _DSM #5 = 0.

> It can't stop an OS from writing those BARs anyway,
> since they must be sized, why firmware would prevent an OS from
> reassigning BARs that are programmed with values that can be
> deemed 100% bogus ? Or put it differently, why must an OS preserve
> those values willy-nilly ?

Don't ask me ... IBM firmware :-) At least that was the idea back then.

That said I suppose some platforms may also have set that flag to
indicate that they aren't sure what other "ghost" things might be
in the address space, ie Linux doesnt have a clear view of what's
free to allocate devices to for example.

> For me, PCI_PROBE_ONLY and _DSM == 0 on a host bridge must be considered
> equivalent.

Well, that's not what PCI_PROBE_ONLY is today in Linux. It might be
what you would like it to be but it's not what it is :-) And I'd like
to avoid making arm64 different than everybody else here because I want
to consolidate things.

Fundamentally, is what _DSM #5 == 0 does any different from our
standard (not PROBE_ONLY) mode of operation on server platforms anyway
? Ie, we read what's there, and we leave it alone unless it's broken or
unassigned ? This is precisely the definition of _DSM #5 == 0 no ?

PROBE_ONLY is .. something else.

> I agree with Bjorn on his reading of _DSM #5 and I think that
> the original patch that claims on _DSM #5 == 0 is a good
> starting point.

The original patch is a good starting point, we agree. The only point
of disagreement with Bjorn at this stage is what the "default" is in
absence of _DSM #5.

The spec says it should be the same as _DSM #5 == 0, but we know today
it will introduce a much wider ranging change to arm64 to treat it that
way. At the very least, changing the default should be a different
patch.

>  I would like to make it a default even without
> _DSM #5 == 0 so that claim and reassign on claim failure works
> irrespective of _DSM #5, it is now or never, I think we can give
> it a shot, with an incremental patch.

We should. In fact, I was thinking about it on the way home tonight and
was going to ask you and Ard to try this out and send me the debug
level log output of anything that looks wrong on any platform:

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index bb85e2f4603f..0af1f1b4e4d8 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -193,8 +193,8 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	if (!bus)
 		return NULL;
 
-	pci_bus_size_bridges(bus);
-	pci_bus_assign_resources(bus);
+	pci_bus_claim_resources(bus);
+	pci_assign_unassigned_root_bus_resources(bus);
 
 	list_for_each_entry(child, &bus->children, node)
 		pcie_bus_configure_settings(child);

Cheers,
Ben.



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14 10:36           ` Benjamin Herrenschmidt
@ 2019-06-14 10:43             ` Benjamin Herrenschmidt
  2019-06-14 13:12               ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-14 10:43 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-pci, Sinan Kaya, Bjorn Helgaas, linux-arm-kernel, Ard Biesheuvel

On Fri, 2019-06-14 at 20:36 +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2019-06-14 at 10:57 +0100, Lorenzo Pieralisi wrote:
> > > Also using "probe_only" for _DSM #5 = 0 isn't a good idea, at least as
> > > implemented today in the rest of the kernel, probe_only also means we
> > > shouldn't assign what was left unassigned. However _DSM #5 allows this.
> > 
> > I am not sure about this. PCI_PROBE_ONLY cannot stop an OS from
> > reassigning BARs that are clearly misconfigured, it does not make
> > any sense.
> 
> PCI_PROBE_ONLY is a linux thing which, as implemented today, implies no
> assignment at all. I believe it originates as a merge of variants of
> the same thing, at least one of them being one I created for powerpc
> back in the days due to our proprietary hypervisor not letting you
> touch any of the PCI config space.

Well... you could "touch" things and even BAR size but mostly we don't
even do that on powerpc on these systems these days, we read the BAR
values (and a bunch of other things) from OFW and manufacture the
pci_dev. The generic code still use cfg space to fill up the blanks.
sparc64 uses the same technique.

This least to another conversation we hinted at earlier.. we should
probably have a way to do the same at least for BARs on ACPI systems so
we don't have to temporarily disable access to a device to size them.

This can be problematic is the device in question need to be used
during the sizing. It can happen with some system devices used behind
the scene by FW, or the device on which the console is (how many time
did I crash bcs I had too verbose printk's in the PCI code during BAR
sizing of the framebuffer or the serial card...)

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14  8:36       ` Benjamin Herrenschmidt
  2019-06-14  9:57         ` Lorenzo Pieralisi
@ 2019-06-14 13:09         ` Bjorn Helgaas
  2019-06-14 13:46           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2019-06-14 13:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel

On Fri, Jun 14, 2019 at 06:36:32PM +1000, Benjamin Herrenschmidt wrote:
> Linux can't change the switch configuration. I may have mentioned
> earlier it has to do with platform sec policies. But that's not totally
> relevant, we shoudn't be changing resources anyway since in theory
> runtime FW might rely on where some system devices were assigned at
> boot. EFI fb is another example of that.

"We shouldn't be changing resources anyway" is not really useful
guidance.  I completely agree that we shouldn't change things
*unnecessarily*, but it's up to the OS to define what makes it
necessary -- it might be for rebalancing for hotplug, to make space
for SR-IOV, to respect user requests to increase alignment, etc.

IMO the real value of _DSM #5 is as a mechanism to advertise
dependencies runtime firmware has on devices, e.g., SMM firmware might
want to log errors to a PCI remote management device.  If the OS moved
that managment device, the SMM logging would itself cause errors.

> The biggest issue for me right now is that the spec says pretty much at
> _DSM #5 = 0 is equivalent to _DSM #5 absent, and Bjorn seems keen on
> having it this way, but for arm64, we specifically want to distinguish
> those 2 cases.

Nope, my opinion is exactly the opposite.  Sorry that I'm not
communicating this clearly.

It's true that the r3.2 spec *does* say _DSM #5 = 0 is equivalent to
the situation where it's absent, but that's based on the assumption
that the OS is never allowed to change PCI configuration.  I think
that assumption is complete nonsense and should be disregarded.

  _DSM #5 = 0: OS must preserve this device's BARs
	       (current spec says "OS must not ignore")

  _DSM #5 = 1: OS *may* change this device's BARs
	       (current spec says "OS may ignore")

Other than _DSM #5, there's no spec I'm aware of that restricts the
OS's ability to change BARs.  Therefore I think "_DSM #5 = 1" is
equivalent to _DSM #5 being absent, and in both cases the OS is free
to change BARs as it sees fit.

> Looking at the spec (and followup discussions for specs updates), I'm
> quite keen on treating _DSM #5 = 1 as "wipe out the resource for that
> endpoint/bridge and realloc something better. There are reasons for
> that, but we can probably discuss that later.

I disagree on the "wipe out all resources" part of this because we
have no idea how to realloc something better.  We should of course
look for problems (overlapping devices, etc) and fix them.  But
starting from scratch and reallocating won't reliably produce anything
different from the original, supposedly broken, configuration.

Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14 10:43             ` Benjamin Herrenschmidt
@ 2019-06-14 13:12               ` Bjorn Helgaas
  2019-06-14 13:48                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2019-06-14 13:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Lorenzo Pieralisi, linux-pci, Sinan Kaya, linux-arm-kernel,
	Ard Biesheuvel

On Fri, Jun 14, 2019 at 08:43:19PM +1000, Benjamin Herrenschmidt wrote:

> This least to another conversation we hinted at earlier.. we should
> probably have a way to do the same at least for BARs on ACPI systems so
> we don't have to temporarily disable access to a device to size them.

The PCI Enhanced Allocation capability provides a way to do this.  I
don't know how widely used it is, but it's theoretically possible.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14 13:09         ` Bjorn Helgaas
@ 2019-06-14 13:46           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-14 13:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ard Biesheuvel, Sinan Kaya, Lorenzo Pieralisi, linux-pci,
	linux-arm-kernel

On Fri, 2019-06-14 at 08:09 -0500, Bjorn Helgaas wrote:
> On Fri, Jun 14, 2019 at 06:36:32PM +1000, Benjamin Herrenschmidt wrote:
> > Linux can't change the switch configuration. I may have mentioned
> > earlier it has to do with platform sec policies. But that's not totally
> > relevant, we shoudn't be changing resources anyway since in theory
> > runtime FW might rely on where some system devices were assigned at
> > boot. EFI fb is another example of that.
> 
> "We shouldn't be changing resources anyway" is not really useful
> guidance.  I completely agree that we shouldn't change things
> *unnecessarily*, but it's up to the OS to define what makes it
> necessary -- it might be for rebalancing for hotplug, to make space
> for SR-IOV, to respect user requests to increase alignment, etc.
> 
> IMO the real value of _DSM #5 is as a mechanism to advertise
> dependencies runtime firmware has on devices, e.g., SMM firmware might
> want to log errors to a PCI remote management device.  If the OS moved
> that managment device, the SMM logging would itself cause errors.

This I agree with, though that wasn't specifically why it was created
but this is where it's going yes.

However, I think the resource management code in Linux is some way from
being able to handle that at device granularity. It's actually a very
difficult problem to solve in an optimal way.

For example you could have a bridge that can be moved but a child of
that bridge is fixed. That means that now, the bridge must only be
moved within the constraints that it still contains that child.

This in turn is very tricky to properly define because the bridge can
have other children who aren't fixed, but whose alignment constraints
will also limit where the bridge can go.

I don't think we want to practically create such a mechanism, we have
to be realistic. So that means if something needs to be fixed, all of
its parents need too. Due to how the spec is written, if that _DSM #5
== 0 object is a bridge, then both all of its parents and all of its
descendants are now fixed (except hotplugged descendants), unless said
descendants are themselves tagged with _DSM #5 == 1. But then, as I
explain below, this does more than just "cancel" _DSM #5 == 0 ... what
a mess :-)

That said, I have been thinking about ways we could better honor that
_DSM #5 at various levels of the tree, and I think it's doable. There's
going to still be some arguments to sort out about policy details as
above, and we might end up trying to get the SIG to clarify things
further but I think we can get somewhere reasonably sane in practice
once we've changed arm64 to use an x86-style allocation rather than
trying to do everything from scratch. Working on it ...

> > The biggest issue for me right now is that the spec says pretty much at
> > _DSM #5 = 0 is equivalent to _DSM #5 absent, and Bjorn seems keen on
> > having it this way, but for arm64, we specifically want to distinguish
> > those 2 cases.
> 
> Nope, my opinion is exactly the opposite.  Sorry that I'm not
> communicating this clearly.
> 
> It's true that the r3.2 spec *does* say _DSM #5 = 0 is equivalent to
> the situation where it's absent, but that's based on the assumption
> that the OS is never allowed to change PCI configuration.  I think
> that assumption is complete nonsense and should be disregarded.
> 
>   _DSM #5 = 0: OS must preserve this device's BARs
> 	       (current spec says "OS must not ignore")
> 
>   _DSM #5 = 1: OS *may* change this device's BARs
> 	       (current spec says "OS may ignore")
> 
> Other than _DSM #5, there's no spec I'm aware of that restricts the
> OS's ability to change BARs.  Therefore I think "_DSM #5 = 1" is
> equivalent to _DSM #5 being absent, and in both cases the OS is free
> to change BARs as it sees fit.

I think we are conflating too many different things together here, it's
not binary.

If you look at the reasons why DSM#5 was created in the first place, it
had to do with firmwares setting up a 32-bit compatible layout which is
suboptimal for 64-bit, and thus marking with DSM #1 things that could
(and probably should) be reallocated elsewhere.

Based on that pattern, it's tempting (and I'm tempted...) to handle
_DSM #5 == 1, at least at a non-root-bridge level, by wiping the
resources (or setting IORESOURCE_UNSET), to force Linux to create a
potentially more optimal allocation.

However, that shouldn't be our default either :-)

So my personal opinion is that DSM #5 0, 1 and absent are actually 3
different things.

> > Looking at the spec (and followup discussions for specs updates), I'm
> > quite keen on treating _DSM #5 = 1 as "wipe out the resource for that
> > endpoint/bridge and realloc something better. There are reasons for
> > that, but we can probably discuss that later.
> 
> I disagree on the "wipe out all resources" part of this because we
> have no idea how to realloc something better.  We should of course
> look for problems (overlapping devices, etc) and fix them.  But
> starting from scratch and reallocating won't reliably produce anything
> different from the original, supposedly broken, configuration.

Well, again, it depends *why* _DSM #5 was set to 1 in the first place.
As I said above, there are actually more than 2 cases here and _DSM is
trying to convey too much with too little information. We lack the
firmware intent, it's not being conveyed well.

From my understanding of the spec history, _DSM #5 == 1 was
specifically created because the FW knew it was setting up some
resources in a sub optimal way. The 32-bit example is spelled out. I
can imagine others, such as FW whacking a fixed BAR value on a boot
device to get going knowing full well it's sub optimal.

Now unless we start building some pretty serious smarts into our
allocation scheme to "decide" whether a valid resource could be done
better elsewhere or not, which we aren't near being able to do I
suspect, that case will not work *unless* we just wipe such resources
and let Linux do what it thinks is best with it.

All those interpretations are valid on either the current or the
"proposed" spec, and it still leaves us in a bind because there is not
enough intent being conveyed. It needs to be more than a binary choice
ideally.

That said, I'm not in a hurry to deal with the _DSM #5 == 1 case, so we
can defer that conversation. Maybe we can create a mechanism where we
do a "dummy run" of our allocation code on a snapshot of resources with
the assumption that _DSM #5 == 1 ones get wiped, and create some
metrics to decide whether the end result is "better" than what's
already there, and based on that chose what to apply, but it sounds
really really messy.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14 13:12               ` Bjorn Helgaas
@ 2019-06-14 13:48                 ` Benjamin Herrenschmidt
  2019-06-14 20:13                   ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-14 13:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lorenzo Pieralisi, linux-pci, Sinan Kaya, linux-arm-kernel,
	Ard Biesheuvel

On Fri, 2019-06-14 at 08:12 -0500, Bjorn Helgaas wrote:
> On Fri, Jun 14, 2019 at 08:43:19PM +1000, Benjamin Herrenschmidt
> wrote:
> 
> > This least to another conversation we hinted at earlier.. we should
> > probably have a way to do the same at least for BARs on ACPI
> > systems so
> > we don't have to temporarily disable access to a device to size
> > them.
> 
> The PCI Enhanced Allocation capability provides a way to do this.  I
> don't know how widely used it is, but it's theoretically possible.

Ok, I have to read about this. I haven't seen a device with that on
yet, it looks messy at a quick glance.

Can ACPI convey the information ? On powerpc and sparc64 we have ways
to read the BAR values from the device-tree created by OF when it
assigned them.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14 13:48                 ` Benjamin Herrenschmidt
@ 2019-06-14 20:13                   ` Bjorn Helgaas
  2019-06-15  1:18                     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2019-06-14 20:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Lorenzo Pieralisi, linux-pci, Sinan Kaya, linux-arm-kernel,
	Ard Biesheuvel

On Fri, Jun 14, 2019 at 11:48:18PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2019-06-14 at 08:12 -0500, Bjorn Helgaas wrote:
> > On Fri, Jun 14, 2019 at 08:43:19PM +1000, Benjamin Herrenschmidt
> > wrote:
> > 
> > > This least to another conversation we hinted at earlier.. we should
> > > probably have a way to do the same at least for BARs on ACPI
> > > systems so
> > > we don't have to temporarily disable access to a device to size
> > > them.
> > 
> > The PCI Enhanced Allocation capability provides a way to do this.  I
> > don't know how widely used it is, but it's theoretically possible.
> 
> Ok, I have to read about this. I haven't seen a device with that on
> yet, it looks messy at a quick glance.
> 
> Can ACPI convey the information ? On powerpc and sparc64 we have ways
> to read the BAR values from the device-tree created by OF when it
> assigned them.

I agree, EA is messy.

I don't think it's feasible to do this in ACPI.  It's a pretty
fundamental principle of PCI that you can discover what resources a
device needs and uses by looking at its config space.  In general PCIe
requires ECAM, which gives the OS direct access to config space,
although it does allow exceptions for architecture-specific firmware
interfaces for accessing config space, e.g., ia64 SAL (PCIe r4.0, sec
7.2.2).

Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup
  2019-06-14 20:13                   ` Bjorn Helgaas
@ 2019-06-15  1:18                     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2019-06-15  1:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lorenzo Pieralisi, linux-pci, Sinan Kaya, linux-arm-kernel,
	Ard Biesheuvel

On Fri, 2019-06-14 at 15:13 -0500, Bjorn Helgaas wrote:

> Ok, I have to read about this. I haven't seen a device with that on
> > yet, it looks messy at a quick glance.
> > 
> > Can ACPI convey the information ? On powerpc and sparc64 we have ways
> > to read the BAR values from the device-tree created by OF when it
> > assigned them.
> 
> I agree, EA is messy.
> 
> I don't think it's feasible to do this in ACPI.  It's a pretty
> fundamental principle of PCI that you can discover what resources a
> device needs and uses by looking at its config space.  In general PCIe
> requires ECAM, which gives the OS direct access to config space,
> although it does allow exceptions for architecture-specific firmware
> interfaces for accessing config space, e.g., ia64 SAL (PCIe r4.0, sec
> 7.2.2).

So this isn't something I need, but if others do, we can find a
reasonable compromise here and push it to the spec. It's actually
fairly easy:

If a device is used by FW (SMM, SMCCC or whatever other runtime thingy)
to the extent that temporarily disabling it for BAR sizing can cause
random boot failures (if the wrong event happens at the wrong time), it
would be easy for FW to "mark" that device as such (_DSM #5 == 2 ? just
kidding...) and provide some forms of properties/datas that expose the
resources that were assigned. On OF, the properties for that already
exist, so just adding something like "no-bar-sizing" or such in the
node for the device would do.

It's easy because FW only has to "represent" endpoints that have such
properties and leave everything else to the OS. There is no need to
mandate a full representation of all PCI devices.

There are a few details to be careful of, for example, if any bridge in
the parent chain of such an endpoint has BARs (not windows, actual
BARs), then they should have those properties too, otherwise sizing
them will temporarily disable the path to the device since BAR sizing
should be done with memory/io decode off.

But otherwise, it's a pretty trivial thing to specify and implement I
suspect. A lot easier than requiring HW to implement EA is my gut
feeling :-)

As I said, I don't have a pressing need for that now (could have come
in handy back in the powermac days ... oh well). But if enough people
do, I'm happy to help ironing something out.

Cheers,
Ben




^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-06-15  1:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-13  7:54 [RFC PATCH v2] arm64: acpi/pci: invoke _DSM whether to preserve firmware PCI setup Benjamin Herrenschmidt
2019-06-13 19:02 ` Bjorn Helgaas
2019-06-13 21:59   ` Benjamin Herrenschmidt
2019-06-13 23:07   ` Benjamin Herrenschmidt
2019-06-14  7:42     ` Ard Biesheuvel
2019-06-14  8:36       ` Benjamin Herrenschmidt
2019-06-14  9:57         ` Lorenzo Pieralisi
2019-06-14 10:36           ` Benjamin Herrenschmidt
2019-06-14 10:43             ` Benjamin Herrenschmidt
2019-06-14 13:12               ` Bjorn Helgaas
2019-06-14 13:48                 ` Benjamin Herrenschmidt
2019-06-14 20:13                   ` Bjorn Helgaas
2019-06-15  1:18                     ` Benjamin Herrenschmidt
2019-06-14 13:09         ` Bjorn Helgaas
2019-06-14 13:46           ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).