linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] ACPI/IORT: rework num_ids off-by-one quirk
@ 2020-05-01  9:58 Ard Biesheuvel
  2020-05-01  9:58 ` [PATCH RFC 1/2] Revert "ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()" Ard Biesheuvel
  2020-05-01  9:58 ` [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity Ard Biesheuvel
  0 siblings, 2 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2020-05-01  9:58 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-acpi, Ard Biesheuvel, Hanjun Guo, Lorenzo Pieralisi,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas,
	Robin Murphy

Replace the ACPI OEM ID matching based IORT quirk for the ID region size
ambiguity with runtime handling of this condition.

This is based on the observation that we only care about this when it
causes ambiguity regarding the output reference, which means that we
will have more than one match for the input ID. In this case, we can
just disregard the one at the end of a multi-ID region: if we hit it
first, we record it tentatively but allow a subsequent match to
supersede it. If we hit the correct match first, there is nothing we
need to do.

Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Pankaj Bansal <pankaj.bansal@nxp.com>
Cc: Will Deacon <will@kernel.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>

Ard Biesheuvel (2):
  Revert "ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()"
  ACPI/IORT: work around num_ids ambiguity

 drivers/acpi/arm64/iort.c | 78 +++++---------------
 1 file changed, 19 insertions(+), 59 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH RFC 1/2] Revert "ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()"
  2020-05-01  9:58 [PATCH RFC 0/2] ACPI/IORT: rework num_ids off-by-one quirk Ard Biesheuvel
@ 2020-05-01  9:58 ` Ard Biesheuvel
  2020-05-01  9:58 ` [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity Ard Biesheuvel
  1 sibling, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2020-05-01  9:58 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-acpi, Ard Biesheuvel, Hanjun Guo, Lorenzo Pieralisi,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas,
	Robin Murphy

This reverts commit 3c23b83a88d00383e1d498cfa515249aa2fe0238.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/acpi/arm64/iort.c | 57 +-------------------
 1 file changed, 2 insertions(+), 55 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 7d04424189df..98be18266a73 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -299,59 +299,6 @@ static acpi_status iort_match_node_callback(struct acpi_iort_node *node,
 	return status;
 }
 
-struct iort_workaround_oem_info {
-	char oem_id[ACPI_OEM_ID_SIZE + 1];
-	char oem_table_id[ACPI_OEM_TABLE_ID_SIZE + 1];
-	u32 oem_revision;
-};
-
-static bool apply_id_count_workaround;
-
-static struct iort_workaround_oem_info wa_info[] __initdata = {
-	{
-		.oem_id		= "HISI  ",
-		.oem_table_id	= "HIP07   ",
-		.oem_revision	= 0,
-	}, {
-		.oem_id		= "HISI  ",
-		.oem_table_id	= "HIP08   ",
-		.oem_revision	= 0,
-	}
-};
-
-static void __init
-iort_check_id_count_workaround(struct acpi_table_header *tbl)
-{
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(wa_info); i++) {
-		if (!memcmp(wa_info[i].oem_id, tbl->oem_id, ACPI_OEM_ID_SIZE) &&
-		    !memcmp(wa_info[i].oem_table_id, tbl->oem_table_id, ACPI_OEM_TABLE_ID_SIZE) &&
-		    wa_info[i].oem_revision == tbl->oem_revision) {
-			apply_id_count_workaround = true;
-			pr_warn(FW_BUG "ID count for ID mapping entry is wrong, applying workaround\n");
-			break;
-		}
-	}
-}
-
-static inline u32 iort_get_map_max(struct acpi_iort_id_mapping *map)
-{
-	u32 map_max = map->input_base + map->id_count;
-
-	/*
-	 * The IORT specification revision D (Section 3, table 4, page 9) says
-	 * Number of IDs = The number of IDs in the range minus one, but the
-	 * IORT code ignored the "minus one", and some firmware did that too,
-	 * so apply a workaround here to keep compatible with both the spec
-	 * compliant and non-spec compliant firmwares.
-	 */
-	if (apply_id_count_workaround)
-		map_max--;
-
-	return map_max;
-}
-
 static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
 		       u32 *rid_out)
 {
@@ -368,7 +315,8 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
 		return -ENXIO;
 	}
 
-	if (rid_in < map->input_base || rid_in > iort_get_map_max(map))
+	if (rid_in < map->input_base ||
+	    (rid_in >= map->input_base + map->id_count))
 		return -ENXIO;
 
 	*rid_out = map->output_base + (rid_in - map->input_base);
@@ -1703,6 +1651,5 @@ void __init acpi_iort_init(void)
 		return;
 	}
 
-	iort_check_id_count_workaround(iort_table);
 	iort_init_platform_devices();
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01  9:58 [PATCH RFC 0/2] ACPI/IORT: rework num_ids off-by-one quirk Ard Biesheuvel
  2020-05-01  9:58 ` [PATCH RFC 1/2] Revert "ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()" Ard Biesheuvel
@ 2020-05-01  9:58 ` Ard Biesheuvel
  2020-05-01 10:55   ` Robin Murphy
  1 sibling, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2020-05-01  9:58 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-acpi, Ard Biesheuvel, Hanjun Guo, Lorenzo Pieralisi,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas,
	Robin Murphy

The ID mapping table structure of the IORT table describes the size of
a range using a num_ids field carrying the number of IDs in the region
minus one. This has been misinterpreted in the past in the parsing code,
and firmware is known to have shipped where this results in an ambiguity,
where regions that should be adjacent have an overlap of one value.

So let's work around this by detecting this case specifically: when
resolving an ID translation, allow one that matches right at the end of
a multi-ID region to be superseded by a subsequent one.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/acpi/arm64/iort.c | 23 +++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 98be18266a73..d826dd9dc4c5 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
 	}
 
 	if (rid_in < map->input_base ||
-	    (rid_in >= map->input_base + map->id_count))
+	    (rid_in > map->input_base + map->id_count))
 		return -ENXIO;
 
 	*rid_out = map->output_base + (rid_in - map->input_base);
+
+	/*
+	 * Due to confusion regarding the meaning of the id_count field (which
+	 * carries the number of IDs *minus 1*), we may have to disregard this
+	 * match if it is at the end of the range, and overlaps with the start
+	 * of another one.
+	 */
+	if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
+		return -EAGAIN;
 	return 0;
 }
 
@@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 	/* Parse the ID mapping tree to find specified node type */
 	while (node) {
 		struct acpi_iort_id_mapping *map;
-		int i, index;
+		int i, index, rc = 0;
+		u32 out_ref = 0, map_id = id;
 
 		if (IORT_TYPE_MASK(node->type) & type_mask) {
 			if (id_out)
@@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 			if (i == index)
 				continue;
 
-			if (!iort_id_map(map, node->type, id, &id))
+			rc = iort_id_map(map, node->type, map_id, &id);
+			if (!rc)
 				break;
+			if (rc == -EAGAIN)
+				out_ref = map->output_reference;
 		}
 
-		if (i == node->mapping_count)
+		if (i == node->mapping_count && rc != -EAGAIN)
 			goto fail_map;
 
 		node = ACPI_ADD_PTR(struct acpi_iort_node, iort_table,
-				    map->output_reference);
+				    rc ? out_ref : map->output_reference);
 	}
 
 fail_map:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01  9:58 ` [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity Ard Biesheuvel
@ 2020-05-01 10:55   ` Robin Murphy
  2020-05-01 11:41     ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Robin Murphy @ 2020-05-01 10:55 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: linux-acpi, Hanjun Guo, Lorenzo Pieralisi, Pankaj Bansal,
	Will Deacon, Sudeep Holla, Catalin Marinas

On 2020-05-01 10:58 am, Ard Biesheuvel wrote:
> The ID mapping table structure of the IORT table describes the size of
> a range using a num_ids field carrying the number of IDs in the region
> minus one. This has been misinterpreted in the past in the parsing code,
> and firmware is known to have shipped where this results in an ambiguity,
> where regions that should be adjacent have an overlap of one value.
> 
> So let's work around this by detecting this case specifically: when
> resolving an ID translation, allow one that matches right at the end of
> a multi-ID region to be superseded by a subsequent one.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>   drivers/acpi/arm64/iort.c | 23 +++++++++++++++-----
>   1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 98be18266a73..d826dd9dc4c5 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
>   	}
>   
>   	if (rid_in < map->input_base ||
> -	    (rid_in >= map->input_base + map->id_count))
> +	    (rid_in > map->input_base + map->id_count))
>   		return -ENXIO;
>   
>   	*rid_out = map->output_base + (rid_in - map->input_base);
> +
> +	/*
> +	 * Due to confusion regarding the meaning of the id_count field (which
> +	 * carries the number of IDs *minus 1*), we may have to disregard this
> +	 * match if it is at the end of the range, and overlaps with the start
> +	 * of another one.
> +	 */
> +	if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
> +		return -EAGAIN;
>   	return 0;
>   }
>   
> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
>   	/* Parse the ID mapping tree to find specified node type */
>   	while (node) {
>   		struct acpi_iort_id_mapping *map;
> -		int i, index;
> +		int i, index, rc = 0;
> +		u32 out_ref = 0, map_id = id;
>   
>   		if (IORT_TYPE_MASK(node->type) & type_mask) {
>   			if (id_out)
> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
>   			if (i == index)
>   				continue;
>   
> -			if (!iort_id_map(map, node->type, id, &id))
> +			rc = iort_id_map(map, node->type, map_id, &id);
> +			if (!rc)
>   				break;

This needs a big FW_BUG splat in the case where it did find an overlap. 
Ideally we'd also enforce that the other half of must be the first entry 
of another range, but perhaps we're into diminishing returns by that point.

If we silently fix things up, then people will continue to write broken 
tables without even realising, new OSes will have to implement the same 
mechanism because vendors will have little interest in changing things 
that have worked "correctly" with Linux for years, and we've effectively 
achieved a de-facto redefinition of the spec. Making our end of the 
interface robust is obviously desirable, but there still needs to be 
*some* incentive for the folks on the other end to get it right.

Robin.

> +			if (rc == -EAGAIN)
> +				out_ref = map->output_reference;
>   		}
>   
> -		if (i == node->mapping_count)
> +		if (i == node->mapping_count && rc != -EAGAIN)
>   			goto fail_map;
>   
>   		node = ACPI_ADD_PTR(struct acpi_iort_node, iort_table,
> -				    map->output_reference);
> +				    rc ? out_ref : map->output_reference);
>   	}
>   
>   fail_map:
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01 10:55   ` Robin Murphy
@ 2020-05-01 11:41     ` Ard Biesheuvel
  2020-05-01 12:31       ` Robin Murphy
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2020-05-01 11:41 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Linux ARM, ACPI Devel Maling List, Hanjun Guo, Lorenzo Pieralisi,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas

On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2020-05-01 10:58 am, Ard Biesheuvel wrote:
> > The ID mapping table structure of the IORT table describes the size of
> > a range using a num_ids field carrying the number of IDs in the region
> > minus one. This has been misinterpreted in the past in the parsing code,
> > and firmware is known to have shipped where this results in an ambiguity,
> > where regions that should be adjacent have an overlap of one value.
> >
> > So let's work around this by detecting this case specifically: when
> > resolving an ID translation, allow one that matches right at the end of
> > a multi-ID region to be superseded by a subsequent one.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >   drivers/acpi/arm64/iort.c | 23 +++++++++++++++-----
> >   1 file changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> > index 98be18266a73..d826dd9dc4c5 100644
> > --- a/drivers/acpi/arm64/iort.c
> > +++ b/drivers/acpi/arm64/iort.c
> > @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
> >       }
> >
> >       if (rid_in < map->input_base ||
> > -         (rid_in >= map->input_base + map->id_count))
> > +         (rid_in > map->input_base + map->id_count))
> >               return -ENXIO;
> >
> >       *rid_out = map->output_base + (rid_in - map->input_base);
> > +
> > +     /*
> > +      * Due to confusion regarding the meaning of the id_count field (which
> > +      * carries the number of IDs *minus 1*), we may have to disregard this
> > +      * match if it is at the end of the range, and overlaps with the start
> > +      * of another one.
> > +      */
> > +     if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
> > +             return -EAGAIN;
> >       return 0;
> >   }
> >
> > @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
> >       /* Parse the ID mapping tree to find specified node type */
> >       while (node) {
> >               struct acpi_iort_id_mapping *map;
> > -             int i, index;
> > +             int i, index, rc = 0;
> > +             u32 out_ref = 0, map_id = id;
> >
> >               if (IORT_TYPE_MASK(node->type) & type_mask) {
> >                       if (id_out)
> > @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
> >                       if (i == index)
> >                               continue;
> >
> > -                     if (!iort_id_map(map, node->type, id, &id))
> > +                     rc = iort_id_map(map, node->type, map_id, &id);
> > +                     if (!rc)
> >                               break;
>
> This needs a big FW_BUG splat in the case where it did find an overlap.

Sure, although we did help create the problem in the first place.

> Ideally we'd also enforce that the other half of must be the first entry
> of another range, but perhaps we're into diminishing returns by that point.
>

That would mean the regions overlap regardless of whether you
interpret num_ids correctly or not, which means we'll be doing
validation of general well-formedness of the table rather than
providing a workaround for this particular issue.

I think the fact that we got it wrong initially justifies treating the
off-by-one case specially, but beyond that, we should make it robust
without being pedantic imo.

> If we silently fix things up, then people will continue to write broken
> tables without even realising, new OSes will have to implement the same
> mechanism because vendors will have little interest in changing things
> that have worked "correctly" with Linux for years, and we've effectively
> achieved a de-facto redefinition of the spec. Making our end of the
> interface robust is obviously desirable, but there still needs to be
> *some* incentive for the folks on the other end to get it right.
>

Agreed. But at least we'll be able to detect it and flag it in the
general case, rather than having a special case for D05/06 only
(although I suppose splitting the output ranges like those platforms
do is rather unusual)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01 11:41     ` Ard Biesheuvel
@ 2020-05-01 12:31       ` Robin Murphy
  2020-05-01 13:10         ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Robin Murphy @ 2020-05-01 12:31 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linux ARM, ACPI Devel Maling List, Hanjun Guo, Lorenzo Pieralisi,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas

On 2020-05-01 12:41 pm, Ard Biesheuvel wrote:
> On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2020-05-01 10:58 am, Ard Biesheuvel wrote:
>>> The ID mapping table structure of the IORT table describes the size of
>>> a range using a num_ids field carrying the number of IDs in the region
>>> minus one. This has been misinterpreted in the past in the parsing code,
>>> and firmware is known to have shipped where this results in an ambiguity,
>>> where regions that should be adjacent have an overlap of one value.
>>>
>>> So let's work around this by detecting this case specifically: when
>>> resolving an ID translation, allow one that matches right at the end of
>>> a multi-ID region to be superseded by a subsequent one.
>>>
>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>> ---
>>>    drivers/acpi/arm64/iort.c | 23 +++++++++++++++-----
>>>    1 file changed, 18 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
>>> index 98be18266a73..d826dd9dc4c5 100644
>>> --- a/drivers/acpi/arm64/iort.c
>>> +++ b/drivers/acpi/arm64/iort.c
>>> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
>>>        }
>>>
>>>        if (rid_in < map->input_base ||
>>> -         (rid_in >= map->input_base + map->id_count))
>>> +         (rid_in > map->input_base + map->id_count))
>>>                return -ENXIO;
>>>
>>>        *rid_out = map->output_base + (rid_in - map->input_base);
>>> +
>>> +     /*
>>> +      * Due to confusion regarding the meaning of the id_count field (which
>>> +      * carries the number of IDs *minus 1*), we may have to disregard this
>>> +      * match if it is at the end of the range, and overlaps with the start
>>> +      * of another one.
>>> +      */
>>> +     if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
>>> +             return -EAGAIN;
>>>        return 0;
>>>    }
>>>
>>> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
>>>        /* Parse the ID mapping tree to find specified node type */
>>>        while (node) {
>>>                struct acpi_iort_id_mapping *map;
>>> -             int i, index;
>>> +             int i, index, rc = 0;
>>> +             u32 out_ref = 0, map_id = id;
>>>
>>>                if (IORT_TYPE_MASK(node->type) & type_mask) {
>>>                        if (id_out)
>>> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
>>>                        if (i == index)
>>>                                continue;
>>>
>>> -                     if (!iort_id_map(map, node->type, id, &id))
>>> +                     rc = iort_id_map(map, node->type, map_id, &id);
>>> +                     if (!rc)
>>>                                break;
>>
>> This needs a big FW_BUG splat in the case where it did find an overlap.
> 
> Sure, although we did help create the problem in the first place.
> 
>> Ideally we'd also enforce that the other half of must be the first entry
>> of another range, but perhaps we're into diminishing returns by that point.
>>
> 
> That would mean the regions overlap regardless of whether you
> interpret num_ids correctly or not, which means we'll be doing
> validation of general well-formedness of the table rather than
> providing a workaround for this particular issue.

The point was to limit any change in behaviour to the specific case that 
we need to work around. Otherwise a table that was entirely malformed 
rather than just off-by-one on the sizes might go from happening-to-work 
to not working, or vice versa; the diminishing return is in how much we 
care about that.

> I think the fact that we got it wrong initially justifies treating the
> off-by-one case specially, but beyond that, we should make it robust
> without being pedantic imo.

As the #1 search engine hit for "Linux is not a firmware validation 
suite", I can reassure you that we're on the same page in that regard ;)

>> If we silently fix things up, then people will continue to write broken
>> tables without even realising, new OSes will have to implement the same
>> mechanism because vendors will have little interest in changing things
>> that have worked "correctly" with Linux for years, and we've effectively
>> achieved a de-facto redefinition of the spec. Making our end of the
>> interface robust is obviously desirable, but there still needs to be
>> *some* incentive for the folks on the other end to get it right.
>>
> 
> Agreed. But at least we'll be able to detect it and flag it in the
> general case, rather than having a special case for D05/06 only
> (although I suppose splitting the output ranges like those platforms
> do is rather unusual)

Yup, in principle the fixed quirk list gives a nice reassuring sense of 
"we'll work around these early platforms and everyone from now on will 
get it right", but whether reality plays out that way is another matter 
entirely...

Robin.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01 12:31       ` Robin Murphy
@ 2020-05-01 13:10         ` Ard Biesheuvel
  2020-05-01 13:49           ` Lorenzo Pieralisi
  2020-05-01 14:13           ` Robin Murphy
  0 siblings, 2 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2020-05-01 13:10 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Linux ARM, ACPI Devel Maling List, Hanjun Guo, Lorenzo Pieralisi,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas

On Fri, 1 May 2020 at 14:31, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2020-05-01 12:41 pm, Ard Biesheuvel wrote:
> > On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 2020-05-01 10:58 am, Ard Biesheuvel wrote:
> >>> The ID mapping table structure of the IORT table describes the size of
> >>> a range using a num_ids field carrying the number of IDs in the region
> >>> minus one. This has been misinterpreted in the past in the parsing code,
> >>> and firmware is known to have shipped where this results in an ambiguity,
> >>> where regions that should be adjacent have an overlap of one value.
> >>>
> >>> So let's work around this by detecting this case specifically: when
> >>> resolving an ID translation, allow one that matches right at the end of
> >>> a multi-ID region to be superseded by a subsequent one.
> >>>
> >>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >>> ---
> >>>    drivers/acpi/arm64/iort.c | 23 +++++++++++++++-----
> >>>    1 file changed, 18 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> >>> index 98be18266a73..d826dd9dc4c5 100644
> >>> --- a/drivers/acpi/arm64/iort.c
> >>> +++ b/drivers/acpi/arm64/iort.c
> >>> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
> >>>        }
> >>>
> >>>        if (rid_in < map->input_base ||
> >>> -         (rid_in >= map->input_base + map->id_count))
> >>> +         (rid_in > map->input_base + map->id_count))
> >>>                return -ENXIO;
> >>>
> >>>        *rid_out = map->output_base + (rid_in - map->input_base);
> >>> +
> >>> +     /*
> >>> +      * Due to confusion regarding the meaning of the id_count field (which
> >>> +      * carries the number of IDs *minus 1*), we may have to disregard this
> >>> +      * match if it is at the end of the range, and overlaps with the start
> >>> +      * of another one.
> >>> +      */
> >>> +     if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
> >>> +             return -EAGAIN;
> >>>        return 0;
> >>>    }
> >>>
> >>> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
> >>>        /* Parse the ID mapping tree to find specified node type */
> >>>        while (node) {
> >>>                struct acpi_iort_id_mapping *map;
> >>> -             int i, index;
> >>> +             int i, index, rc = 0;
> >>> +             u32 out_ref = 0, map_id = id;
> >>>
> >>>                if (IORT_TYPE_MASK(node->type) & type_mask) {
> >>>                        if (id_out)
> >>> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
> >>>                        if (i == index)
> >>>                                continue;
> >>>
> >>> -                     if (!iort_id_map(map, node->type, id, &id))
> >>> +                     rc = iort_id_map(map, node->type, map_id, &id);
> >>> +                     if (!rc)
> >>>                                break;
> >>
> >> This needs a big FW_BUG splat in the case where it did find an overlap.
> >
> > Sure, although we did help create the problem in the first place.
> >
> >> Ideally we'd also enforce that the other half of must be the first entry
> >> of another range, but perhaps we're into diminishing returns by that point.
> >>
> >
> > That would mean the regions overlap regardless of whether you
> > interpret num_ids correctly or not, which means we'll be doing
> > validation of general well-formedness of the table rather than
> > providing a workaround for this particular issue.
>
> The point was to limit any change in behaviour to the specific case that
> we need to work around. Otherwise a table that was entirely malformed
> rather than just off-by-one on the sizes might go from happening-to-work
> to not working, or vice versa; the diminishing return is in how much we
> care about that.
>

I see. I think it is quite unlikely that a working system with
overlapping ID ranges would work, and suddenly fail horribly when the
exact point of overlap is shifted by 1. But yeah, I see your point.

> > I think the fact that we got it wrong initially justifies treating the
> > off-by-one case specially, but beyond that, we should make it robust
> > without being pedantic imo.
>
> As the #1 search engine hit for "Linux is not a firmware validation
> suite", I can reassure you that we're on the same page in that regard ;)
>

Good :-)

> >> If we silently fix things up, then people will continue to write broken
> >> tables without even realising, new OSes will have to implement the same
> >> mechanism because vendors will have little interest in changing things
> >> that have worked "correctly" with Linux for years, and we've effectively
> >> achieved a de-facto redefinition of the spec. Making our end of the
> >> interface robust is obviously desirable, but there still needs to be
> >> *some* incentive for the folks on the other end to get it right.
> >>
> >
> > Agreed. But at least we'll be able to detect it and flag it in the
> > general case, rather than having a special case for D05/06 only
> > (although I suppose splitting the output ranges like those platforms
> > do is rather unusual)
>
> Yup, in principle the fixed quirk list gives a nice reassuring sense of
> "we'll work around these early platforms and everyone from now on will
> get it right", but whether reality plays out that way is another matter
> entirely...

The reason I am looking into this is that I think the fix should go to
stable, given that the current situation makes it impossible to write
firmware that works with older and newer kernels.

Lorenzo said he wouldn't mind taking the current version with ACPI OEM
ID matching back to -stable, but it's another quirk list to manage,
which I would prefer to avoid.

But I don't care deeply either way, to be honest, as long as we can
get something backported so compliant firmware is not being penalized
anymore.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01 13:10         ` Ard Biesheuvel
@ 2020-05-01 13:49           ` Lorenzo Pieralisi
  2020-05-01 14:35             ` Ard Biesheuvel
  2020-05-01 14:13           ` Robin Murphy
  1 sibling, 1 reply; 12+ messages in thread
From: Lorenzo Pieralisi @ 2020-05-01 13:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Robin Murphy, Linux ARM, ACPI Devel Maling List, Hanjun Guo,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas

On Fri, May 01, 2020 at 03:10:59PM +0200, Ard Biesheuvel wrote:

[...]

> > >> If we silently fix things up, then people will continue to write broken
> > >> tables without even realising, new OSes will have to implement the same
> > >> mechanism because vendors will have little interest in changing things
> > >> that have worked "correctly" with Linux for years, and we've effectively
> > >> achieved a de-facto redefinition of the spec. Making our end of the
> > >> interface robust is obviously desirable, but there still needs to be
> > >> *some* incentive for the folks on the other end to get it right.
> > >>
> > >
> > > Agreed. But at least we'll be able to detect it and flag it in the
> > > general case, rather than having a special case for D05/06 only
> > > (although I suppose splitting the output ranges like those platforms
> > > do is rather unusual)
> >
> > Yup, in principle the fixed quirk list gives a nice reassuring sense of
> > "we'll work around these early platforms and everyone from now on will
> > get it right", but whether reality plays out that way is another matter
> > entirely...
> 
> The reason I am looking into this is that I think the fix should go to
> stable, given that the current situation makes it impossible to write
> firmware that works with older and newer kernels.

Yes. If we do remove the quirk the sooner we do it the better to
reduce the stable patches.

> Lorenzo said he wouldn't mind taking the current version with ACPI OEM
> ID matching back to -stable, but it's another quirk list to manage,
> which I would prefer to avoid.
> 
> But I don't care deeply either way, to be honest, as long as we can
> get something backported so compliant firmware is not being penalized
> anymore.

Question: if we remove the iort_workaround_oem_info stuff but we *do*
keep the existing apply_id_count_workaround flag and we set it by going
through all the mappings at boot time and detect if any of these
off-by-one conditions apply - would the resulting code be any simpler ?

The global flag would apply (as it does now) to _all_ mappings but it is
very likely that if the off-by-one firmware bug is present it applies to
the IORT table as a whole rather than a single mapping entry.

Lorenzo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01 13:10         ` Ard Biesheuvel
  2020-05-01 13:49           ` Lorenzo Pieralisi
@ 2020-05-01 14:13           ` Robin Murphy
  2020-05-01 14:26             ` Ard Biesheuvel
  1 sibling, 1 reply; 12+ messages in thread
From: Robin Murphy @ 2020-05-01 14:13 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Linux ARM, ACPI Devel Maling List, Hanjun Guo, Lorenzo Pieralisi,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas

On 2020-05-01 2:10 pm, Ard Biesheuvel wrote:
> On Fri, 1 May 2020 at 14:31, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2020-05-01 12:41 pm, Ard Biesheuvel wrote:
>>> On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote:
>>>>
>>>> On 2020-05-01 10:58 am, Ard Biesheuvel wrote:
>>>>> The ID mapping table structure of the IORT table describes the size of
>>>>> a range using a num_ids field carrying the number of IDs in the region
>>>>> minus one. This has been misinterpreted in the past in the parsing code,
>>>>> and firmware is known to have shipped where this results in an ambiguity,
>>>>> where regions that should be adjacent have an overlap of one value.
>>>>>
>>>>> So let's work around this by detecting this case specifically: when
>>>>> resolving an ID translation, allow one that matches right at the end of
>>>>> a multi-ID region to be superseded by a subsequent one.
>>>>>
>>>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>>>>> ---
>>>>>     drivers/acpi/arm64/iort.c | 23 +++++++++++++++-----
>>>>>     1 file changed, 18 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
>>>>> index 98be18266a73..d826dd9dc4c5 100644
>>>>> --- a/drivers/acpi/arm64/iort.c
>>>>> +++ b/drivers/acpi/arm64/iort.c
>>>>> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
>>>>>         }
>>>>>
>>>>>         if (rid_in < map->input_base ||
>>>>> -         (rid_in >= map->input_base + map->id_count))
>>>>> +         (rid_in > map->input_base + map->id_count))
>>>>>                 return -ENXIO;
>>>>>
>>>>>         *rid_out = map->output_base + (rid_in - map->input_base);
>>>>> +
>>>>> +     /*
>>>>> +      * Due to confusion regarding the meaning of the id_count field (which
>>>>> +      * carries the number of IDs *minus 1*), we may have to disregard this
>>>>> +      * match if it is at the end of the range, and overlaps with the start
>>>>> +      * of another one.
>>>>> +      */
>>>>> +     if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
>>>>> +             return -EAGAIN;
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
>>>>>         /* Parse the ID mapping tree to find specified node type */
>>>>>         while (node) {
>>>>>                 struct acpi_iort_id_mapping *map;
>>>>> -             int i, index;
>>>>> +             int i, index, rc = 0;
>>>>> +             u32 out_ref = 0, map_id = id;
>>>>>
>>>>>                 if (IORT_TYPE_MASK(node->type) & type_mask) {
>>>>>                         if (id_out)
>>>>> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
>>>>>                         if (i == index)
>>>>>                                 continue;
>>>>>
>>>>> -                     if (!iort_id_map(map, node->type, id, &id))
>>>>> +                     rc = iort_id_map(map, node->type, map_id, &id);
>>>>> +                     if (!rc)
>>>>>                                 break;
>>>>
>>>> This needs a big FW_BUG splat in the case where it did find an overlap.
>>>
>>> Sure, although we did help create the problem in the first place.
>>>
>>>> Ideally we'd also enforce that the other half of must be the first entry
>>>> of another range, but perhaps we're into diminishing returns by that point.
>>>>
>>>
>>> That would mean the regions overlap regardless of whether you
>>> interpret num_ids correctly or not, which means we'll be doing
>>> validation of general well-formedness of the table rather than
>>> providing a workaround for this particular issue.
>>
>> The point was to limit any change in behaviour to the specific case that
>> we need to work around. Otherwise a table that was entirely malformed
>> rather than just off-by-one on the sizes might go from happening-to-work
>> to not working, or vice versa; the diminishing return is in how much we
>> care about that.
>>
> 
> I see. I think it is quite unlikely that a working system with
> overlapping ID ranges would work, and suddenly fail horribly when the
> exact point of overlap is shifted by 1. But yeah, I see your point.

Say that due to a copy-paste error or some other development artefact, 
the same correctly-sized input range is described twice, but the second 
copy has the wrong output base. Unless the IORT implementation is wacky 
enough to process mappings in reverse order it will have worked out OK, 
until suddenly the highest input ID starts falling through to the 
spurious broken mapping instead.

The match quirk implicitly encodes the exact nature of the ambiguity 
known to be present in the given table, so can be confident in fixing it 
up quietly. The heuristic doesn't have that luxury, so is wise to keep 
its scope as narrow as possible, and warn the user when it does choose 
to second-guess something on the off-chance that doing so actually makes 
the situation worse.

Robin.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01 14:13           ` Robin Murphy
@ 2020-05-01 14:26             ` Ard Biesheuvel
  0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2020-05-01 14:26 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Linux ARM, ACPI Devel Maling List, Hanjun Guo, Lorenzo Pieralisi,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas

On Fri, 1 May 2020 at 16:13, Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2020-05-01 2:10 pm, Ard Biesheuvel wrote:
> > On Fri, 1 May 2020 at 14:31, Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 2020-05-01 12:41 pm, Ard Biesheuvel wrote:
> >>> On Fri, 1 May 2020 at 12:55, Robin Murphy <robin.murphy@arm.com> wrote:
> >>>>
> >>>> On 2020-05-01 10:58 am, Ard Biesheuvel wrote:
> >>>>> The ID mapping table structure of the IORT table describes the size of
> >>>>> a range using a num_ids field carrying the number of IDs in the region
> >>>>> minus one. This has been misinterpreted in the past in the parsing code,
> >>>>> and firmware is known to have shipped where this results in an ambiguity,
> >>>>> where regions that should be adjacent have an overlap of one value.
> >>>>>
> >>>>> So let's work around this by detecting this case specifically: when
> >>>>> resolving an ID translation, allow one that matches right at the end of
> >>>>> a multi-ID region to be superseded by a subsequent one.
> >>>>>
> >>>>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >>>>> ---
> >>>>>     drivers/acpi/arm64/iort.c | 23 +++++++++++++++-----
> >>>>>     1 file changed, 18 insertions(+), 5 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> >>>>> index 98be18266a73..d826dd9dc4c5 100644
> >>>>> --- a/drivers/acpi/arm64/iort.c
> >>>>> +++ b/drivers/acpi/arm64/iort.c
> >>>>> @@ -316,10 +316,19 @@ static int iort_id_map(struct acpi_iort_id_mapping *map, u8 type, u32 rid_in,
> >>>>>         }
> >>>>>
> >>>>>         if (rid_in < map->input_base ||
> >>>>> -         (rid_in >= map->input_base + map->id_count))
> >>>>> +         (rid_in > map->input_base + map->id_count))
> >>>>>                 return -ENXIO;
> >>>>>
> >>>>>         *rid_out = map->output_base + (rid_in - map->input_base);
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Due to confusion regarding the meaning of the id_count field (which
> >>>>> +      * carries the number of IDs *minus 1*), we may have to disregard this
> >>>>> +      * match if it is at the end of the range, and overlaps with the start
> >>>>> +      * of another one.
> >>>>> +      */
> >>>>> +     if (map->id_count > 0 && rid_in == map->input_base + map->id_count)
> >>>>> +             return -EAGAIN;
> >>>>>         return 0;
> >>>>>     }
> >>>>>
> >>>>> @@ -404,7 +413,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
> >>>>>         /* Parse the ID mapping tree to find specified node type */
> >>>>>         while (node) {
> >>>>>                 struct acpi_iort_id_mapping *map;
> >>>>> -             int i, index;
> >>>>> +             int i, index, rc = 0;
> >>>>> +             u32 out_ref = 0, map_id = id;
> >>>>>
> >>>>>                 if (IORT_TYPE_MASK(node->type) & type_mask) {
> >>>>>                         if (id_out)
> >>>>> @@ -438,15 +448,18 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
> >>>>>                         if (i == index)
> >>>>>                                 continue;
> >>>>>
> >>>>> -                     if (!iort_id_map(map, node->type, id, &id))
> >>>>> +                     rc = iort_id_map(map, node->type, map_id, &id);
> >>>>> +                     if (!rc)
> >>>>>                                 break;
> >>>>
> >>>> This needs a big FW_BUG splat in the case where it did find an overlap.
> >>>
> >>> Sure, although we did help create the problem in the first place.
> >>>
> >>>> Ideally we'd also enforce that the other half of must be the first entry
> >>>> of another range, but perhaps we're into diminishing returns by that point.
> >>>>
> >>>
> >>> That would mean the regions overlap regardless of whether you
> >>> interpret num_ids correctly or not, which means we'll be doing
> >>> validation of general well-formedness of the table rather than
> >>> providing a workaround for this particular issue.
> >>
> >> The point was to limit any change in behaviour to the specific case that
> >> we need to work around. Otherwise a table that was entirely malformed
> >> rather than just off-by-one on the sizes might go from happening-to-work
> >> to not working, or vice versa; the diminishing return is in how much we
> >> care about that.
> >>
> >
> > I see. I think it is quite unlikely that a working system with
> > overlapping ID ranges would work, and suddenly fail horribly when the
> > exact point of overlap is shifted by 1. But yeah, I see your point.
>
> Say that due to a copy-paste error or some other development artefact,
> the same correctly-sized input range is described twice, but the second
> copy has the wrong output base. Unless the IORT implementation is wacky
> enough to process mappings in reverse order it will have worked out OK,
> until suddenly the highest input ID starts falling through to the
> spurious broken mapping instead.
>

OK, so there are other quite unlikely scenarios where this might break :-)

> The match quirk implicitly encodes the exact nature of the ambiguity
> known to be present in the given table, so can be confident in fixing it
> up quietly. The heuristic doesn't have that luxury, so is wise to keep
> its scope as narrow as possible, and warn the user when it does choose
> to second-guess something on the off-chance that doing so actually makes
> the situation worse.
>

Fair enough. I'll have a go at incorporating the FW_BUG and the double check.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01 13:49           ` Lorenzo Pieralisi
@ 2020-05-01 14:35             ` Ard Biesheuvel
  2020-05-01 14:57               ` Robin Murphy
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2020-05-01 14:35 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Robin Murphy, Linux ARM, ACPI Devel Maling List, Hanjun Guo,
	Pankaj Bansal, Will Deacon, Sudeep Holla, Catalin Marinas

On Fri, 1 May 2020 at 15:50, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
>
> On Fri, May 01, 2020 at 03:10:59PM +0200, Ard Biesheuvel wrote:
>
> [...]
>
> > > >> If we silently fix things up, then people will continue to write broken
> > > >> tables without even realising, new OSes will have to implement the same
> > > >> mechanism because vendors will have little interest in changing things
> > > >> that have worked "correctly" with Linux for years, and we've effectively
> > > >> achieved a de-facto redefinition of the spec. Making our end of the
> > > >> interface robust is obviously desirable, but there still needs to be
> > > >> *some* incentive for the folks on the other end to get it right.
> > > >>
> > > >
> > > > Agreed. But at least we'll be able to detect it and flag it in the
> > > > general case, rather than having a special case for D05/06 only
> > > > (although I suppose splitting the output ranges like those platforms
> > > > do is rather unusual)
> > >
> > > Yup, in principle the fixed quirk list gives a nice reassuring sense of
> > > "we'll work around these early platforms and everyone from now on will
> > > get it right", but whether reality plays out that way is another matter
> > > entirely...
> >
> > The reason I am looking into this is that I think the fix should go to
> > stable, given that the current situation makes it impossible to write
> > firmware that works with older and newer kernels.
>
> Yes. If we do remove the quirk the sooner we do it the better to
> reduce the stable patches.
>
> > Lorenzo said he wouldn't mind taking the current version with ACPI OEM
> > ID matching back to -stable, but it's another quirk list to manage,
> > which I would prefer to avoid.
> >
> > But I don't care deeply either way, to be honest, as long as we can
> > get something backported so compliant firmware is not being penalized
> > anymore.
>
> Question: if we remove the iort_workaround_oem_info stuff but we *do*
> keep the existing apply_id_count_workaround flag and we set it by going
> through all the mappings at boot time and detect if any of these
> off-by-one conditions apply - would the resulting code be any simpler ?
>
> The global flag would apply (as it does now) to _all_ mappings but it is
> very likely that if the off-by-one firmware bug is present it applies to
> the IORT table as a whole rather than a single mapping entry.
>

This particular issue is based on a misinterpretation, so I agree that
it makes sense to have a global flag, as long as we only set it if the
mappings are fully consistent in every other respect, or we'll run the
risk of hitting issues like the one Robin describes, where things
happen to work, but will fail once we apply the heuristic. Such an
issue could exist on one end of the table, while we could spot the
off-by-one issue somewhere else.

Which brings us back to a point I made earlier: do we really want to
validate the table and ensure that it is fully internally consistent?
Or do we want to be robust in the face of a single known issue that we
helped create?

So in my opinion, just fixing it up when we run into it is fine. I can
add the extra sanity check to reduce the potential fallout for other
broken systems, but beyond that, I think we shouldn't do too much.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity
  2020-05-01 14:35             ` Ard Biesheuvel
@ 2020-05-01 14:57               ` Robin Murphy
  0 siblings, 0 replies; 12+ messages in thread
From: Robin Murphy @ 2020-05-01 14:57 UTC (permalink / raw)
  To: Ard Biesheuvel, Lorenzo Pieralisi
  Cc: Linux ARM, ACPI Devel Maling List, Hanjun Guo, Pankaj Bansal,
	Will Deacon, Sudeep Holla, Catalin Marinas

On 2020-05-01 3:35 pm, Ard Biesheuvel wrote:
> On Fri, 1 May 2020 at 15:50, Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
>>
>> On Fri, May 01, 2020 at 03:10:59PM +0200, Ard Biesheuvel wrote:
>>
>> [...]
>>
>>>>>> If we silently fix things up, then people will continue to write broken
>>>>>> tables without even realising, new OSes will have to implement the same
>>>>>> mechanism because vendors will have little interest in changing things
>>>>>> that have worked "correctly" with Linux for years, and we've effectively
>>>>>> achieved a de-facto redefinition of the spec. Making our end of the
>>>>>> interface robust is obviously desirable, but there still needs to be
>>>>>> *some* incentive for the folks on the other end to get it right.
>>>>>>
>>>>>
>>>>> Agreed. But at least we'll be able to detect it and flag it in the
>>>>> general case, rather than having a special case for D05/06 only
>>>>> (although I suppose splitting the output ranges like those platforms
>>>>> do is rather unusual)
>>>>
>>>> Yup, in principle the fixed quirk list gives a nice reassuring sense of
>>>> "we'll work around these early platforms and everyone from now on will
>>>> get it right", but whether reality plays out that way is another matter
>>>> entirely...
>>>
>>> The reason I am looking into this is that I think the fix should go to
>>> stable, given that the current situation makes it impossible to write
>>> firmware that works with older and newer kernels.
>>
>> Yes. If we do remove the quirk the sooner we do it the better to
>> reduce the stable patches.
>>
>>> Lorenzo said he wouldn't mind taking the current version with ACPI OEM
>>> ID matching back to -stable, but it's another quirk list to manage,
>>> which I would prefer to avoid.
>>>
>>> But I don't care deeply either way, to be honest, as long as we can
>>> get something backported so compliant firmware is not being penalized
>>> anymore.
>>
>> Question: if we remove the iort_workaround_oem_info stuff but we *do*
>> keep the existing apply_id_count_workaround flag and we set it by going
>> through all the mappings at boot time and detect if any of these
>> off-by-one conditions apply - would the resulting code be any simpler ?
>>
>> The global flag would apply (as it does now) to _all_ mappings but it is
>> very likely that if the off-by-one firmware bug is present it applies to
>> the IORT table as a whole rather than a single mapping entry.
>>
> 
> This particular issue is based on a misinterpretation, so I agree that
> it makes sense to have a global flag, as long as we only set it if the
> mappings are fully consistent in every other respect, or we'll run the
> risk of hitting issues like the one Robin describes, where things
> happen to work, but will fail once we apply the heuristic. Such an
> issue could exist on one end of the table, while we could spot the
> off-by-one issue somewhere else.
> 
> Which brings us back to a point I made earlier: do we really want to
> validate the table and ensure that it is fully internally consistent?
> Or do we want to be robust in the face of a single known issue that we
> helped create?
> 
> So in my opinion, just fixing it up when we run into it is fine. I can
> add the extra sanity check to reduce the potential fallout for other
> broken systems, but beyond that, I think we shouldn't do too much.

Agreed - AFAICS the extra robustness I'm asking for should only amount 
to a handful more lines on top of the proposed patch (maybe a couple of 
positive return values for "by the way this came from the start/end of a 
mapping range" instead of -EAGAIN). I think a separate scanning pass is 
likely to add up to more complexity and similar-but-not-quite-reusable 
code than simply detecting and handling potential off-by-one edges in-line.

Robin.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-05-01 14:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-01  9:58 [PATCH RFC 0/2] ACPI/IORT: rework num_ids off-by-one quirk Ard Biesheuvel
2020-05-01  9:58 ` [PATCH RFC 1/2] Revert "ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map()" Ard Biesheuvel
2020-05-01  9:58 ` [PATCH RFC 2/2] ACPI/IORT: work around num_ids ambiguity Ard Biesheuvel
2020-05-01 10:55   ` Robin Murphy
2020-05-01 11:41     ` Ard Biesheuvel
2020-05-01 12:31       ` Robin Murphy
2020-05-01 13:10         ` Ard Biesheuvel
2020-05-01 13:49           ` Lorenzo Pieralisi
2020-05-01 14:35             ` Ard Biesheuvel
2020-05-01 14:57               ` Robin Murphy
2020-05-01 14:13           ` Robin Murphy
2020-05-01 14:26             ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).