linux-cxl.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] ACPI, APEI, EINJ: Remove memory range validation for CXL error types
@ 2023-04-03 15:18 Ben Cheatham
  2023-05-31 21:31 ` Dan Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Cheatham @ 2023-04-03 15:18 UTC (permalink / raw)
  To: rafael, dan.j.williams, ira.weiny, alison.schofield, vishal.l.verma
  Cc: yazen.ghannam, benjamin.cheatham, linux-cxl, linux-acpi

From: Yazen Ghannam <yazen.ghannam@amd.com>

V2 Changes:
 - Added Link tags for links
 - removed stray unused variable

This patch is a follow up to the discussion at [1], and builds on Tony's
CXL error patch at [2].

The new CXL error types will use the Memory Address field in the
SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
compliant memory-mapped Downstream port. The value of the Memory Address
will be in the port's MMIO range, and it will not represent physical
(normal or persistent) memory.

Allow error injection for CXL 1.1 systems by skipping memory range
validation for CXL error injection types.

Output trying to inject CXL.mem error without patch:

-bash: echo: write error: Invalid argument

[1]:
Link: https://lore.kernel.org/linux-acpi/20221206205234.606073-1-Benjamin.Cheatham@amd.com/
[2]:
Link: https://lore.kernel.org/linux-cxl/CAJZ5v0hNQUfWViqxbJ5B4JCGJUuHpWWSpqpCFWPNpGuagoFbsQ@mail.gmail.com/T/#t

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
 drivers/acpi/apei/einj.c | 12 +++++++++++-
 include/acpi/actbl1.h    |  6 ++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
index 013eb621dc92..68a20326ed7c 100644
--- a/drivers/acpi/apei/einj.c
+++ b/drivers/acpi/apei/einj.c
@@ -37,6 +37,13 @@
 				ACPI_EINJ_MEMORY_UNCORRECTABLE | \
 				ACPI_EINJ_MEMORY_FATAL)
 
+#define CXL_ERROR_MASK		(ACPI_EINJ_CXL_CACHE_CORRECTABLE	| \
+				ACPI_EINJ_CXL_CACHE_UNCORRECTABLE	| \
+				ACPI_EINJ_CXL_CACHE_FATAL		| \
+				ACPI_EINJ_CXL_MEM_CORRECTABLE		| \
+				ACPI_EINJ_CXL_MEM_UNCORRECTABLE		| \
+				ACPI_EINJ_CXL_MEM_FATAL)
+
 /*
  * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
  */
@@ -537,8 +544,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	if (type & ACPI5_VENDOR_BIT) {
 		if (vendor_flags != SETWA_FLAGS_MEM)
 			goto inject;
-	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
+	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
+		goto inject;
+	} else if (type & CXL_ERROR_MASK) {
 		goto inject;
+	}
 
 	/*
 	 * Disallow crazy address masks that give BIOS leeway to pick
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 81b9e794424d..c39837266414 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -1044,6 +1044,12 @@ enum acpi_einj_command_status {
 #define ACPI_EINJ_PLATFORM_CORRECTABLE      (1<<9)
 #define ACPI_EINJ_PLATFORM_UNCORRECTABLE    (1<<10)
 #define ACPI_EINJ_PLATFORM_FATAL            (1<<11)
+#define ACPI_EINJ_CXL_CACHE_CORRECTABLE     BIT(12)
+#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE   BIT(13)
+#define ACPI_EINJ_CXL_CACHE_FATAL           BIT(14)
+#define ACPI_EINJ_CXL_MEM_CORRECTABLE       BIT(15)
+#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE     BIT(16)
+#define ACPI_EINJ_CXL_MEM_FATAL             BIT(17)
 #define ACPI_EINJ_VENDOR_DEFINED            (1<<31)
 
 /*******************************************************************************
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [PATCH v2] ACPI, APEI, EINJ: Remove memory range validation for CXL error types
  2023-04-03 15:18 [PATCH v2] ACPI, APEI, EINJ: Remove memory range validation for CXL error types Ben Cheatham
@ 2023-05-31 21:31 ` Dan Williams
  2023-06-01  3:57   ` Dan Williams
  2023-06-01 14:45   ` Ben Cheatham
  0 siblings, 2 replies; 5+ messages in thread
From: Dan Williams @ 2023-05-31 21:31 UTC (permalink / raw)
  To: Ben Cheatham, rafael, dan.j.williams, ira.weiny,
	alison.schofield, vishal.l.verma
  Cc: yazen.ghannam, benjamin.cheatham, linux-cxl, linux-acpi

Hi Ben,

Ben Cheatham wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> V2 Changes:
>  - Added Link tags for links
>  - removed stray unused variable
> 
> This patch is a follow up to the discussion at [1], and builds on Tony's
> CXL error patch at [2].
> 
> The new CXL error types will use the Memory Address field in the
> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
> compliant memory-mapped Downstream port. The value of the Memory Address
> will be in the port's MMIO range, and it will not represent physical
> (normal or persistent) memory.
> 
> Allow error injection for CXL 1.1 systems by skipping memory range
> validation for CXL error injection types.

This just feels a bit too loose especially when the kernel has
the cxl_acpi driver to perform the enumeration of CXL root ports.

I know that Terry and Robert are teaching the PCI AER core how to
coordinate with RCRB information [1] (I still need to go dig in deeper
on that set). I would expect ACPI EINJ could benefit from similar
coordination and validate these addresses.

Now, is it any address in the downstream-port RCRB range that is valid,
or only the base?

Another minor comment below...

[1]: http://lore.kernel.org/r/20230523232214.55282-1-terry.bowman@amd.com

> 
> Output trying to inject CXL.mem error without patch:
> 
> -bash: echo: write error: Invalid argument
> 
> [1]:
> Link: https://lore.kernel.org/linux-acpi/20221206205234.606073-1-Benjamin.Cheatham@amd.com/
> [2]:
> Link: https://lore.kernel.org/linux-cxl/CAJZ5v0hNQUfWViqxbJ5B4JCGJUuHpWWSpqpCFWPNpGuagoFbsQ@mail.gmail.com/T/#t
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com>
> ---
>  drivers/acpi/apei/einj.c | 12 +++++++++++-
>  include/acpi/actbl1.h    |  6 ++++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index 013eb621dc92..68a20326ed7c 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -37,6 +37,13 @@
>  				ACPI_EINJ_MEMORY_UNCORRECTABLE | \
>  				ACPI_EINJ_MEMORY_FATAL)
>  
> +#define CXL_ERROR_MASK		(ACPI_EINJ_CXL_CACHE_CORRECTABLE	| \
> +				ACPI_EINJ_CXL_CACHE_UNCORRECTABLE	| \
> +				ACPI_EINJ_CXL_CACHE_FATAL		| \
> +				ACPI_EINJ_CXL_MEM_CORRECTABLE		| \
> +				ACPI_EINJ_CXL_MEM_UNCORRECTABLE		| \
> +				ACPI_EINJ_CXL_MEM_FATAL)
> +
>  /*
>   * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
>   */
> @@ -537,8 +544,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>  	if (type & ACPI5_VENDOR_BIT) {
>  		if (vendor_flags != SETWA_FLAGS_MEM)
>  			goto inject;
> -	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
> +	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
> +		goto inject;
> +	} else if (type & CXL_ERROR_MASK) {
>  		goto inject;
> +	}
>  
>  	/*
>  	 * Disallow crazy address masks that give BIOS leeway to pick
> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index 81b9e794424d..c39837266414 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -1044,6 +1044,12 @@ enum acpi_einj_command_status {
>  #define ACPI_EINJ_PLATFORM_CORRECTABLE      (1<<9)
>  #define ACPI_EINJ_PLATFORM_UNCORRECTABLE    (1<<10)
>  #define ACPI_EINJ_PLATFORM_FATAL            (1<<11)
> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE     BIT(12)
> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE   BIT(13)
> +#define ACPI_EINJ_CXL_CACHE_FATAL           BIT(14)
> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE       BIT(15)
> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE     BIT(16)
> +#define ACPI_EINJ_CXL_MEM_FATAL             BIT(17)

I expect these to come from the next ACPICA update just like the other
definitions. The change in style from (x<<N) to BIT(N) was a tip-off.
The process is to submit a pull request to the ACPICA project, for
example:

https://github.com/acpica/acpica/commit/e948142526c0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH v2] ACPI, APEI, EINJ: Remove memory range validation for CXL error types
  2023-05-31 21:31 ` Dan Williams
@ 2023-06-01  3:57   ` Dan Williams
  2023-06-01 14:45     ` Ben Cheatham
  2023-06-01 14:45   ` Ben Cheatham
  1 sibling, 1 reply; 5+ messages in thread
From: Dan Williams @ 2023-06-01  3:57 UTC (permalink / raw)
  To: Dan Williams, Ben Cheatham, rafael, ira.weiny, alison.schofield,
	vishal.l.verma
  Cc: yazen.ghannam, benjamin.cheatham, linux-cxl, linux-acpi

Dan Williams wrote:
> Hi Ben,
> 
> Ben Cheatham wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> > 
> > V2 Changes:
> >  - Added Link tags for links
> >  - removed stray unused variable
> > 
> > This patch is a follow up to the discussion at [1], and builds on Tony's
> > CXL error patch at [2].
> > 
> > The new CXL error types will use the Memory Address field in the
> > SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
> > compliant memory-mapped Downstream port. The value of the Memory Address
> > will be in the port's MMIO range, and it will not represent physical
> > (normal or persistent) memory.
> > 
> > Allow error injection for CXL 1.1 systems by skipping memory range
> > validation for CXL error injection types.
> 
> This just feels a bit too loose especially when the kernel has
> the cxl_acpi driver to perform the enumeration of CXL root ports.
> 
> I know that Terry and Robert are teaching the PCI AER core how to
> coordinate with RCRB information [1] (I still need to go dig in deeper
> on that set). I would expect ACPI EINJ could benefit from similar
> coordination and validate these addresses.
> 
> Now, is it any address in the downstream-port RCRB range that is valid,
> or only the base?

Now that I say the above, I wonder if the legacy EINJ interface is even
the right interface for CXL protocol error injection. It feels like
something that should be relative to the given 'struct cxl_port'
instance where the error is to be injected. Userspace should not need to
deal in MMIO addresses, especially in the RCH case where the RCRB is
difficult for userspace to enumerate.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] ACPI, APEI, EINJ: Remove memory range validation for CXL error types
  2023-05-31 21:31 ` Dan Williams
  2023-06-01  3:57   ` Dan Williams
@ 2023-06-01 14:45   ` Ben Cheatham
  1 sibling, 0 replies; 5+ messages in thread
From: Ben Cheatham @ 2023-06-01 14:45 UTC (permalink / raw)
  To: Dan Williams, rafael, ira.weiny, alison.schofield, vishal.l.verma
  Cc: yazen.ghannam, linux-cxl, linux-acpi

Thanks for the review Dan, responses inline.

On 5/31/23 4:31 PM, Dan Williams wrote:
> Hi Ben,
> 
> Ben Cheatham wrote:
>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>
>> V2 Changes:
>>  - Added Link tags for links
>>  - removed stray unused variable
>>
>> This patch is a follow up to the discussion at [1], and builds on Tony's
>> CXL error patch at [2].
>>
>> The new CXL error types will use the Memory Address field in the
>> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
>> compliant memory-mapped Downstream port. The value of the Memory Address
>> will be in the port's MMIO range, and it will not represent physical
>> (normal or persistent) memory.
>>
>> Allow error injection for CXL 1.1 systems by skipping memory range
>> validation for CXL error injection types.
> 
> This just feels a bit too loose especially when the kernel has
> the cxl_acpi driver to perform the enumeration of CXL root ports.
> 
> I know that Terry and Robert are teaching the PCI AER core how to
> coordinate with RCRB information [1] (I still need to go dig in deeper
> on that set). I would expect ACPI EINJ could benefit from similar
> coordination and validate these addresses.
>> Now, is it any address in the downstream-port RCRB range that is valid,
> or only the base?
> 
Response to above in your follow up email.

> Another minor comment below...
> 
> [1]: http://lore.kernel.org/r/20230523232214.55282-1-terry.bowman@amd.com
> 
>>
>> Output trying to inject CXL.mem error without patch:
>>
>> -bash: echo: write error: Invalid argument
>>
>> [1]:
>> Link: https://lore.kernel.org/linux-acpi/20221206205234.606073-1-Benjamin.Cheatham@amd.com/
>> [2]:
>> Link: https://lore.kernel.org/linux-cxl/CAJZ5v0hNQUfWViqxbJ5B4JCGJUuHpWWSpqpCFWPNpGuagoFbsQ@mail.gmail.com/T/#t
>>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> Signed-off-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> ---
>>  drivers/acpi/apei/einj.c | 12 +++++++++++-
>>  include/acpi/actbl1.h    |  6 ++++++
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
>> index 013eb621dc92..68a20326ed7c 100644
>> --- a/drivers/acpi/apei/einj.c
>> +++ b/drivers/acpi/apei/einj.c
>> @@ -37,6 +37,13 @@
>>  				ACPI_EINJ_MEMORY_UNCORRECTABLE | \
>>  				ACPI_EINJ_MEMORY_FATAL)
>>  
>> +#define CXL_ERROR_MASK		(ACPI_EINJ_CXL_CACHE_CORRECTABLE	| \
>> +				ACPI_EINJ_CXL_CACHE_UNCORRECTABLE	| \
>> +				ACPI_EINJ_CXL_CACHE_FATAL		| \
>> +				ACPI_EINJ_CXL_MEM_CORRECTABLE		| \
>> +				ACPI_EINJ_CXL_MEM_UNCORRECTABLE		| \
>> +				ACPI_EINJ_CXL_MEM_FATAL)
>> +
>>  /*
>>   * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
>>   */
>> @@ -537,8 +544,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
>>  	if (type & ACPI5_VENDOR_BIT) {
>>  		if (vendor_flags != SETWA_FLAGS_MEM)
>>  			goto inject;
>> -	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
>> +	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
>> +		goto inject;
>> +	} else if (type & CXL_ERROR_MASK) {
>>  		goto inject;
>> +	}
>>  
>>  	/*
>>  	 * Disallow crazy address masks that give BIOS leeway to pick
>> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
>> index 81b9e794424d..c39837266414 100644
>> --- a/include/acpi/actbl1.h
>> +++ b/include/acpi/actbl1.h
>> @@ -1044,6 +1044,12 @@ enum acpi_einj_command_status {
>>  #define ACPI_EINJ_PLATFORM_CORRECTABLE      (1<<9)
>>  #define ACPI_EINJ_PLATFORM_UNCORRECTABLE    (1<<10)
>>  #define ACPI_EINJ_PLATFORM_FATAL            (1<<11)
>> +#define ACPI_EINJ_CXL_CACHE_CORRECTABLE     BIT(12)
>> +#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE   BIT(13)
>> +#define ACPI_EINJ_CXL_CACHE_FATAL           BIT(14)
>> +#define ACPI_EINJ_CXL_MEM_CORRECTABLE       BIT(15)
>> +#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE     BIT(16)
>> +#define ACPI_EINJ_CXL_MEM_FATAL             BIT(17)
> 
> I expect these to come from the next ACPICA update just like the other
> definitions. The change in style from (x<<N) to BIT(N) was a tip-off.
> The process is to submit a pull request to the ACPICA project, for
> example:
> 
> https://github.com/acpica/acpica/commit/e948142526c0

I wasn't aware of this, I'll go ahead and do that.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] ACPI, APEI, EINJ: Remove memory range validation for CXL error types
  2023-06-01  3:57   ` Dan Williams
@ 2023-06-01 14:45     ` Ben Cheatham
  0 siblings, 0 replies; 5+ messages in thread
From: Ben Cheatham @ 2023-06-01 14:45 UTC (permalink / raw)
  To: Dan Williams, rafael, ira.weiny, alison.schofield, vishal.l.verma
  Cc: yazen.ghannam, linux-cxl, linux-acpi

On 5/31/23 10:57 PM, Dan Williams wrote:
> Dan Williams wrote:
>> Hi Ben,
>>
>> Ben Cheatham wrote:
>>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>>
>>> V2 Changes:
>>>  - Added Link tags for links
>>>  - removed stray unused variable
>>>
>>> This patch is a follow up to the discussion at [1], and builds on Tony's
>>> CXL error patch at [2].
>>>
>>> The new CXL error types will use the Memory Address field in the
>>> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
>>> compliant memory-mapped Downstream port. The value of the Memory Address
>>> will be in the port's MMIO range, and it will not represent physical
>>> (normal or persistent) memory.
>>>
>>> Allow error injection for CXL 1.1 systems by skipping memory range
>>> validation for CXL error injection types.
>>
>> This just feels a bit too loose especially when the kernel has
>> the cxl_acpi driver to perform the enumeration of CXL root ports.
>>
>> I know that Terry and Robert are teaching the PCI AER core how to
>> coordinate with RCRB information [1] (I still need to go dig in deeper
>> on that set). I would expect ACPI EINJ could benefit from similar
>> coordination and validate these addresses.
>>
I haven't kept up with that patch set either, I'll take a look.

>> Now, is it any address in the downstream-port RCRB range that is valid,
>> or only the base?
I took another look at the ACPI 6.5 spec since it's been a while and it doesn't specify,
but when I tested this patch I think I only used the base address. I'll try out other
addresses in the range and see if they work as well.

> 
> Now that I say the above, I wonder if the legacy EINJ interface is even
> the right interface for CXL protocol error injection. It feels like
> something that should be relative to the given 'struct cxl_port'
> instance where the error is to be injected. Userspace should not need to
> deal in MMIO addresses, especially in the RCH case where the RCRB is
> difficult for userspace to enumerate.
I think that's a good idea. My first idea is to have sysfs files under the port that 
are basically a shortcut for an EINJ injection through the legacy interface. I think
it might be possible to have just the sysfs files and no legacy interface support,
but since I'll most likely be reusing the EINJ code I imagine the legacy interface 
will have to be updated anyway.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-06-01 14:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-03 15:18 [PATCH v2] ACPI, APEI, EINJ: Remove memory range validation for CXL error types Ben Cheatham
2023-05-31 21:31 ` Dan Williams
2023-06-01  3:57   ` Dan Williams
2023-06-01 14:45     ` Ben Cheatham
2023-06-01 14:45   ` Ben Cheatham

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).