All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shiyang Ruan <ruansy.fnst@fujitsu.com>
To: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: qemu-devel@nongnu.org, linux-cxl@vger.kernel.org,
	dan.j.williams@intel.com
Subject: Re: [RFC PATCH 5/5] cxl/core: add poison injection event handler
Date: Fri, 15 Mar 2024 10:29:07 +0800	[thread overview]
Message-ID: <48223415-8466-480d-86e1-8b9945782c0c@fujitsu.com> (raw)
In-Reply-To: <20240213165150.00006d9a@Huawei.com>



在 2024/2/14 0:51, Jonathan Cameron 写道:
> 
>> +
>> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
>> +			     enum cxl_event_log_type type,
>> +			     enum cxl_event_type event_type,
>> +			     const uuid_t *uuid, union cxl_event *evt)
>> +{
>> +	if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
>>   		trace_cxl_general_media(cxlmd, type, &evt->gen_media);
>> -	else if (event_type == CXL_CPER_EVENT_DRAM)
>> +		/* handle poison event */
>> +		if (type == CXL_EVENT_TYPE_FAIL)
>> +			cxl_event_handle_poison(cxlmd, &evt->gen_media);
> 
> I'm not 100% convinced this is necessary poison causing.  Also
> the text tells us we should see 'an appropriate event'.
> DRAM one seems likely to be chosen by some vendors.

I think it's right to use DRAM Event Record for volatile-memdev, but 
should poison on a persistent-memdev also use DRAM Event Record too? 
Though its 'Physical Address' feild has the 'Volatile' bit too, which is 
same as General Media Event Record.  I am a bit confused about this.

> 
> The fatal check maybe makes it a little more likely (maybe though
> I'm not sure anything says a device must log it to the failure log)
> but it might be Memory Event Type 1, which is the host tried to
> access an invalid address.  Sure poison might be returned to that
> error but what would the main kernel memory handling do with it?
> Something is very wrong
> but it's not corrupted device memory.  TE state violations are in there
> as well. Sure poison is returned on reads (I think - haven't checked).
> 
> IF the aim here is to say 'maybe there is poison, better check the
> poison list'. Then that is reasonable but we should ensure things
> like timer expiry are definitely ruled out and rename the function
> to make it clear it might not find poison.

I forgot to distinguish the 'Transaction Type' here. Host Inject Poison 
is 0x04h. And other types should also have their specific handle method.


--
Thanks,
Ruan.

> 
> Jonathan

WARNING: multiple messages have this Message-ID (diff)
From: Shiyang Ruan via <qemu-devel@nongnu.org>
To: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: qemu-devel@nongnu.org, linux-cxl@vger.kernel.org,
	dan.j.williams@intel.com
Subject: Re: [RFC PATCH 5/5] cxl/core: add poison injection event handler
Date: Fri, 15 Mar 2024 10:29:07 +0800	[thread overview]
Message-ID: <48223415-8466-480d-86e1-8b9945782c0c@fujitsu.com> (raw)
In-Reply-To: <20240213165150.00006d9a@Huawei.com>



在 2024/2/14 0:51, Jonathan Cameron 写道:
> 
>> +
>> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
>> +			     enum cxl_event_log_type type,
>> +			     enum cxl_event_type event_type,
>> +			     const uuid_t *uuid, union cxl_event *evt)
>> +{
>> +	if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
>>   		trace_cxl_general_media(cxlmd, type, &evt->gen_media);
>> -	else if (event_type == CXL_CPER_EVENT_DRAM)
>> +		/* handle poison event */
>> +		if (type == CXL_EVENT_TYPE_FAIL)
>> +			cxl_event_handle_poison(cxlmd, &evt->gen_media);
> 
> I'm not 100% convinced this is necessary poison causing.  Also
> the text tells us we should see 'an appropriate event'.
> DRAM one seems likely to be chosen by some vendors.

I think it's right to use DRAM Event Record for volatile-memdev, but 
should poison on a persistent-memdev also use DRAM Event Record too? 
Though its 'Physical Address' feild has the 'Volatile' bit too, which is 
same as General Media Event Record.  I am a bit confused about this.

> 
> The fatal check maybe makes it a little more likely (maybe though
> I'm not sure anything says a device must log it to the failure log)
> but it might be Memory Event Type 1, which is the host tried to
> access an invalid address.  Sure poison might be returned to that
> error but what would the main kernel memory handling do with it?
> Something is very wrong
> but it's not corrupted device memory.  TE state violations are in there
> as well. Sure poison is returned on reads (I think - haven't checked).
> 
> IF the aim here is to say 'maybe there is poison, better check the
> poison list'. Then that is reasonable but we should ensure things
> like timer expiry are definitely ruled out and rename the function
> to make it clear it might not find poison.

I forgot to distinguish the 'Transaction Type' here. Host Inject Poison 
is 0x04h. And other types should also have their specific handle method.


--
Thanks,
Ruan.

> 
> Jonathan


  reply	other threads:[~2024-03-15  2:29 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-09 11:54 [RFC PATCH SET] cxl: add poison event handler Shiyang Ruan
2024-02-09 11:54 ` Shiyang Ruan via
2024-02-09 11:54 ` [RFC PATCH 1/2] hw/cxl/type3: add missing flag bit for GMER Shiyang Ruan
2024-02-09 11:54   ` Shiyang Ruan via
2024-02-13 16:27   ` Jonathan Cameron
2024-02-13 16:27     ` Jonathan Cameron via
2024-02-09 11:54 ` [RFC PATCH 2/2] hw/cxl/type3: send a GMER while injecting poison Shiyang Ruan
2024-02-09 11:54   ` Shiyang Ruan via
2024-02-13 16:32   ` Jonathan Cameron
2024-02-13 16:32     ` Jonathan Cameron via
2024-02-09 11:54 ` [RFC PATCH 1/5] cxl/core: correct length of DPA field masks Shiyang Ruan
2024-02-09 11:54   ` Shiyang Ruan via
2024-02-10  6:34   ` Dan Williams
2024-02-19 10:49     ` Shiyang Ruan via
2024-02-19 10:49       ` Shiyang Ruan
2024-02-22  2:27       ` Dan Williams
2024-02-09 11:54 ` [RFC PATCH 2/5] cxl/core: introduce cxl_memdev_dpa_to_hpa() Shiyang Ruan
2024-02-09 11:54   ` Shiyang Ruan via
2024-02-10  6:39   ` Dan Williams
2024-02-09 11:54 ` [RFC PATCH 3/5] cxl/core: introduce cxl_mem_report_poison() Shiyang Ruan
2024-02-09 11:54   ` Shiyang Ruan via
2024-02-10  6:46   ` Dan Williams
2024-03-14 15:23     ` Shiyang Ruan
2024-03-14 15:23       ` Shiyang Ruan via
2024-02-15  1:19   ` Tony Luck
2024-02-09 11:54 ` [RFC PATCH 4/5] cxl/core: add report option for cxl_mem_get_poison() Shiyang Ruan
2024-02-09 11:54   ` Shiyang Ruan via
2024-02-10  6:49   ` Dan Williams
2024-03-14 15:01     ` Shiyang Ruan
2024-03-14 15:01       ` Shiyang Ruan via
2024-02-09 11:54 ` [RFC PATCH 5/5] cxl/core: add poison injection event handler Shiyang Ruan
2024-02-09 11:54   ` Shiyang Ruan via
2024-02-10  6:54   ` Dan Williams
2024-02-13 16:51   ` Jonathan Cameron
2024-02-13 16:51     ` Jonathan Cameron via
2024-03-15  2:29     ` Shiyang Ruan [this message]
2024-03-15  2:29       ` Shiyang Ruan via
2024-04-05 17:35       ` Jonathan Cameron
2024-04-05 17:35         ` Jonathan Cameron via
2024-02-13  0:20 ` [RFC PATCH SET] cxl: add poison " Dave Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48223415-8466-480d-86e1-8b9945782c0c@fujitsu.com \
    --to=ruansy.fnst@fujitsu.com \
    --cc=Jonathan.Cameron@Huawei.com \
    --cc=dan.j.williams@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.