linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shiju Jose <shiju.jose@huawei.com>
To: James Morse <james.morse@arm.com>, Borislav Petkov <bp@alien8.de>
Cc: "linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"helgaas@kernel.org" <helgaas@kernel.org>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"zhangliguang@linux.alibaba.com" <zhangliguang@linux.alibaba.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	Linuxarm <linuxarm@huawei.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	tanxiaofei <tanxiaofei@huawei.com>,
	yangyicong <yangyicong@huawei.com>
Subject: RE: [PATCH v6 1/2] ACPI / APEI: Add support to notify the vendor specific HW errors
Date: Tue, 21 Apr 2020 13:18:17 +0000	[thread overview]
Message-ID: <4d7bfedd175345a198d47e5fa0561ec1@huawei.com> (raw)
In-Reply-To: <c73bb18b-02ef-6c35-f4cf-1738c17a96e5@arm.com>

Hi James,

>-----Original Message-----
>From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-
>owner@vger.kernel.org] On Behalf Of James Morse
>Sent: 08 April 2020 11:03
>To: Borislav Petkov <bp@alien8.de>; Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-acpi@vger.kernel.org; linux-pci@vger.kernel.org; linux-
>kernel@vger.kernel.org; rjw@rjwysocki.net; helgaas@kernel.org;
>lenb@kernel.org; tony.luck@intel.com; gregkh@linuxfoundation.org;
>zhangliguang@linux.alibaba.com; tglx@linutronix.de; Linuxarm
><linuxarm@huawei.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>; tanxiaofei <tanxiaofei@huawei.com>;
>yangyicong <yangyicong@huawei.com>
>Subject: Re: [PATCH v6 1/2] ACPI / APEI: Add support to notify the vendor
>specific HW errors
>
>Hi Boris, Shiju,
>
>Sorry for not spotting this reply earlier: Its in-reply to v1, so gets buried.
I will resend the v7 patch solving this issue.
I guess the remaining  questions here are for Boris. May be can we discuss
your comments with V7 patch, which I will send?

>
>On 27/03/2020 18:22, Borislav Petkov wrote:
>> On Wed, Mar 25, 2020 at 04:42:22PM +0000, Shiju Jose wrote:
>>> Presently APEI does not support reporting the vendor specific HW
>>> errors, received in the vendor defined table entries, to the vendor
>>> drivers for any recovery.
>>>
>>> This patch adds the support to register and unregister the
>>
>> Avoid having "This patch" or "This commit" in the commit message. It
>> is tautologically useless.
>>
>> Also, do
>>
>> $ git grep 'This patch' Documentation/process
>>
>> for more details.
>>
>>> error handling function for the vendor specific HW errors and notify
>>> the registered kernel driver.
>
>>> @@ -526,10 +552,17 @@ static void ghes_do_proc(struct ghes *ghes,
>>>  			log_arm_hw_error(err);
>>>  		} else {
>>>  			void *err = acpi_hest_get_payload(gdata);
>>> +			u8 error_handled = false;
>>> +			int ret;
>>> +
>>> +			ret =
>atomic_notifier_call_chain(&ghes_event_notify_list, 0,
>>> +gdata);
>>
>> Well, this is a notifier with standard name for a non-standard event.
>> Not optimal.
>>
>> Why does only this event need a notifier? Because your driver is
>> interested in only those events?
>
>Its the 'else' catch-all for stuff drivers/acpi/apei  doesn't know to handle.
>
>In this case its because its a vendor specific GUID that only the vendor driver
>knows how to parse.
>
>
>>> +			if (ret & NOTIFY_OK)
>>> +				error_handled = true;
>>>
>>>  			log_non_standard_event(sec_type, fru_id, fru_text,
>>>  					       sec_sev, err,
>>> -					       gdata->error_data_length);
>>> +					       gdata->error_data_length,
>>> +					       error_handled);
>>
>> What's that error_handled thing for? That's just silly.
>>
>> Your notifier returns NOTIFY_STOP when it has queued the error. If you
>> don't want to log it, just test == NOTIFY_STOP and do not log it then.
>
>My thinking for this being needed was so user-space consumers of those
>tracepoints keep working. Otherwise you upgrade, get this feature, and your
>user-space counters stop working.
>
>You'd need to know this error source was now managed by an in-kernel
>driver, which may report the errors somewhere else...
>
>
>> Then your notifier callback is queuing the error into a kfifo for
>> whatever reason and then scheduling a workqueue to handle it in user
>> context...
>>
>> So I'm thinking that it would be better if you:
>>
>> * make that kfifo generic and part of ghes.c and queue all types of
>> error records into it in ghes_do_proc() - not just the non-standard
>> ones.
>
>Move the drop to process context into ghes.c? This should result in less code.
>
>I asked for this hooking to only be for the 'catch all' don't-know case so that
>we don't get drivers trying to hook and handle memory errors. (if we ever
>wanted that, it should be from part of memory_failure() so it catches all the
>ways of reporting memory-failure) 32bit arm has prior in this area.
>
>
>> * then, when you're done queuing, you kick a workqueue.
>>
>> * that workqueue runs a normal, blocking notifier to which drivers
>> register.
>>
>> Your driver can register to that notifier too and do the normal
>> handling then and not have this ad-hoc, semi-generic, semi-vendor-specific
>thing.
>
>As long as we don't walk a list of things that might handle a memory-error,
>and have some random driver try and NOTIFY_STOP it....
>
>aer_recover_queue() would be replaced by this. memory_failure_queue() has
>one additional caller in drivers/ras/cec.c.
>
>
>Thanks,
>
>James
Thanks,
Shiju

  reply	other threads:[~2020-04-21 13:18 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <Shiju Jose>
2020-01-15 11:01 ` [RFC PATCH 0/2] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
2020-01-15 11:01   ` [RFC PATCH 1/2] " Shiju Jose
2020-01-15 11:01   ` [RFC PATCH 2/2] PCI:hip08:Add driver to handle HiSilicon hip08 PCIe controller's errors Shiju Jose
2020-01-15 14:13     ` Bjorn Helgaas
2020-01-17  9:40       ` Shiju Jose
2020-01-24 12:39 ` [PATCH v2 0/2] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
2020-01-24 12:39   ` [PATCH v2 1/2] " Shiju Jose
2020-01-24 12:39   ` [PATCH v2 2/2] PCI: hip: Add handling of HiSilicon hip PCIe controller's errors Shiju Jose
2020-01-24 14:30     ` Bjorn Helgaas
2020-01-26 18:12     ` kbuild test robot
2020-01-26 18:12     ` [RFC PATCH] PCI: hip: hisi_pcie_sec_type can be static kbuild test robot
2020-02-03 16:51 ` [PATCH v3 0/2] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
2020-02-03 16:51   ` [PATCH v3 1/2] " Shiju Jose
2020-02-03 16:51   ` [PATCH v3 2/2] PCI: HIP: Add handling of HiSilicon HIP PCIe controller's errors Shiju Jose
2020-02-04 14:31     ` Dan Carpenter
2020-02-07 10:31 ` [PATCH v4 0/2] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
2020-02-07 10:31   ` [PATCH v4 1/2] " Shiju Jose
2020-03-11 17:29     ` James Morse
2020-03-12 12:10       ` Shiju Jose
2020-03-13 15:17         ` James Morse
2020-03-13 17:08           ` Shiju Jose
2020-02-07 10:31   ` [PATCH v4 2/2] PCI: HIP: Add handling of HiSilicon HIP PCIe controller errors Shiju Jose
2020-03-09  9:23   ` [PATCH v4 0/2] ACPI: APEI: Add support to notify the vendor specific HW errors Shiju Jose
2020-03-11 17:27     ` James Morse
2020-03-25 16:42 ` [PATCH v6 0/2] ACPI / " Shiju Jose
2020-03-25 16:42   ` [PATCH v6 1/2] " Shiju Jose
2020-03-27 18:22     ` Borislav Petkov
2020-03-30 10:14       ` Shiju Jose
2020-03-30 10:33         ` Borislav Petkov
2020-03-30 11:55           ` Shiju Jose
2020-03-30 13:42             ` Borislav Petkov
2020-03-30 15:44               ` Shiju Jose
2020-03-31  9:09                 ` Borislav Petkov
2020-04-08  9:20                   ` Shiju Jose
2020-04-08 10:03       ` James Morse
2020-04-21 13:18         ` Shiju Jose [this message]
2020-05-11 11:20         ` Shiju Jose
2020-03-25 16:42   ` [PATCH v6 2/2] PCI: hip: Add handling of HiSilicon HIP PCIe controller errors Shiju Jose
2020-03-27 15:07   ` [PATCH v6 0/2] ACPI / APEI: Add support to notify the vendor specific HW errors Bjorn Helgaas
2020-04-07 12:00 ` [v7 PATCH 0/6] ACPI / APEI: Add support to notify non-fatal " Shiju Jose
2020-04-07 12:00   ` [v7 PATCH 1/6] ACPI / APEI: Add support to queuing up the non-fatal HW errors and notify Shiju Jose
2020-04-08 19:41     ` kbuild test robot
2020-04-08 19:41     ` [RFC PATCH] ACPI / APEI: ghes_gdata_pool_init() can be static kbuild test robot
2020-04-07 12:00   ` [v7 PATCH 2/6] ACPI / APEI: Add callback for memory errors to the GHES notifier Shiju Jose
2020-04-07 12:00   ` [v7 PATCH 3/6] ACPI / APEI: Add callback for AER " Shiju Jose
2020-04-07 12:00   ` [v7 PATCH 4/6] ACPI / APEI: Add callback for ARM HW errors " Shiju Jose
2020-04-07 12:00   ` [v7 PATCH 5/6] ACPI / APEI: Add callback for non-standard " Shiju Jose
2020-04-07 12:00   ` [v7 PATCH 6/6] PCI: hip: Add handling of HiSilicon HIP PCIe controller errors Shiju Jose
2020-04-21 13:21 ` [RESEND PATCH v7 0/6] ACPI / APEI: Add support to notify non-fatal HW errors Shiju Jose
2020-04-21 13:21   ` [RESEND PATCH v7 1/6] ACPI / APEI: Add support to queuing up the non-fatal HW errors and notify Shiju Jose
2020-04-21 14:12     ` Dan Carpenter
2020-04-21 13:21   ` [RESEND PATCH v7 2/6] ACPI / APEI: Add callback for memory errors to the GHES notifier Shiju Jose
2020-04-21 13:21   ` [RESEND PATCH v7 3/6] ACPI / APEI: Add callback for AER " Shiju Jose
2020-04-21 13:21   ` [RESEND PATCH v7 4/6] ACPI / APEI: Add callback for ARM HW errors " Shiju Jose
2020-04-21 14:14     ` Dan Carpenter
2020-04-21 15:18       ` Shiju Jose
2020-04-21 13:21   ` [RESEND PATCH v7 5/6] ACPI / APEI: Add callback for non-standard " Shiju Jose
2020-04-21 13:21   ` [RESEND PATCH v7 6/6] PCI: hip: Add handling of HiSilicon HIP PCIe controller errors Shiju Jose
2020-04-21 14:20     ` Dan Carpenter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4d7bfedd175345a198d47e5fa0561ec1@huawei.com \
    --to=shiju.jose@huawei.com \
    --cc=bp@alien8.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=helgaas@kernel.org \
    --cc=james.morse@arm.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=rjw@rjwysocki.net \
    --cc=tanxiaofei@huawei.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=yangyicong@huawei.com \
    --cc=zhangliguang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).