All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Wang, Qingshun" <qingshun.wang@linux.intel.com>
Cc: linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-acpi@vger.kernel.org, chao.p.peng@linux.intel.com,
	erwin.tsaur@intel.com, feiting.wanyan@intel.com,
	qingshun.wang@intel.com, "Rafael J. Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>, James Morse <james.morse@arm.com>,
	Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Shiju Jose <shiju.jose@huawei.com>,
	Adam Preble <adam.c.preble@intel.com>,
	Li Yang <leoyang.li@nxp.com>, Lukas Wunner <lukas@wunner.de>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
	Robert Richter <rrichter@amd.com>,
	linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
	linux-edac@vger.kernel.org
Subject: Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info
Date: Tue, 6 Feb 2024 11:23:35 -0600	[thread overview]
Message-ID: <20240206172335.GA872811@bhelgaas> (raw)
In-Reply-To: <2rfnevhnhylik4r6smr56uunsxweo7s5elo65sjhiztvxnr6bq@5fcyv22zxyyp>

On Wed, Feb 07, 2024 at 12:41:41AM +0800, Wang, Qingshun wrote:
> On Mon, Feb 05, 2024 at 05:12:31PM -0600, Bjorn Helgaas wrote:
> > On Thu, Jan 25, 2024 at 02:27:59PM +0800, Wang, Qingshun wrote:
> > > When Advisory Non-Fatal errors are raised, both correctable and
> > > uncorrectable error statuses will be set. The current kernel code cannot
> > > store both statuses at the same time, thus failing to handle ANFE properly.
> > > In addition, to avoid clearing UEs that are not ANFE by accident, UE
> > > severity and Device Status also need to be recorded: any fatal UE cannot
> > > be ANFE, and if Fatal/Non-Fatal Error Detected is set in Device Status, do
> > > not take any assumption and let UE handler to clear UE status.
> > > 
> > > Store status and mask of both correctable and uncorrectable errors in
> > > aer_err_info. The severity of UEs and the values of the Device Status
> > > register are also recorded, which will be used to determine UEs that should
> > > be handled by the ANFE handler. Refactor the rest of the code to use
> > > cor/uncor_status and cor/uncor_mask fields instead of status and mask
> > > fields.
> > 
> > There's a lot going on in this patch.  Could it possibly be split up a
> > bit, e.g., first tease apart aer_err_info.status/.mask into
> > .cor_status/mask and .uncor_status/mask, then add .uncor_severity,
> > then add the device_status bit separately?  If it could be split up, I
> > think the ANFE case would be easier to see.
> 
> Thanks for the feedback! Will split it up into two pacthes in the next
> version.

Or even three:

  1) tease apart aer_err_info.status/.mask into .cor_status/mask and
     .uncor_status/mask

  2) add .uncor_severity

  3) add device_status

Looking at this again, I'm a little confused about 2) and 3).  I see
the new read of PCI_ERR_UNCOR_SEVER into .uncor_severity, but there's
no actual *use* of it.

Same for 3), I see the new read of PCI_EXP_DEVSTA, but AFAICS there's
no use of that value.

We should have the addition of these new values in the same patch
that *uses* them.

Bjorn

WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Wang, Qingshun" <qingshun.wang@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>,
	Alison Schofield <alison.schofield@intel.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	linux-pci@vger.kernel.org, erwin.tsaur@intel.com,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
	Oliver O'Halloran <oohall@gmail.com>,
	chao.p.peng@linux.intel.com, Ira Weiny <ira.weiny@intel.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Dave Jiang <dave.jiang@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
	linux-acpi@vger.kernel.org, Len Brown <lenb@kernel.org>,
	Robert Richter <rrichter@amd.com>, Borislav Petkov <bp@alien8.de>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-edac@vger.kernel.org, Tony Luck <tony.luck@intel.com>,
	feiting.wanyan@intel.com, qingshun.wang@intel.com,
	Adam Preble <adam.c.preble@intel.com>,
	Mahesh J Salgaonk ar <mahesh@linux.ibm.com>,
	Li Yang <leoyang.li@nxp.com>, Lukas Wunner <lukas@wunner.de>,
	James Morse <james.morse@arm.com>,
	linuxppc-dev@lists.ozlabs.org, Shiju Jose <shiju.jose@huawei.com>
Subject: Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info
Date: Tue, 6 Feb 2024 11:23:35 -0600	[thread overview]
Message-ID: <20240206172335.GA872811@bhelgaas> (raw)
In-Reply-To: <2rfnevhnhylik4r6smr56uunsxweo7s5elo65sjhiztvxnr6bq@5fcyv22zxyyp>

On Wed, Feb 07, 2024 at 12:41:41AM +0800, Wang, Qingshun wrote:
> On Mon, Feb 05, 2024 at 05:12:31PM -0600, Bjorn Helgaas wrote:
> > On Thu, Jan 25, 2024 at 02:27:59PM +0800, Wang, Qingshun wrote:
> > > When Advisory Non-Fatal errors are raised, both correctable and
> > > uncorrectable error statuses will be set. The current kernel code cannot
> > > store both statuses at the same time, thus failing to handle ANFE properly.
> > > In addition, to avoid clearing UEs that are not ANFE by accident, UE
> > > severity and Device Status also need to be recorded: any fatal UE cannot
> > > be ANFE, and if Fatal/Non-Fatal Error Detected is set in Device Status, do
> > > not take any assumption and let UE handler to clear UE status.
> > > 
> > > Store status and mask of both correctable and uncorrectable errors in
> > > aer_err_info. The severity of UEs and the values of the Device Status
> > > register are also recorded, which will be used to determine UEs that should
> > > be handled by the ANFE handler. Refactor the rest of the code to use
> > > cor/uncor_status and cor/uncor_mask fields instead of status and mask
> > > fields.
> > 
> > There's a lot going on in this patch.  Could it possibly be split up a
> > bit, e.g., first tease apart aer_err_info.status/.mask into
> > .cor_status/mask and .uncor_status/mask, then add .uncor_severity,
> > then add the device_status bit separately?  If it could be split up, I
> > think the ANFE case would be easier to see.
> 
> Thanks for the feedback! Will split it up into two pacthes in the next
> version.

Or even three:

  1) tease apart aer_err_info.status/.mask into .cor_status/mask and
     .uncor_status/mask

  2) add .uncor_severity

  3) add device_status

Looking at this again, I'm a little confused about 2) and 3).  I see
the new read of PCI_ERR_UNCOR_SEVER into .uncor_severity, but there's
no actual *use* of it.

Same for 3), I see the new read of PCI_EXP_DEVSTA, but AFAICS there's
no use of that value.

We should have the addition of these new values in the same patch
that *uses* them.

Bjorn

  reply	other threads:[~2024-02-06 17:23 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-25  6:27 [PATCH v2 0/4] PCI/AER: Handle Advisory Non-Fatal properly Wang, Qingshun
2024-01-25  6:27 ` Wang, Qingshun
2024-01-25  6:27 ` [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info Wang, Qingshun
2024-01-25  6:27   ` Wang, Qingshun
2024-01-31  2:26   ` Kuppuswamy Sathyanarayanan
2024-01-31  2:26     ` Kuppuswamy Sathyanarayanan
2024-01-31  8:04     ` Wang, Qingshun
2024-01-31  8:04       ` Wang, Qingshun
2024-02-05 23:12   ` Bjorn Helgaas
2024-02-05 23:12     ` Bjorn Helgaas
2024-02-06 16:41     ` Wang, Qingshun
2024-02-06 16:41       ` Wang, Qingshun
2024-02-06 17:23       ` Bjorn Helgaas [this message]
2024-02-06 17:23         ` Bjorn Helgaas
2024-02-08 16:16         ` Wang, Qingshun
2024-02-08 16:16           ` Wang, Qingshun
2024-01-25  6:28 ` [PATCH v2 2/4] PCI/AER: Handle Advisory Non-Fatal properly Wang, Qingshun
2024-01-25  6:28   ` Wang, Qingshun
2024-02-05 23:26   ` Bjorn Helgaas
2024-02-05 23:26     ` Bjorn Helgaas
2024-02-06 16:46     ` Wang, Qingshun
2024-02-06 16:46       ` Wang, Qingshun
2024-01-25  6:28 ` [PATCH v2 3/4] PCI/AER: Fetch information for FTrace Wang, Qingshun
2024-01-25  6:28   ` Wang, Qingshun
2024-02-02 18:01   ` Dan Williams
2024-02-02 18:01     ` Dan Williams
2024-02-03  4:59     ` Wang, Qingshun
2024-02-03  4:59       ` Wang, Qingshun
2024-01-25  6:28 ` [PATCH v2 4/4] RAS: Trace more information in aer_event Wang, Qingshun
2024-01-25  6:28   ` Wang, Qingshun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240206172335.GA872811@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=adam.c.preble@intel.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=erwin.tsaur@intel.com \
    --cc=feiting.wanyan@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=lenb@kernel.org \
    --cc=leoyang.li@nxp.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lukas@wunner.de \
    --cc=mahesh@linux.ibm.com \
    --cc=oohall@gmail.com \
    --cc=qingshun.wang@intel.com \
    --cc=qingshun.wang@linux.intel.com \
    --cc=rafael@kernel.org \
    --cc=rrichter@amd.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=shiju.jose@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.