From: Bjorn Helgaas <helgaas@kernel.org>
To: Oza Pawandeep <poza@codeaurora.org>
Cc: linux-pci@vger.kernel.org, okaya@codeaurora.org,
	timur@codeaurora.org,
	Gabriele Paoloni <gabriele.paoloni@huawei.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Dongdong Liu <liudongdong3@huawei.com>,
	linux-arm-msm@vger.kernel.org,
	Bjorn Helgaas <bhelgaas@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 4/4] PCI/AER: Dont do recovery when DPC is enabled
Date: Wed, 15 Nov 2017 15:14:47 -0600
Message-ID: <20171115211447.GA7266@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <1510721808-27164-5-git-send-email-poza@codeaurora.org>

On Wed, Nov 15, 2017 at 10:26:48AM +0530, Oza Pawandeep wrote:
> PCI Express Base Specification, Rev. 4.0 Version 0.9
> 6.2.10: Downstream Port Containment (DPC)
> 
> DPC is an optional normative feature of a Downstream Port. DPC halts PCI
> Express traffic below a Downstream Port after an unmasked uncorrectable
> error is detected at or below the Port, avoiding the potential spread of
> any data corruption, and permitting error recovery if supported by
> software
> 
> Triggering DPC disables its Link by directing the LTSSM to the Disabled
> state. Once the LTSSM reaches the Disabled state, it remains in that
> state until the DPC Trigger Status bit is Cleared
> 
> So when DPC service is active and registered to port driver, AER should
> not attempt to recover, since DPC will be removing downstream devices,
> and do the recovery.
> 
> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 7448052..a9108ea 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -482,6 +482,27 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
>  }
>  
>  /**
> + * pcie_port_query_uptream_service - query upstream service
> + * @dev: pointer to a pci_dev data structure of agent detecting an error
> + * @service: service to be queried
> + *
> + * Invoked to know the status of the service for pci device.
> + */
> +static bool pcie_port_query_uptream_service(struct pci_dev *dev, u32 service)
> +{
> +	struct pci_dev *upstream_dev = dev;
> +
> +	do {
> +		if (pcie_port_query_service(upstream_dev, service))
> +			return true;
> +		upstream_dev = pcie_port_upstream_bridge(upstream_dev);
> +	} while (upstream_dev);
> +
> +	return false;
> +}
> +
> +
> +/**
>   * do_recovery - handle nonfatal/fatal error recovery process
>   * @dev: pointer to a pci_dev data structure of agent detecting an error
>   * @severity: error severity type
> @@ -495,6 +516,18 @@ static void do_recovery(struct pci_dev *dev, int severity)
>  	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
>  	enum pci_channel_state state;
>  
> +	/*
> +	 * If DPC is enabled, there is no need to attempt recovery.
> +	 * Since DPC disables its Link by directing the LTSSM to
> +	 * the Disabled state.
> +	 * DPC driver will take care of the recovery, there is no need
> +	 * for AER driver to race.
> +	 */
> +	if (pcie_port_query_uptream_service(dev, PCIE_PORT_SERVICE_DPC)) {
> +		dev_info(&dev->dev, "AER: Device recovery to be done by DPC\n");
> +		return;
> +	}

What happens without this test?

Does AER read registers from the now-disabled device and get ~0 data?
Or is AER reading registers from the port upstream from the disabled
device and trying to reset the device?

It looks like get_device_error_info() reads registers and doesn't
check to see whether it gets ~0 back.  I'm wondering if we *should* be
checking there and whether doing that would help mitigate the issue
here.
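
A minimal userspace sketch of the kind of ~0 check being asked about follows. It is not the actual get_device_error_info() code: struct err_regs, reg_read_failed() and has_unmasked_error() are hypothetical names invented for illustration, and the sketch assumes that a read from a device that is no longer reachable completes with all bits set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a status/mask register pair read from a device. */
struct err_regs {
	uint32_t status;
	uint32_t mask;
};

/*
 * Assumption: a config read from an unreachable device (for example,
 * one below a link that has been disabled) returns all ones, so ~0 is
 * treated as "the read failed" rather than as real register contents.
 */
static bool reg_read_failed(uint32_t val)
{
	return val == UINT32_MAX;
}

/*
 * Report an error only if both reads look valid and at least one
 * status bit is set that is not masked.
 */
static bool has_unmasked_error(const struct err_regs *regs)
{
	if (reg_read_failed(regs->status) || reg_read_failed(regs->mask))
		return false;

	return (regs->status & ~regs->mask) != 0;
}
```

With a check of this shape, a device that has dropped off the bus would be reported as having no decodable error information instead of appearing to have every error bit set.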

I don't really like the pcie_port_query_uptream_service() approach
(BTW, the name is misspelled) because it feels a little ad hoc and it
makes assumptions here in the AER code about what the DPC code is
doing, e.g., how it has configured PCI_EXP_DPC_CTL.

>  	if (severity == AER_FATAL)
>  		state = pci_channel_io_frozen;
>  	else
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
> a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
