linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Cao jin <caoj.fnst@cn.fujitsu.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: <linux-kernel@vger.kernel.org>, <kvm@vger.kernel.org>,
	<izumi.taku@jp.fujitsu.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Thu, 1 Dec 2016 21:40:23 +0800	[thread overview]
Message-ID: <58402847.2000003@cn.fujitsu.com> (raw)
In-Reply-To: <20161201064708-mutt-send-email-mst@kernel.org>



On 12/01/2016 12:51 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 30, 2016 at 09:04:13PM -0700, Alex Williamson wrote:
>> On Sun, 27 Nov 2016 19:34:17 +0800
>> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>>

>>> @@ -1187,10 +1200,30 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>>>  		return PCI_ERS_RESULT_DISCONNECT;
>>>  	}
>>>  
>>> +	/* get device's uncorrectable error status as soon as possible,
>>> +	 * and signal it to user space. The later we read it, the possibility
>>> +	 * the register value is mangled grows. */
>>
>> Bad comment style (throughout).
>>
>>> +	aer_cap_offset = pci_find_ext_capability(vdev->pdev, PCI_EXT_CAP_ID_ERR);
>>> +	ret = pci_read_config_dword(vdev->pdev, aer_cap_offset + 
>>> +                                    PCI_ERR_UNCOR_STATUS, &uncor_status);
>>> +        if (ret)
>>> +                return PCI_ERS_RESULT_DISCONNECT;
>>> +
>>> +	pr_err("device %d got AER detect notification. uncorrectable error status = 0x%x\n", pdev->devfn, uncor_status);//to be removed
>>>  	mutex_lock(&vdev->igate);
>>> +    
>>> +	vdev->aer_recovering = true;
>>> +	reinit_completion(&vdev->aer_error_completion);
>>> +
>>> +	/* suspend config space access from user space,
>>> +	 * when vfio-pci's error recovery process is on */
>>> +	pci_cfg_access_trylock(vdev->pdev);
>>>  
>>> -	if (vdev->err_trigger)
>>> -		eventfd_signal(vdev->err_trigger, 1);
>>> +	if (vdev->err_trigger && uncor_status) {
>>> +		pr_err("device %d signal uncor status to user space", pdev->devfn);//may be removed
>>> +		/* signal uncorrectable error status to user space */
>>> +		eventfd_signal(vdev->err_trigger, uncor_status);
> 
> What does this mean that we trigger with uncor_status as opposed to 1?
> 

I guess you missed the changelog in qemu patchset's cover letter, see if
it helps(copy from cover letter):

5. Change what eventfd signals(check vfio driver patch). Before,
vfio-pci driver add 1 to the counter, which doesn't have meaning, just
for notification. Now, vfio-pci's error detected handler read the
uncorrectable error status and signal it to qemu.

Why? When testing previous version(host aer driver still reset link),
found that there is quite possibility that register reading returns the
invalid value 0xFFFFFFFF, which results in all(2 in my environment)
qemu's vfio-pci function send AER to root port while I only inject error
into one function. This is unreasonable, and I think it is the result of
reading during device reset. Previous patch does considered to find the
real function who has error happened, but won't work under the situation
I found. So move the register reading as early as possible would be the
nature thoughts, and it is moved into vfio-pci driver. Although now
reset_link is disabled in aer driver, get the uncor error status as
early as possible still make sense, for example: if user space driver
does device reset once receiving the error notification, and then read
register.

-- 
Sincerely,
Cao jin

  reply	other threads:[~2016-12-01 13:36 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-27 11:34 [PATCH] vfio/pci: Support error recovery Cao jin
2016-11-28  3:00 ` Michael S. Tsirkin
2016-11-28  9:32   ` Cao jin
2016-11-30  1:46     ` Michael S. Tsirkin
2016-12-01 13:38       ` Cao jin
2016-12-01  4:04 ` Alex Williamson
2016-12-01  4:51   ` Michael S. Tsirkin
2016-12-01 13:40     ` Cao jin [this message]
2016-12-06  3:46       ` Michael S. Tsirkin
2016-12-06  6:47         ` Cao jin
2016-12-01 13:40   ` Cao jin
2016-12-01 14:55     ` Alex Williamson
2016-12-04 12:16       ` Cao jin
2016-12-04 15:30         ` Alex Williamson
2016-12-05  5:52           ` Cao jin
2016-12-05 16:17             ` Alex Williamson
2016-12-06  3:55               ` Michael S. Tsirkin
2016-12-06  4:59                 ` Alex Williamson
2016-12-06 10:46                   ` Cao jin
2016-12-06 15:35                     ` Alex Williamson
2016-12-07  2:49                       ` Cao jin
2016-12-08 14:46                       ` Cao jin
2016-12-08 16:30                         ` Michael S. Tsirkin
2016-12-09  3:40                           ` Cao jin
2016-12-09  3:40                         ` Cao jin
2016-12-06  6:11               ` Cao jin
2016-12-06 15:25                 ` Alex Williamson
2016-12-07  2:58                   ` Cao jin
2016-12-12 13:49 ` Cao jin
2016-12-12 19:12   ` Alex Williamson
2016-12-12 22:29     ` Michael S. Tsirkin
2016-12-12 22:43       ` Alex Williamson
2016-12-13  3:15         ` Michael S. Tsirkin
2016-12-13  3:39           ` Alex Williamson
2016-12-13 16:12             ` Michael S. Tsirkin
2016-12-13 16:27               ` Alex Williamson
2016-12-14  1:58                 ` Michael S. Tsirkin
2016-12-14  3:00                   ` Alex Williamson
2016-12-14 22:20                     ` Michael S. Tsirkin
2016-12-14 22:47                       ` Alex Williamson
2016-12-14 23:00                         ` Michael S. Tsirkin
2016-12-14 23:32                           ` Alex Williamson
2016-12-14 10:24     ` Cao jin
2016-12-14 22:16       ` Alex Williamson
2016-12-14 22:25         ` Michael S. Tsirkin
2016-12-14 22:49           ` Alex Williamson
2016-12-15 13:56         ` Cao jin
2016-12-15 14:50           ` Michael S. Tsirkin
2016-12-15 22:01             ` Alex Williamson
2016-12-16 10:15               ` Cao jin
2016-12-16 10:15             ` Cao jin
2016-12-15 17:02           ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58402847.2000003@cn.fujitsu.com \
    --to=caoj.fnst@cn.fujitsu.com \
    --cc=alex.williamson@redhat.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).