All of lore.kernel.org
 help / color / mirror / Atom feed
From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
To: James Morse <james.morse@arm.com>, Xie XiuQi <xiexiuqi@huawei.com>
Cc: christoffer.dall@linaro.org, marc.zyngier@arm.com,
	catalin.marinas@arm.com, will.deacon@arm.com, fu.wei@linaro.org,
	rostedt@goodmis.org, hanjun.guo@linaro.org,
	shiju.jose@huawei.com, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	gengdongjiu@huawei.com, zhengqiang10@huawei.com,
	wuquanming@huawei.com,
	Wang Xiongfeng <wangxiongfengi2@huawei.com>
Subject: Re: [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler
Date: Wed, 12 Apr 2017 16:35:07 +0800	[thread overview]
Message-ID: <5b914702-7262-54d5-2f64-0521a04add03@huawei.com> (raw)
In-Reply-To: <58E7B6BD.3000401@arm.com>

Hi James,


On 2017/4/7 23:56, James Morse wrote:
> Hi Xie XiuQi,
> 
> On 30/03/17 11:31, Xie XiuQi wrote:
>> From: Wang Xiongfeng <wangxiongfeng2@huawei.com>
>>
>> Since SEI is asynchronous, the error data has been consumed. So we must
>> suppose that all the memory data current process can write are
>> contaminated. If the process doesn't have shared writable pages, the
>> process will be killed, and the system will continue running normally.
>> Otherwise, the system must be terminated, because the error has been
>> propagated to other processes running on other cores, and recursively
>> the error may be propagated to several another processes.
> 
> This is pretty complicated. We can't guarantee that another CPU hasn't modified
> the page tables while we do this, (so its racy). We can't guarantee that the
> corrupt data hasn't been sent over the network or written to disk in the mean
> time (so its not enough).
> 
> The scenario you have is a write of corrupt data to memory where another CPU
> reading it doesn't know the value is corrupt.
> 
> The hardware gives us quite a lot of help containing errors. The RAS
> specification (DDI 0587A) describes your scenario as error propagation in '2.1.2
> Architectural error propagation', and then classifies it in '2.1.3
> Architecturally infected, containable and uncontainable' as uncontained because
> the value is no longer in the general-purpose registers. For uncontained errors
> we should panic().
> 
> We shouldn't need to try to track errors after we get a notification as the
> hardware has done this for us.
> 
Thanks for your comments. I think what you said is reasonable. We will remove this
patch and use AET fields of ESR_ELx to determine whether we should kill current
process or just panic.
> 
> Firmware-first does complicate this if events like this are not delivered using
> a synchronous external abort, as Linux may have PSTATE.A masked preventing
> SError Interrupts from being taken. It looks like PSTATE.A is masked much more
> often than is necessary. I will look into cleaning this up.
> 
> 
> Thanks,
> 
> James
> 
> .
> 
Thanks,
Wang Xiongfeng

WARNING: multiple messages have this Message-ID (diff)
From: Xiongfeng Wang <wangxiongfeng2@huawei.com>
To: James Morse <james.morse@arm.com>, Xie XiuQi <xiexiuqi@huawei.com>
Cc: <christoffer.dall@linaro.org>, <marc.zyngier@arm.com>,
	<catalin.marinas@arm.com>, <will.deacon@arm.com>,
	<fu.wei@linaro.org>, <rostedt@goodmis.org>,
	<hanjun.guo@linaro.org>, <shiju.jose@huawei.com>,
	<linux-arm-kernel@lists.infradead.org>,
	<kvmarm@lists.cs.columbia.edu>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<gengdongjiu@huawei.com>, <zhengqiang10@huawei.com>,
	<wuquanming@huawei.com>,
	Wang Xiongfeng <wangxiongfengi2@huawei.com>
Subject: Re: [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler
Date: Wed, 12 Apr 2017 16:35:07 +0800	[thread overview]
Message-ID: <5b914702-7262-54d5-2f64-0521a04add03@huawei.com> (raw)
In-Reply-To: <58E7B6BD.3000401@arm.com>

Hi James,


On 2017/4/7 23:56, James Morse wrote:
> Hi Xie XiuQi,
> 
> On 30/03/17 11:31, Xie XiuQi wrote:
>> From: Wang Xiongfeng <wangxiongfeng2@huawei.com>
>>
>> Since SEI is asynchronous, the error data has been consumed. So we must
>> suppose that all the memory data current process can write are
>> contaminated. If the process doesn't have shared writable pages, the
>> process will be killed, and the system will continue running normally.
>> Otherwise, the system must be terminated, because the error has been
>> propagated to other processes running on other cores, and recursively
>> the error may be propagated to several another processes.
> 
> This is pretty complicated. We can't guarantee that another CPU hasn't modified
> the page tables while we do this, (so its racy). We can't guarantee that the
> corrupt data hasn't been sent over the network or written to disk in the mean
> time (so its not enough).
> 
> The scenario you have is a write of corrupt data to memory where another CPU
> reading it doesn't know the value is corrupt.
> 
> The hardware gives us quite a lot of help containing errors. The RAS
> specification (DDI 0587A) describes your scenario as error propagation in '2.1.2
> Architectural error propagation', and then classifies it in '2.1.3
> Architecturally infected, containable and uncontainable' as uncontained because
> the value is no longer in the general-purpose registers. For uncontained errors
> we should panic().
> 
> We shouldn't need to try to track errors after we get a notification as the
> hardware has done this for us.
> 
Thanks for your comments. I think what you said is reasonable. We will remove this
patch and use AET fields of ESR_ELx to determine whether we should kill current
process or just panic.
> 
> Firmware-first does complicate this if events like this are not delivered using
> a synchronous external abort, as Linux may have PSTATE.A masked preventing
> SError Interrupts from being taken. It looks like PSTATE.A is masked much more
> often than is necessary. I will look into cleaning this up.
> 
> 
> Thanks,
> 
> James
> 
> .
> 
Thanks,
Wang Xiongfeng

WARNING: multiple messages have this Message-ID (diff)
From: wangxiongfeng2@huawei.com (Xiongfeng Wang)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler
Date: Wed, 12 Apr 2017 16:35:07 +0800	[thread overview]
Message-ID: <5b914702-7262-54d5-2f64-0521a04add03@huawei.com> (raw)
In-Reply-To: <58E7B6BD.3000401@arm.com>

Hi James,


On 2017/4/7 23:56, James Morse wrote:
> Hi Xie XiuQi,
> 
> On 30/03/17 11:31, Xie XiuQi wrote:
>> From: Wang Xiongfeng <wangxiongfeng2@huawei.com>
>>
>> Since SEI is asynchronous, the error data has been consumed. So we must
>> suppose that all the memory data current process can write are
>> contaminated. If the process doesn't have shared writable pages, the
>> process will be killed, and the system will continue running normally.
>> Otherwise, the system must be terminated, because the error has been
>> propagated to other processes running on other cores, and recursively
>> the error may be propagated to several another processes.
> 
> This is pretty complicated. We can't guarantee that another CPU hasn't modified
> the page tables while we do this, (so its racy). We can't guarantee that the
> corrupt data hasn't been sent over the network or written to disk in the mean
> time (so its not enough).
> 
> The scenario you have is a write of corrupt data to memory where another CPU
> reading it doesn't know the value is corrupt.
> 
> The hardware gives us quite a lot of help containing errors. The RAS
> specification (DDI 0587A) describes your scenario as error propagation in '2.1.2
> Architectural error propagation', and then classifies it in '2.1.3
> Architecturally infected, containable and uncontainable' as uncontained because
> the value is no longer in the general-purpose registers. For uncontained errors
> we should panic().
> 
> We shouldn't need to try to track errors after we get a notification as the
> hardware has done this for us.
> 
Thanks for your comments. I think what you said is reasonable. We will remove this
patch and use AET fields of ESR_ELx to determine whether we should kill current
process or just panic.
> 
> Firmware-first does complicate this if events like this are not delivered using
> a synchronous external abort, as Linux may have PSTATE.A masked preventing
> SError Interrupts from being taken. It looks like PSTATE.A is masked much more
> often than is necessary. I will look into cleaning this up.
> 
> 
> Thanks,
> 
> James
> 
> .
> 
Thanks,
Wang Xiongfeng

  reply	other threads:[~2017-04-12  8:35 UTC|newest]

Thread overview: 139+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-30 10:31 [PATCH v3 0/8] arm64: acpi: apei: handle SEI notification type for ARMv8 Xie XiuQi
2017-03-30 10:31 ` Xie XiuQi
2017-03-30 10:31 ` Xie XiuQi
2017-03-30 10:31 ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 1/8] trace: ras: add ARM processor error information trace event Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 16:02   ` Steven Rostedt
2017-03-30 16:02     ` Steven Rostedt
2017-03-30 16:02     ` Steven Rostedt
2017-04-06  9:03     ` Xie XiuQi
2017-04-06  9:03       ` Xie XiuQi
2017-04-06  9:03       ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 2/8] acpi: apei: handle SEI notification type for ARMv8 Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-31 16:20   ` James Morse
2017-03-31 16:20     ` James Morse
2017-04-06  9:11     ` Xie XiuQi
2017-04-06  9:11       ` Xie XiuQi
2017-04-06  9:11       ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 3/8] arm64: apei: add a per-cpu variable to indecate sei is processing Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 4/8] APEI: GHES: reserve a virtual page for SEI context Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-31 16:22   ` James Morse
2017-03-31 16:22     ` James Morse
2017-04-06  9:25     ` Xie XiuQi
2017-04-06  9:25       ` Xie XiuQi
2017-04-06  9:25       ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 5/8] arm64: KVM: add guest SEI support Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 6/8] arm64: RAS: add ras extension runtime detection Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-04-13  8:44   ` Xiongfeng Wang
2017-04-13  8:44     ` Xiongfeng Wang
2017-04-13  8:44     ` Xiongfeng Wang
2017-04-13 10:51   ` Mark Rutland
2017-04-13 10:51     ` Mark Rutland
2017-04-13 10:51     ` Mark Rutland
2017-04-14  7:03     ` Xie XiuQi
2017-04-14  7:03       ` Xie XiuQi
2017-04-14  7:03       ` Xie XiuQi
2017-04-18  1:09     ` Xiongfeng Wang
2017-04-18  1:09       ` Xiongfeng Wang
2017-04-18  1:09       ` Xiongfeng Wang
2017-04-18 10:51       ` James Morse
2017-04-18 10:51         ` James Morse
2017-04-18 10:51         ` James Morse
2017-04-19  2:37         ` Xiongfeng Wang
2017-04-19  2:37           ` Xiongfeng Wang
2017-04-19  2:37           ` Xiongfeng Wang
2017-04-20  8:52           ` James Morse
2017-04-20  8:52             ` James Morse
2017-04-20  8:52             ` James Morse
2017-04-21 11:33             ` Xiongfeng Wang
2017-04-21 11:33               ` Xiongfeng Wang
2017-04-21 11:33               ` Xiongfeng Wang
2017-04-24 17:14               ` James Morse
2017-04-24 17:14                 ` James Morse
2017-04-24 17:14                 ` James Morse
2017-04-28  2:55                 ` Xiongfeng Wang
2017-04-28  2:55                   ` Xiongfeng Wang
2017-04-28  2:55                   ` Xiongfeng Wang
2017-05-08 17:27                   ` James Morse
2017-05-08 17:27                     ` James Morse
2017-05-09  2:16                     ` Xiongfeng Wang
2017-05-09  2:16                       ` Xiongfeng Wang
2017-05-09  2:16                       ` Xiongfeng Wang
2017-04-21 10:46   ` Xiongfeng Wang
2017-04-21 10:46     ` Xiongfeng Wang
2017-04-21 10:46     ` Xiongfeng Wang
2017-03-30 10:31 ` [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-04-07 15:56   ` James Morse
2017-04-07 15:56     ` James Morse
2017-04-07 15:56     ` James Morse
2017-04-12  8:35     ` Xiongfeng Wang [this message]
2017-04-12  8:35       ` Xiongfeng Wang
2017-04-12  8:35       ` Xiongfeng Wang
2017-03-30 10:31 ` [PATCH v3 0/8] arm64: acpi: apei: handle SEI notification type for ARMv8 Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 1/8] trace: ras: add ARM processor error information trace event Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-04-14 20:36   ` Baicar, Tyler
2017-04-14 20:36     ` Baicar, Tyler
2017-04-17  3:08     ` Xie XiuQi
2017-04-17  3:08       ` Xie XiuQi
2017-04-17  3:08       ` Xie XiuQi
2017-04-17  3:16       ` Xie XiuQi
2017-04-17  3:16         ` Xie XiuQi
2017-04-17  3:16         ` Xie XiuQi
2017-04-17 17:18         ` Baicar, Tyler
2017-04-17 17:18           ` Baicar, Tyler
2017-04-18  2:22           ` Xie XiuQi
2017-04-18  2:22             ` Xie XiuQi
2017-04-18  2:22             ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 2/8] acpi: apei: handle SEI notification type for ARMv8 Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 3/8] arm64: apei: add a per-cpu variable to indecate sei is processing Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 4/8] APEI: GHES: reserve a virtual page for SEI context Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 5/8] arm64: KVM: add guest SEI support Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 6/8] arm64: RAS: add ras extension runtime detection Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31 ` [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi
2017-03-30 10:31   ` Xie XiuQi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5b914702-7262-54d5-2f64-0521a04add03@huawei.com \
    --to=wangxiongfeng2@huawei.com \
    --cc=catalin.marinas@arm.com \
    --cc=christoffer.dall@linaro.org \
    --cc=fu.wei@linaro.org \
    --cc=gengdongjiu@huawei.com \
    --cc=hanjun.guo@linaro.org \
    --cc=james.morse@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.zyngier@arm.com \
    --cc=rostedt@goodmis.org \
    --cc=shiju.jose@huawei.com \
    --cc=wangxiongfengi2@huawei.com \
    --cc=will.deacon@arm.com \
    --cc=wuquanming@huawei.com \
    --cc=xiexiuqi@huawei.com \
    --cc=zhengqiang10@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.