All of lore.kernel.org
 help / color / mirror / Atom feed
From: Robin Murphy <robin.murphy@arm.com>
To: Tong Tiangen <tongtiangen@huawei.com>,
	Mark Rutland <mark.rutland@arm.com>,
	james.morse@arm.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: Re: [RFC PATCH -next V2 7/7] arm64: add pagecache reading to machine check safe
Date: Fri, 8 Apr 2022 12:11:06 +0100	[thread overview]
Message-ID: <efed49aa-5a07-cd97-e58f-eec1b5840b9c@arm.com> (raw)
In-Reply-To: <d7ef4e03-343e-f7e3-2890-a45b87ac8203@huawei.com>

On 2022-04-08 03:43, Tong Tiangen wrote:
> 
> 
> 在 2022/4/7 23:53, Robin Murphy 写道:
>> On 2022-04-07 15:56, Tong Tiangen wrote:
>>>
>>>
>>> 在 2022/4/6 19:27, Mark Rutland 写道:
>>>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>>>> When user process reading file, the data is cached in pagecache and
>>>>> the data belongs to the user process, When machine check error is
>>>>> encountered during pagecache reading, killing the user process and
>>>>> isolate the user page with hardware memory errors is a more reasonable
>>>>> choice than kernel panic.
>>>>>
>>>>> The __arch_copy_mc_to_user() in copy_to_user_mc.S is largely borrows
>>>>> from __arch_copy_to_user() in copy_to_user.S and the main difference
>>>>> is __arch_copy_mc_to_user() add the extable entry to support machine
>>>>> check safe.
>>>>
>>>> As with prior patches, *why* is the distinction necessary?
>>>>
>>>> This patch adds a bunch of conditional logic, but *structurally* it 
>>>> doesn't
>>>> alter the handling to be substantially different for the MC and 
>>>> non-MC cases.
>>>>
>>>> This seems like pointless duplication that just makes it harder to 
>>>> maintain
>>>> this code.
>>>>
>>>> Thanks,
>>>> Mark.
>>>
>>> Agreed, The implementation here looks a little ugly and harder to 
>>> maintain.
>>>
>>> The purpose of my doing this is not all copy_to_user can be recovered.
>>>
>>> A memory error is consumed when reading pagecache using copy_to_user.
>>> I think in this scenario, only the process is affected because it 
>>> can't read
>>> pagecache data correctly. Just kill the process and don't need the whole
>>> kernel panic.
>>>
>>> So I need two different copy_to_user implementation, one is existing 
>>> __arch_copy_to_user,
>>> this function will panic when consuming memory errors. The other one 
>>> is this new helper
>>> __arch_copy_mc_to_user, this interface is used when reading 
>>> pagecache. It can recover from
>>> consume memory error.
>>
>> OK, but do we really need two almost-identical implementations of 
>> every function where the only difference is how the exception table 
>> entries are annotated? Could the exception handler itself just figure 
>> out who owns the page where the fault occurred and decide what action 
>> to take as appropriate?
>>
>> Robin.
>>
> 
> Thank you, Robin.
> 
> I added this call path in this patchset: do_sea() -> fixup_exception(), 
> the purpose is to provide a chance for sea fault to fixup (refer patch 
> 3/7).
> 
> If fixup successful, panic can be avoided. Otherwise, panic can be 
> eliminated according to the original logic.
> 
> fixup_exception() will set regs->pc and jump to regs->pc when the 
> context recovery of an exception occurs.
> 
> If mc-safe-fixup added to  __arch_copy_to_user(),  in *non pagecache 
> reading* scenario encount memory error when call __arch_copy_to_user() , 
> do_sea() -> fixup_exception() will not return fail and will miss the 
> panic logic in do_sea().
> 
> So I add new helper __arch_copy_mc_to_user()  and add mc-safe-fixup to 
> this helper, which can be used in the required scenarios. At present, 
> there is only one pagecache reading scenario, other scenarios need to be 
> developed.
> 
> This is my current confusion. Of course, I will think about the solution 
> to  solve the duplicate code problem.

Right, but if the point is that faults in pagecahe pages are special, 
surely __do_kernel_fault() could ultimately figure out from the address 
whether that's the case or not?

In general, if the principle is that whether a fault is recoverable or 
not depends on what was being accessed, then to me it seems 
fundamentally more robust to base that decision on details of the fault 
that actually occurred, rather than what the caller thought it was 
supposed to be doing at the time.

Thanks,
Robin.

WARNING: multiple messages have this Message-ID (diff)
From: Robin Murphy <robin.murphy@arm.com>
To: Tong Tiangen <tongtiangen@huawei.com>,
	Mark Rutland <mark.rutland@arm.com>,
	james.morse@arm.com
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: Re: [RFC PATCH -next V2 7/7] arm64: add pagecache reading to machine check safe
Date: Fri, 8 Apr 2022 12:11:06 +0100	[thread overview]
Message-ID: <efed49aa-5a07-cd97-e58f-eec1b5840b9c@arm.com> (raw)
In-Reply-To: <d7ef4e03-343e-f7e3-2890-a45b87ac8203@huawei.com>

On 2022-04-08 03:43, Tong Tiangen wrote:
> 
> 
> 在 2022/4/7 23:53, Robin Murphy 写道:
>> On 2022-04-07 15:56, Tong Tiangen wrote:
>>>
>>>
>>> 在 2022/4/6 19:27, Mark Rutland 写道:
>>>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>>>> When user process reading file, the data is cached in pagecache and
>>>>> the data belongs to the user process, When machine check error is
>>>>> encountered during pagecache reading, killing the user process and
>>>>> isolate the user page with hardware memory errors is a more reasonable
>>>>> choice than kernel panic.
>>>>>
>>>>> The __arch_copy_mc_to_user() in copy_to_user_mc.S is largely borrows
>>>>> from __arch_copy_to_user() in copy_to_user.S and the main difference
>>>>> is __arch_copy_mc_to_user() add the extable entry to support machine
>>>>> check safe.
>>>>
>>>> As with prior patches, *why* is the distinction necessary?
>>>>
>>>> This patch adds a bunch of conditional logic, but *structurally* it 
>>>> doesn't
>>>> alter the handling to be substantially different for the MC and 
>>>> non-MC cases.
>>>>
>>>> This seems like pointless duplication that just makes it harder to 
>>>> maintain
>>>> this code.
>>>>
>>>> Thanks,
>>>> Mark.
>>>
>>> Agreed, The implementation here looks a little ugly and harder to 
>>> maintain.
>>>
>>> The purpose of my doing this is not all copy_to_user can be recovered.
>>>
>>> A memory error is consumed when reading pagecache using copy_to_user.
>>> I think in this scenario, only the process is affected because it 
>>> can't read
>>> pagecache data correctly. Just kill the process and don't need the whole
>>> kernel panic.
>>>
>>> So I need two different copy_to_user implementation, one is existing 
>>> __arch_copy_to_user,
>>> this function will panic when consuming memory errors. The other one 
>>> is this new helper
>>> __arch_copy_mc_to_user, this interface is used when reading 
>>> pagecache. It can recover from
>>> consume memory error.
>>
>> OK, but do we really need two almost-identical implementations of 
>> every function where the only difference is how the exception table 
>> entries are annotated? Could the exception handler itself just figure 
>> out who owns the page where the fault occurred and decide what action 
>> to take as appropriate?
>>
>> Robin.
>>
> 
> Thank you, Robin.
> 
> I added this call path in this patchset: do_sea() -> fixup_exception(), 
> the purpose is to provide a chance for sea fault to fixup (refer patch 
> 3/7).
> 
> If fixup successful, panic can be avoided. Otherwise, panic can be 
> eliminated according to the original logic.
> 
> fixup_exception() will set regs->pc and jump to regs->pc when the 
> context recovery of an exception occurs.
> 
> If mc-safe-fixup added to  __arch_copy_to_user(),  in *non pagecache 
> reading* scenario encount memory error when call __arch_copy_to_user() , 
> do_sea() -> fixup_exception() will not return fail and will miss the 
> panic logic in do_sea().
> 
> So I add new helper __arch_copy_mc_to_user()  and add mc-safe-fixup to 
> this helper, which can be used in the required scenarios. At present, 
> there is only one pagecache reading scenario, other scenarios need to be 
> developed.
> 
> This is my current confusion. Of course, I will think about the solution 
> to  solve the duplicate code problem.

Right, but if the point is that faults in pagecahe pages are special, 
surely __do_kernel_fault() could ultimately figure out from the address 
whether that's the case or not?

In general, if the principle is that whether a fault is recoverable or 
not depends on what was being accessed, then to me it seems 
fundamentally more robust to base that decision on details of the fault 
that actually occurred, rather than what the caller thought it was 
supposed to be doing at the time.

Thanks,
Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-04-08 11:11 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-06  9:13 [RFC PATCH -next V2 0/7]arm64: add machine check safe support Tong Tiangen
2022-04-06  9:13 ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 1/7] x86: fix copy_mc_to_user compile error Tong Tiangen
2022-04-06  9:13   ` Tong Tiangen
2022-04-06  9:22   ` Borislav Petkov
2022-04-06  9:22     ` Borislav Petkov
2022-04-06 10:02     ` Tong Tiangen
2022-04-06 10:02       ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 2/7] arm64: fix page_address return value in copy_highpage Tong Tiangen
2022-04-06  9:13   ` Tong Tiangen
2022-04-06 10:22   ` Mark Rutland
2022-04-06 10:22     ` Mark Rutland
2022-04-06 12:47     ` Tong Tiangen
2022-04-06 12:47       ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 3/7] arm64: add support for machine check error safe Tong Tiangen
2022-04-06  9:13   ` Tong Tiangen
2022-04-06 10:58   ` Mark Rutland
2022-04-06 10:58     ` Mark Rutland
2022-04-07 14:26     ` Tong Tiangen
2022-04-07 14:26       ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 4/7] arm64: add copy_from_user to machine check safe Tong Tiangen
2022-04-06  9:13   ` Tong Tiangen
2022-04-06 11:19   ` Mark Rutland
2022-04-06 11:19     ` Mark Rutland
2022-04-07 14:28     ` Tong Tiangen
2022-04-07 14:28       ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 5/7] arm64: add get_user " Tong Tiangen
2022-04-06  9:13   ` Tong Tiangen
2022-04-06 11:22   ` Mark Rutland
2022-04-06 11:22     ` Mark Rutland
2022-04-07 14:38     ` Tong Tiangen
2022-04-07 14:38       ` Tong Tiangen
2022-04-08 15:22       ` Mark Rutland
2022-04-08 15:22         ` Mark Rutland
2022-04-09  9:17         ` Tong Tiangen
2022-04-09  9:17           ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 6/7] arm64: add cow " Tong Tiangen
2022-04-06  9:13   ` Tong Tiangen
2022-04-06  9:13 ` [RFC PATCH -next V2 7/7] arm64: add pagecache reading " Tong Tiangen
2022-04-06  9:13   ` Tong Tiangen
2022-04-06 11:27   ` Mark Rutland
2022-04-06 11:27     ` Mark Rutland
2022-04-07 14:56     ` Tong Tiangen
2022-04-07 14:56       ` Tong Tiangen
2022-04-07 15:53       ` Robin Murphy
2022-04-07 15:53         ` Robin Murphy
2022-04-08  2:43         ` Tong Tiangen
2022-04-08  2:43           ` Tong Tiangen
2022-04-08 11:11           ` Robin Murphy [this message]
2022-04-08 11:11             ` Robin Murphy
2022-04-09  9:24             ` Tong Tiangen
2022-04-09  9:24               ` Tong Tiangen
2022-04-06 10:04 ` [RFC PATCH -next V2 0/7]arm64: add machine check safe support Mark Rutland
2022-04-06 10:04   ` Mark Rutland
2022-04-07  4:21   ` Tong Tiangen
2022-04-07  4:21     ` Tong Tiangen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=efed49aa-5a07-cd97-e58f-eec1b5840b9c@arm.com \
    --to=robin.murphy@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tongtiangen@huawei.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.