linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Xiaofei Tan <tanxiaofei@huawei.com>
To: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Linuxarm <linuxarm@huawei.com>, Will Deacon <will@kernel.org>,
	Dave Martin <Dave.Martin@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Shiju Jose <shiju.jose@huawei.com>
Subject: Re: Question about SEA handling process happened in user space
Date: Tue, 31 Mar 2020 17:41:30 +0800	[thread overview]
Message-ID: <5E83104A.7020803@huawei.com> (raw)
In-Reply-To: <2b0e5507-ad75-9af1-6afe-aa87d8cf597f@arm.com>

Hi James,

Thanks for kindly reply.

On 2020/3/31 0:49, James Morse wrote:
> Hi Xiaofei,
> 
> On 3/30/20 2:10 PM, Xiaofei Tan wrote:
>> I'm a little confused about the handling process of SEA happened in user space.
> 
>> Following the description of FnV bit of register ESR_ELx in ARMv8.4 SPEC,FAR is
>> valid only for synchronous External abort on a translation table walk.
> 
>> But for this FAR valid scenario(attached code from line 684 to 687),
>> we send signal SIGKILL to kill process. For some other scenario, such as line 680,
>> FAR is not valid, but we send SIGBUS and transfer error address to process to try to do some recovery.
> 
> 'FAR is not valid': its optional. The ESR_EL1.FnV bit can be set for the 'catch
> all' external abort fault-status-code. This lets the CPU tell us that it doesn't
> know what the faulting virtual address is.
> 
> do_sea() checks for this:
> |	if (esr & ESR_ELx_FnV)
> |		siaddr = NULL;
> |	else
> |		siaddr  = (void __user *)addr;
> 
> If we can't know the address, there isn't much we can do.
> (This check is really making sure we don't pass junk to user-space when FnV is set)
> 
> 

OK. So even if FAR is not valid, we still send SIGBUS for SEA, not on translation table walk, but
set the addr to NULL here.

>> should it be an problem ?
> 
> I'm not quite sure what your question is.
> 
> If the CPU doesn't tell us the address, we can't tell user-space what it is. The
> alternative is to upgrade to SIGKILL in that case.
> 

Got it. May be the description of FnV bit of register ESR_ELx is not quite exactly. Because
following the code, CPU may still have an chance to tell the address for SEA, not on translation table walk.

> 
> If you see this instead of the address provided via firmware-first, there is a
> series to improve that here:
> https://lore.kernel.org/linux-acpi/20200228174817.74278-1-james.morse@arm.com/
> 
> (We skip this signal code of APEI promises it did all the work. This lets you
> take the signal from memory_failure() instead, which may have better information.)
> 

This should be an great direction.
I have two concerns.
1.memory_failure() is only called for "memory error section" record. Then
should we use this memory record for ghes sea report? Our platform is
using "ARM processor error section".
2.Should we define an error source structure for each cpu core in HEST table?
If not, there may be conflict if more than one cpu core fall into SEA.

> 
> If its the SIGKILL entries: these are for the translation table walk.
> There is no point telling user-space about corruption in its page tables as it
> can't do anything about it. The kernel's handling of this is to kill the
> process. (page tables make up a very small amount of memory, so this should be
> rarer than the regular 'external abort' case)
> 

Hmm, then it is useless that CPU record address for this entries.

> 
> Thanks,
> 
> James
> 
> 
> 
>> 680         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous external abort"    },
>> 684         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 (translation table walk)"      },
>> 685         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 (translation table walk)"      },
>> 686         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 (translation table walk)"      },
>> 687         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 (translation table walk)"      },
>> 688         { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous parity or ECC error" },    // Reserved when RAS is implemented
>> 692         { do_sea,               SIGKILL, SI_KERNEL,     "level 0 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 693         { do_sea,               SIGKILL, SI_KERNEL,     "level 1 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 694         { do_sea,               SIGKILL, SI_KERNEL,     "level 2 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 695         { do_sea,               SIGKILL, SI_KERNEL,     "level 3 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
>> 696         { do_bad,               SIGKILL, SI_KERNEL,     "unknown 32"                    },
> 
> .
> 

-- 
 thanks
tanxiaofei


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-03-31  9:41 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-30 13:10 Question about SEA handling process happened in user space Xiaofei Tan
2020-03-30 16:49 ` James Morse
2020-03-31  9:41   ` Xiaofei Tan [this message]
2020-03-31 17:00     ` James Morse
2020-04-01  3:49       ` Xiaofei Tan
2020-04-07 16:37         ` James Morse
2020-04-09  8:42           ` Xiaofei Tan
2020-04-09 14:28             ` James Morse
2020-04-10  2:55               ` Xiaofei Tan
2020-04-16 13:27                 ` James Morse
2020-04-18 10:49                   ` Xiaofei Tan
2020-04-02  6:35   ` Xiaofei Tan
2020-04-07 16:37     ` James Morse
2020-04-09  9:17       ` Xiaofei Tan
2020-04-09 14:28         ` James Morse
2020-04-10  9:43           ` Xiaofei Tan
2020-04-16 13:50             ` James Morse
2020-04-18 11:25               ` Xiaofei Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5E83104A.7020803@huawei.com \
    --to=tanxiaofei@huawei.com \
    --cc=Dave.Martin@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=james.morse@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linuxarm@huawei.com \
    --cc=shiju.jose@huawei.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).