From: Aili Yao <yaoaili@kingsoft.com>
To: <qemu-devel@nongnu.org>
Cc: "Luck, Tony" <tony.luck@intel.com>,
	Borislav Petkov <bp@alien8.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>,
	<yaoaili@kingsoft.com>, <sunhao2@kingsoft.com>
Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()
Date: Wed, 24 Mar 2021 10:59:50 +0800
Message-ID: <20210324105950.714fd8a6@alex-virtual-machine>
In-Reply-To: <20210224103921.3dcf0b65@alex-virtual-machine>

On Wed, 24 Feb 2021 10:39:21 +0800
Aili Yao <yaoaili@kingsoft.com> wrote:

> On Tue, 23 Feb 2021 16:12:43 +0000
> "Luck, Tony" <tony.luck@intel.com> wrote:
> 
> > > I think qemu has no easy way to get the MCE signature from the host, or currently has no method for it,
> > > so qemu treats every AR error as having no RIPV. Doing more is better than doing less.
> > 
> > RIPV would be important in the guest in the case where the guest can fix the problem that caused
> > the machine check and return to the failed instruction to continue.
> > 
> > I think the only case where this happens is a fault in a read-only page mapped from a file (typically
> > a code page, but could be a data page). In this case memory_failure() unmaps the page with the poison
> > but Linux can recover by reading data from the file into a new page.
> > 
> > In other cases we send SIGBUS (so go to the signal handler instead of to the faulting instruction).
> > 
> > So it would be good if the state of RIPV could be added to the signal state sent to qemu. If that
> > isn't possible, then this full recovery case turns into another SIGBUS case.  
> 
> I think this KVM/VM case of failed SRAR recovery is just one scenario.
> If Intel guarantees that RIPV is always set when a memory SRAR is triggered, then it is qemu's job to
> set RIPV.
> Otherwise, if SRAR can be triggered with RIPV cleared, the same issue applies to the host.
> 
> I also think it is better for the VM to know the real RIPV value. That needs more work in qemu and the kernel, if it is possible.
> 
> Thanks
> Aili Yao

Adding the qemu list to this topic; this is a really bad issue.

Issue report:
When a VM receives an SRAR memory failure from the host, the injected MCE always has RIPV cleared; the VM then processes it and triggers a panic.
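
For readers unfamiliar with the flag, here is a minimal sketch of how the low bits of IA32_MCG_STATUS decode. The bit positions (RIPV bit 0, EIPV bit 1, MCIP bit 2) follow the Intel SDM; the function name `decode_mcg_status` is hypothetical and used only for illustration. The reported case corresponds to a value where `ripv` decodes as False.

```python
# Low bits of IA32_MCG_STATUS as defined in the Intel SDM.
MCG_STATUS_RIPV = 1 << 0  # restart IP valid
MCG_STATUS_EIPV = 1 << 1  # error IP valid
MCG_STATUS_MCIP = 1 << 2  # machine check in progress


def decode_mcg_status(value):
    """Hypothetical helper: split an MCG_STATUS value into its three flags."""
    return {
        "ripv": bool(value & MCG_STATUS_RIPV),
        "eipv": bool(value & MCG_STATUS_EIPV),
        "mcip": bool(value & MCG_STATUS_MCIP),
    }


# A status with MCIP and EIPV set but RIPV cleared, as in the report.
print(decode_mcg_status(MCG_STATUS_MCIP | MCG_STATUS_EIPV))
```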

Can any qemu maintainer fix this?

Suggestion:
qemu should get the true value of RIPV from the host, then inject the MCE into the VM accordingly.
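
The suggestion could be modelled roughly as below. This is only a sketch under stated assumptions, not qemu code: `guest_mcg_status` and its `host_ripv` parameter are hypothetical, and no existing interface is implied for how the host value would reach qemu. When the host value is unknown (None), the sketch falls back to the behaviour described in this thread, RIPV cleared. Whether EIPV should also follow the host is left open; it is simply kept set here.

```python
from typing import Optional

# Low bits of IA32_MCG_STATUS as defined in the Intel SDM.
MCG_STATUS_RIPV = 1 << 0  # restart IP valid
MCG_STATUS_EIPV = 1 << 1  # error IP valid
MCG_STATUS_MCIP = 1 << 2  # machine check in progress


def guest_mcg_status(host_ripv: Optional[bool]) -> int:
    """Hypothetical: pick the MCG_STATUS value to inject for an AR error.

    host_ripv is the RIPV flag seen by the host, or None if it is unknown.
    """
    # Assumption: MCIP and EIPV are always set for the injected event.
    status = MCG_STATUS_MCIP | MCG_STATUS_EIPV
    # Mirror the host's RIPV only when it is known and set.
    if host_ripv:
        status |= MCG_STATUS_RIPV
    return status


for known in (None, False, True):
    print(known, bin(guest_mcg_status(known)))
```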

Thanks
Aili Yao
