linux-kernel.vger.kernel.org archive mirror
From: Ding Hui <dinghui@sangfor.com.cn>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: bp@alien8.de, bp@suse.de, naoya.horiguchi@nec.com,
	osalvador@suse.de, peterz@infradead.org,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, mingo@redhat.com, x86@kernel.org,
	hpa@zytor.com, youquan.song@intel.com, huangcun@sangfor.com.cn,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] x86/mce: Fix endless loop when run task works after #MC
Date: Wed, 7 Jul 2021 17:51:20 +0800
Message-ID: <fffec03b-2601-a0c0-5954-ee05fe046ba1@sangfor.com.cn>
In-Reply-To: <6a1b1371-50e4-f0f6-1ebd-0a91fc9d7bcc@sangfor.com.cn>

On 2021/7/7 11:39, Ding Hui wrote:
> On 2021/7/7 0:44, Luck, Tony wrote:
>> On Tue, Jul 06, 2021 at 08:16:06PM +0800, Ding Hui wrote:
>>> Recently we encountered multiple #MCs on the same task before its
>>> task_work_run() had been called. current->mce_kill_me was added to
>>> the task_works list more than once, which made the task_works list
>>> circular, so task_work_run() loops endlessly.
>>
>> I saw the same and posted a similar fix a while back:
>>
>> https://www.spinics.net/lists/linux-mm/msg251006.html
>>
>> It didn't get merged because some validation tests began failing
>> around the same time.  I'm now pretty sure I understand what happened
>> with those other tests.
>>
>> I'll post my updated version (second patch in a three part series)
>> later today.
>>
> 
> Thanks for your fixes.
> 
> After digging into my original problem, I think I have found out why I
> hit the #MC flood.
> 
> My test case:
> 1. Run a qemu-kvm guest VM booting memtest86+.iso.
> 2. Inject an SRAR UE into the VM's memory and wait for the #MC.
> When the VM triggers the #MC, I expect qemu to receive the SIGBUS signal
> ASAP, and with the modified qemu, I will kill the VM.
> 
> In this case, do_machine_check() may be called by kvm_machine_check() in
> vmx.c.
> 
> Before [1], memory_failure() was called in do_machine_check(), so
> TIF_SIGPENDING was set because of the SIGBUS signal; vcpu_run() saw the
> pending signal and returned to qemu to handle SIGBUS.
> 
> After [1], do_machine_check() only adds a task work and does not send
> SIGBUS directly. vcpu_run() does not break out of its for-loop because
> vcpu_enter_guest() returns 1 and TIF_SIGPENDING is not set, so the task
> works are never executed until something else happens. KVM therefore
> enters the guest repeatedly and the #MC is triggered repeatedly.
> 

Sorry for my incorrect description.

I figured out that my test kernel is not the latest: it lacks [2],
commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work
function"), so vcpu_run() only checks signal_pending and not
TIF_NOTIFY_RESUME, which is set in task_work_add().

After [2], the #MC flood should not occur.
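
To make the difference concrete, here is a small user-space simulation of
the loop behaviour described above. It is only a sketch under stated
assumptions: sim_vcpu_run(), SIM_SIGPENDING and SIM_NOTIFY_RESUME are
hypothetical names invented for this illustration, not kernel symbols, and
the model ignores everything except which flag the loop checks before
re-entering the guest.

```c
#include <assert.h>

/* Hypothetical stand-ins for the thread flags; the values are arbitrary. */
enum {
	SIM_SIGPENDING    = 1u << 0,
	SIM_NOTIFY_RESUME = 1u << 1,
};

/*
 * Model of the run loop: every guest entry hits the poisoned page and
 * raises #MC, whose handler only queues task work (modelled here as
 * setting SIM_NOTIFY_RESUME). The loop leaves the guest only when a flag
 * in exit_mask is set. Returns the number of guest entries, capped at
 * max_entries so the simulation always terminates.
 */
static int sim_vcpu_run(unsigned int exit_mask, int max_entries)
{
	unsigned int flags = 0;
	int entries = 0;

	while (entries < max_entries) {
		entries++;                  /* enter guest, take #MC */
		flags |= SIM_NOTIFY_RESUME; /* task work queued, no signal yet */
		if (flags & exit_mask)
			break;              /* return to user space */
	}
	return entries;
}

/* Small self-check of the two cases discussed in the mail. */
static void sim_vcpu_run_demo(void)
{
	/* Loop that only looks at pending signals: spins until the cap. */
	assert(sim_vcpu_run(SIM_SIGPENDING, 1000) == 1000);
	/* Loop that also honours the notify-resume flag: exits at once. */
	assert(sim_vcpu_run(SIM_SIGPENDING | SIM_NOTIFY_RESUME, 1000) == 1);
}
```

Calling sim_vcpu_run_demo() from a main() exercises both cases; the cap of
1000 entries stands in for "forever" in the first one.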

Thanks also to Thomas Gleixner.

> Could you consider fixing cases like this?
> 
> And would you mind giving me some advice on my temporary workaround for
> this #MC flood?
> I want to check whether do_machine_check() is running in exception or KVM
> context, and fall back to calling kill_me_xxx directly when in KVM
> context. (I have already done a simple test and it met my expectations.)
> 

So please ignore my request.

> [1]: commit 5567d11c21a1 ("x86/mce: Send #MC singal from task work")
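
For reference, the circular task_works list from the original patch
description can be reproduced in user space. The sketch below assumes a
simplified head-insertion list without atomics; struct sim_work, sim_push(),
sim_push_once() and the queued flag are hypothetical names for this
illustration only and are not the kernel's task_work API, nor the fix that
was posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a callback_head queued on task_works. */
struct sim_work {
	struct sim_work *next;
	bool queued;            /* used only by sim_push_once() */
};

/* Unguarded head insertion: queuing the same node twice corrupts the list. */
static void sim_push(struct sim_work **head, struct sim_work *w)
{
	w->next = *head;        /* if *head == w, the node now points at itself */
	*head = w;
}

/* Guarded variant: refuse to queue a node that is already on the list. */
static bool sim_push_once(struct sim_work **head, struct sim_work *w)
{
	if (w->queued)
		return false;
	w->queued = true;
	sim_push(head, w);
	return true;
}

/* Floyd's cycle detection: true if walking the list would never end. */
static bool sim_has_cycle(const struct sim_work *head)
{
	const struct sim_work *slow = head, *fast = head;

	while (fast && fast->next) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast)
			return true;
	}
	return false;
}

/* Small self-check: a double push makes a cycle, a guarded push does not. */
static void sim_task_works_demo(void)
{
	struct sim_work a = { NULL, false }, b = { NULL, false };
	struct sim_work *bad = NULL, *good = NULL;

	sim_push(&bad, &a);
	sim_push(&bad, &a);             /* second #MC before the work ran */
	assert(sim_has_cycle(bad));

	assert(sim_push_once(&good, &b));
	assert(!sim_push_once(&good, &b));
	assert(!sim_has_cycle(good));
}
```

A walker that follows next until NULL, as a simplified task_work_run()
would, never terminates on the first list, which is the endless loop being
discussed.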


-- 
Thanks,
- Ding Hui

Thread overview: 4+ messages
2021-07-06 12:16 [PATCH v2] x86/mce: Fix endless loop when run task works after #MC Ding Hui
2021-07-06 16:44 ` Luck, Tony
2021-07-07  3:39   ` Ding Hui
2021-07-07  9:51     ` Ding Hui [this message]
