From: Ding Hui <dinghui@sangfor.com.cn>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: bp@alien8.de, bp@suse.de, naoya.horiguchi@nec.com,
	osalvador@suse.de, peterz@infradead.org,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, mingo@redhat.com, x86@kernel.org,
	hpa@zytor.com, youquan.song@intel.com, huangcun@sangfor.com.cn,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] x86/mce: Fix endless loop when run task works after #MC
Date: Wed, 7 Jul 2021 17:51:20 +0800
Message-ID: <fffec03b-2601-a0c0-5954-ee05fe046ba1@sangfor.com.cn>
In-Reply-To: <6a1b1371-50e4-f0f6-1ebd-0a91fc9d7bcc@sangfor.com.cn>

On 2021/7/7 11:39, Ding Hui wrote:
> On 2021/7/7 0:44, Luck, Tony wrote:
>> On Tue, Jul 06, 2021 at 08:16:06PM +0800, Ding Hui wrote:
>>> Recently we encountered multiple #MCs on the same task before its
>>> task_work_run() had been called. current->mce_kill_me was added to
>>> the task_works list more than once, which makes task_works a circular
>>> linked list, so task_work_run() loops endlessly.
>>
>> I saw the same and posted a similar fix a while back:
>>
>> https://www.spinics.net/lists/linux-mm/msg251006.html
>>
>> It didn't get merged because some validation tests began failing
>> around the same time.  I'm now pretty sure I understand what happened
>> with those other tests.
>>
>> I'll post my updated version (second patch in a three part series)
>> later today.
>>
> 
> Thanks for your fixes.
> 
> After digging into my original problem, I may have found out why I hit
> the #MC flood.
> 
> My test case:
> 1. Run a qemu-kvm guest VM whose OS is memtest86+.iso.
> 2. Inject an SRAR UE into the VM's memory and wait for the #MC.
> When the VM triggers the #MC, I expect qemu to receive the SIGBUS signal
> ASAP, and with the modified qemu, I will kill the VM.
> 
> In this case, do_machine_check() may be called by kvm_machine_check() in
> vmx.c.
> 
> Before [1], memory_failure() was called in do_machine_check(), so
> TIF_SIGPENDING was set because of the SIGBUS signal; vcpu_run() checked
> for the pending signal and returned to qemu to handle the SIGBUS.
> 
> After [1], do_machine_check() only adds a task work and does not send
> SIGBUS directly. vcpu_run() does not break out of its for-loop because
> vcpu_enter_guest() returns 1 and TIF_SIGPENDING is not set, so the task
> works are never executed until something else happens. KVM therefore
> re-enters the guest repeatedly and the #MC is triggered repeatedly.
> 

Sorry for my incorrect description.

I figured out that my test kernel is not the latest: it lacks [2],
commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work
function"), so vcpu_run() only checks signal_pending and not
TIF_NOTIFY_RESUME, which is set in task_work_add().

After [2], the #MC flood should no longer occur.
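
To illustrate the difference for the archive, here is a minimal userspace
sketch, not kernel code. The SIM_* flags and all three helper functions
are hypothetical stand-ins that I made up for this example; they only
model which flags the run loop looks at before re-entering the guest,
assuming the flags stay set until the loop returns to userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the thread flags; not the kernel's values. */
enum {
	SIM_SIGPENDING    = 1 << 0,	/* a signal such as SIGBUS is pending */
	SIM_NOTIFY_RESUME = 1 << 1,	/* task work has been queued */
};

/* Model of the loop without [2]: only a pending signal ends the loop. */
static bool old_loop_exits(unsigned int flags)
{
	return (flags & SIM_SIGPENDING) != 0;
}

/* Model of the loop with [2]: queued task work ends the loop as well. */
static bool new_loop_exits(unsigned int flags)
{
	return (flags & (SIM_SIGPENDING | SIM_NOTIFY_RESUME)) != 0;
}

/*
 * Count simulated guest entries until the exit check fires, capped at
 * 'limit' so that the "flood" case terminates in this model.
 */
static int guest_entries(bool (*exits)(unsigned int), unsigned int flags,
			 int limit)
{
	int n = 0;

	assert(limit > 0);
	while (n < limit) {
		n++;		/* enter guest, take the #MC, come back */
		if (exits(flags))
			break;	/* return to userspace (qemu) */
	}
	return n;
}
```

With only SIM_NOTIFY_RESUME set, guest_entries(old_loop_exits, ...) runs
all the way to the cap, which corresponds to the flood I saw, while
guest_entries(new_loop_exits, ...) stops after the first entry.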

Thanks also to Thomas Gleixner.

> Could you consider fixing cases like this?
> 
> And would you mind giving me some advice on my temporary workaround for
> this #MC flood:
> I want to check whether the context of do_machine_check() is an exception
> or kvm, and fall back to calling kill_me_xxx directly when in kvm context.
> (I have already tested this briefly and it met my expectation.)
> 

So please ignore my request.

> [1]: commit 5567d11c21a1 ("x86/mce: Send #MC singal from task work")


-- 
Thanks,
- Ding Hui


Thread overview: 4+ messages
2021-07-06 12:16 [PATCH v2] x86/mce: Fix endless loop when run task works after #MC Ding Hui
2021-07-06 16:44 ` Luck, Tony
2021-07-07  3:39   ` Ding Hui
2021-07-07  9:51     ` Ding Hui [this message]
