From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Tony Luck <tony.luck@intel.com>, Aili Yao <yaoaili@kingsoft.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Andy Lutomirski <luto@kernel.org>, Jue Wang <juew@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address
Date: Fri, 23 Apr 2021 02:18:34 +0000
Message-ID: <20210423021833.GB68967@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20210422170213.GE7021@zn.tnic>

On Thu, Apr 22, 2021 at 07:02:13PM +0200, Borislav Petkov wrote:
> On Wed, Apr 21, 2021 at 09:57:28AM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> > 
> > The previous patch solves the infinite MCE loop issue when multiple
> 
> "previous patch" has no meaning when it is in git.
> 
> > MCE events races.  The remaining issue is to make sure that all threads
> 
> 	    "race."
> 
> > processing Action Required MCEs send to the current processes the
> 
> s/the //

I'll fix these grammar errors.

> 
> > SIGBUS with the proper virtual address and the error size.
> > 
> > This patch suggests to do page table walk to find the error virtual
> 
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.
> 
> Also, do
> 
> $ git grep 'This patch' Documentation/process
> 
> for more details.

I didn't know the following rule:

    Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
    instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
    to do frotz", as if you are giving orders to the codebase to change
    its behaviour.

I'll follow this in future posts.

> 
> > address.  If we find multiple virtual addresses in walking, we now can't
> 
> Who's "we"?				during the pagetable walk

I misused the rhetorical "we". I'll rewrite this sentence in the passive voice.

> 
> > determine which one is correct, so we fall back to sending SIGBUS in
> > kill_me_maybe() without error info as we do now.  This corner case needs
> > to be solved in the future.
> 
> Solved how?

I don't know exactly.  The MCE subsystem seems to have code that extracts
the linear address, so I wonder whether that could be used as a hint to
memory_failure() to find the proper virtual address.

> If you can't map which error comes from which process, you
> can't do anything here. You could send SIGBUS to all but you might
> injure some innocent bystanders this way.

The situation in question is caused by an Action Required MCE, so
we know which process we should send SIGBUS to. So if we choose
to send SIGBUS for every matching address, no innocent bystanders
would be affected. But when the process has multiple virtual addresses
associated with the error physical address, the process receives multiple
SIGBUS signals and all but one have the wrong value in si_addr in
siginfo_t, so that's confusing.
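
For illustration, here is a minimal userspace sketch of how one process can
end up with two virtual addresses backed by the same physical page, which is
the case where a single si_addr becomes ambiguous. It maps one page of a
temporary file twice with MAP_SHARED. The function name
two_mappings_share_one_page() is hypothetical and nothing here is taken from
the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the first page of a temporary file twice and check that a write
 * through one mapping is visible through the other.
 * Returns 1 if two distinct virtual addresses alias the same data,
 * 0 if they do not, and -1 if the setup failed.
 */
static int two_mappings_share_one_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	char *a, *b;
	int shared = -1;

	if (!f)
		return -1;
	if (page <= 0 || ftruncate(fileno(f), page) != 0)
		goto out_close;

	a = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
	if (a == MAP_FAILED)
		goto out_close;
	b = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
	if (b == MAP_FAILED)
		goto out_unmap_a;

	/* Both mappings refer to the same page-cache page. */
	a[0] = 'x';
	shared = (a != b) && (b[0] == 'x');

	munmap(b, page);
out_unmap_a:
	munmap(a, page);
out_close:
	fclose(f);
	return shared;
}
```

If that page took an uncorrected error, a walk of the page tables would find
both a and b, and only the address the faulting instruction actually used
would be the right one to report.
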

> 
> Just code structuring suggestions below - mm stuff is for someone else
> to review properly.

Thank you, I'll update the patch accordingly.

- Naoya Horiguchi

> 
> > +static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
> > +			      unsigned long end, struct mm_walk *walk)
> > +{
> > +	struct hwp_walk *hwp = (struct hwp_walk *)walk->private;
> > +	int ret = 0;
> > +	pte_t *ptep;
> > +	spinlock_t *ptl;
> > +
> > +	ptl = pmd_trans_huge_lock(pmdp, walk->vma);
> > +	if (ptl) {
> 
> Save yourself an indentation level:
> 
> 	if (!ptl)
> 		goto unlock;
> 
> > +		pmd_t pmd = *pmdp;
> > +
> > +		if (pmd_present(pmd)) {
> 
> ... ditto...
> 
> > +			unsigned long pfn = pmd_pfn(pmd);
> > +
> > +			if (pfn <= hwp->pfn && hwp->pfn < pfn + HPAGE_PMD_NR) {
> > +				unsigned long hwpoison_vaddr = addr +
> > +					((hwp->pfn - pfn) << PAGE_SHIFT);
> 
> ... which will allow you to not break those.
> 
> > +
> > +				ret = set_to_kill(&hwp->tk, hwpoison_vaddr,
> > +						  PAGE_SHIFT);
> > +			}
> > +		}
> > +		spin_unlock(ptl);
> > +		goto out;
> > +	}
> > +
> > +	if (pmd_trans_unstable(pmdp))
> > +		goto out;
> > +
> > +	ptep = pte_offset_map_lock(walk->vma->vm_mm, pmdp, addr, &ptl);
> > +	for (; addr != end; ptep++, addr += PAGE_SIZE) {
> > +		ret = check_hwpoisoned_entry(*ptep, addr, PAGE_SHIFT,
> > +					     hwp->pfn, &hwp->tk);
> > +		if (ret == 1)
> > +			break;
> > +	}
> > +	pte_unmap_unlock(ptep - 1, ptl);
> > +out:
> > +	cond_resched();
> > +	return ret;
> > +}
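
As a rough userspace model of the huge-page branch above, the sketch below
uses early returns to keep the address computation on one line, in the
spirit of the restructuring suggestion. It is not kernel code: struct
model_pmd and model_hwpoison_hugepage() are hypothetical names, locking is
omitted, and the PAGE_SHIFT and HPAGE_PMD_NR values assume 4 KiB base pages
with 2 MiB huge pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: 4 KiB base pages, 2 MiB PMD-mapped huge pages. */
#define PAGE_SHIFT	12
#define HPAGE_PMD_NR	512UL

/* Hypothetical stand-in for a huge PMD entry. */
struct model_pmd {
	bool present;
	unsigned long pfn;	/* first PFN covered by the huge page */
};

/*
 * If poisoned_pfn lies inside the huge page mapped at addr, store the
 * matching virtual address in *vaddr and return 1. Otherwise return 0
 * and leave *vaddr untouched.
 */
static int model_hwpoison_hugepage(const struct model_pmd *pmd,
				   unsigned long addr,
				   unsigned long poisoned_pfn,
				   unsigned long *vaddr)
{
	unsigned long pfn;

	if (!pmd->present)
		return 0;

	pfn = pmd->pfn;
	if (poisoned_pfn < pfn || poisoned_pfn >= pfn + HPAGE_PMD_NR)
		return 0;

	/* Offset of the poisoned base page within the huge page. */
	*vaddr = addr + ((poisoned_pfn - pfn) << PAGE_SHIFT);
	return 1;
}
```

For example, with a huge page starting at PFN 0x1000 and mapped at 0x200000,
a poisoned PFN of 0x1003 yields the virtual address 0x203000, while PFN
0x1200 is just past the end of the huge page and is not matched.
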
> 
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 
