From: Jane Chu <jane.chu@oracle.com>
To: david@fromorbit.com, djwong@kernel.org, dan.j.williams@intel.com,
	hch@infradead.org, vishal.l.verma@intel.com, dave.jiang@intel.com,
	agk@redhat.com, snitzer@redhat.com, dm-devel@redhat.com,
	ira.weiny@intel.com, willy@infradead.org, vgoyal@redhat.com,
	linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	x86@kernel.org
Subject: [PATCH v7 3/6] mce: fix set_mce_nospec to always unmap the whole page
Date: Tue, 5 Apr 2022 13:47:44 -0600
Message-ID: <20220405194747.2386619-4-jane.chu@oracle.com>
In-Reply-To: <20220405194747.2386619-1-jane.chu@oracle.com>

The set_memory_uc() approach doesn't work well in all cases.
For example, when "The VMM unmapped the bad page from guest
physical space and passed the machine check to the guest."
"The guest gets virtual #MC on an access to that page.
 When the guest tries to do set_memory_uc() and instructs
 cpa_flush() to do clean caches that results in taking another
 fault / exception perhaps because the VMM unmapped the page
 from the guest."

Since the driver has special knowledge to handle NP or UC,
let's mark the poisoned page NP and let the driver handle it
when it comes down to repair.

Please refer to the discussion here for more details.
https://lore.kernel.org/all/CAPcyv4hrXPb1tASBZUg-GgdVs0OOFKXMXLiHmktg_kFi7YBMyQ@mail.gmail.com/

Now that the poisoned page is marked not-present, also fix
pmem_do_write() to avoid writing to an 'np' page and triggering
a kernel Oops.

Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 arch/x86/kernel/cpu/mce/core.c |  6 +++---
 arch/x86/mm/pat/set_memory.c   | 18 ++++++------------
 drivers/nvdimm/pmem.c          | 31 +++++++------------------------
 include/linux/set_memory.h     |  4 ++--
 4 files changed, 18 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 981496e6bc0e..fa67bb9d1afe 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -579,7 +579,7 @@ static int uc_decode_notifier(struct notifier_block *nb, unsigned long val,
 
 	pfn = mce->addr >> PAGE_SHIFT;
 	if (!memory_failure(pfn, 0)) {
-		set_mce_nospec(pfn, whole_page(mce));
+		set_mce_nospec(pfn);
 		mce->kflags |= MCE_HANDLED_UC;
 	}
 
@@ -1316,7 +1316,7 @@ static void kill_me_maybe(struct callback_head *cb)
 
 	ret = memory_failure(p->mce_addr >> PAGE_SHIFT, flags);
 	if (!ret) {
-		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
+		set_mce_nospec(p->mce_addr >> PAGE_SHIFT);
 		sync_core();
 		return;
 	}
@@ -1342,7 +1342,7 @@ static void kill_me_never(struct callback_head *cb)
 	p->mce_count = 0;
 	pr_err("Kernel accessed poison in user space at %llx\n", p->mce_addr);
 	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, 0))
-		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
+		set_mce_nospec(p->mce_addr >> PAGE_SHIFT);
 }
 
 static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callback_head *))
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 93dde949f224..404ffcb3f2cb 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1926,13 +1926,8 @@ int set_memory_wb(unsigned long addr, int numpages)
 EXPORT_SYMBOL(set_memory_wb);
 
 #ifdef CONFIG_X86_64
-/*
- * Prevent speculative access to the page by either unmapping
- * it (if we do not require access to any part of the page) or
- * marking it uncacheable (if we want to try to retrieve data
- * from non-poisoned lines in the page).
- */
-int set_mce_nospec(unsigned long pfn, bool unmap)
+/* Prevent speculative access to a page by marking it not-present */
+int set_mce_nospec(unsigned long pfn)
 {
 	unsigned long decoy_addr;
 	int rc;
@@ -1954,10 +1949,7 @@ int set_mce_nospec(unsigned long pfn, bool unmap)
 	 */
 	decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
 
-	if (unmap)
-		rc = set_memory_np(decoy_addr, 1);
-	else
-		rc = set_memory_uc(decoy_addr, 1);
+	rc = set_memory_np(decoy_addr, 1);
 	if (rc)
 		pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
 	return rc;
@@ -1966,7 +1958,9 @@ int set_mce_nospec(unsigned long pfn, bool unmap)
 /* Restore full speculative operation to the pfn. */
 int clear_mce_nospec(unsigned long pfn)
 {
-	return set_memory_wb((unsigned long) pfn_to_kaddr(pfn), 1);
+	unsigned long addr = (unsigned long) pfn_to_kaddr(pfn);
+
+	return change_page_attr_set(&addr, 1, __pgprot(_PAGE_PRESENT), 0);
 }
 EXPORT_SYMBOL_GPL(clear_mce_nospec);
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 58d95242a836..30c71a68175b 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -158,36 +158,19 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
 			struct page *page, unsigned int page_off,
 			sector_t sector, unsigned int len)
 {
-	blk_status_t rc = BLK_STS_OK;
-	bool bad_pmem = false;
 	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
 	void *pmem_addr = pmem->virt_addr + pmem_off;
 
-	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
-		bad_pmem = true;
+	if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) {
+		blk_status_t rc = pmem_clear_poison(pmem, pmem_off, len);
 
-	/*
-	 * Note that we write the data both before and after
-	 * clearing poison. The write before clear poison
-	 * handles situations where the latest written data is
-	 * preserved and the clear poison operation simply marks
-	 * the address range as valid without changing the data.
-	 * In this case application software can assume that an
-	 * interrupted write will either return the new good
-	 * data or an error.
-	 *
-	 * However, if pmem_clear_poison() leaves the data in an
-	 * indeterminate state we need to perform the write
-	 * after clear poison.
-	 */
+		if (rc != BLK_STS_OK)
+			pr_warn_ratelimited("%s: failed to clear poison\n", __func__);
+		return rc;
+	}
 	flush_dcache_page(page);
 	write_pmem(pmem_addr, page, page_off, len);
-	if (unlikely(bad_pmem)) {
-		rc = pmem_clear_poison(pmem, pmem_off, len);
-		write_pmem(pmem_addr, page, page_off, len);
-	}
-
-	return rc;
+	return BLK_STS_OK;
 }
 
 static void pmem_submit_bio(struct bio *bio)
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index d6263d7afb55..cde2d8687a7b 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -43,10 +43,10 @@ static inline bool can_set_direct_map(void)
 #endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
 
 #ifdef CONFIG_X86_64
-int set_mce_nospec(unsigned long pfn, bool unmap);
+int set_mce_nospec(unsigned long pfn);
 int clear_mce_nospec(unsigned long pfn);
 #else
-static inline int set_mce_nospec(unsigned long pfn, bool unmap)
+static inline int set_mce_nospec(unsigned long pfn)
 {
 	return 0;
 }
-- 
2.18.4
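
For readers following the set_memory.c hunk, the decoy_addr arithmetic
in set_mce_nospec() can be tried out in userspace. This is only a rough
sketch, not kernel code: the PAGE_SHIFT and PAGE_OFFSET values below are
assumptions chosen for illustration (the real values depend on kernel
configuration), and decoy_addr_for(), direct_map_addr_for() and
check_decoy_invariant() are hypothetical helper names that do not exist
in the kernel. It shows that the expression yields an address differing
from the 1:1 mapping address only in bit 63.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values, for illustration only. The kernel's real constants
 * depend on configuration and are not taken from this patch.
 */
#define PAGE_SHIFT	12
#define PAGE_OFFSET	UINT64_C(0xffff888000000000)
#define BIT(n)		(UINT64_C(1) << (n))

/* Hypothetical helper: address of the pfn in the assumed 1:1 map. */
static uint64_t direct_map_addr_for(uint64_t pfn)
{
	return (pfn << PAGE_SHIFT) + PAGE_OFFSET;
}

/* Hypothetical helper: same expression as decoy_addr in set_mce_nospec(). */
static uint64_t decoy_addr_for(uint64_t pfn)
{
	return (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
}

/*
 * Hypothetical self-check: with the assumed constants, the decoy and
 * the 1:1 map address differ in bit 63 and nowhere else.
 */
static void check_decoy_invariant(uint64_t pfn)
{
	assert((decoy_addr_for(pfn) ^ direct_map_addr_for(pfn)) == BIT(63));
}
```

With these assumed constants, calling check_decoy_invariant(0x12345)
passes, and decoy_addr_for(0x12345) evaluates to 0x7fff888012345000.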