From: yulei.kernel@gmail.com
X-Google-Original-From: yuleixzhang@tencent.com
To: linux-mm@kvack.org, akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org, naoya.horiguchi@nec.com,
	viro@zeniv.linux.org.uk, pbonzini@redhat.com
Cc: joao.m.martins@oracle.com, rdunlap@infradead.org, sean.j.christopherson@intel.com,
	xiaoguangrong.eric@gmail.com, kernellwp@gmail.com, lihaiwei.kernel@gmail.com,
	Yulei Zhang, Haiwei Li
Subject: [RFC V2 31/37] dmem: introduce mce handler
Date: Mon, 7 Dec 2020 19:31:24 +0800
Message-Id: <6a5471107b81ee999f776547f2fccb045967701e.1607332046.git.yuleixzhang@tencent.com>
X-Mailer: git-send-email 2.28.0

From: Yulei Zhang

When an MCE occurs, let dmem handle it if the faulting pfn belongs to dmem:

1. Check whether the pfn is managed by dmem; if so, handle it there and
   return without entering the generic handling path.
2. Mark the pfn in a new error bitmap, with one bit per page, kept in the
   dmem region.
3. Add the mechanisms needed to ensure that a dpage containing an MCE pfn
   is never allocated again.
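The intended flow is roughly the sketch below. This is an illustration only,
not the patch code: dmem_lookup_region(), dmem_mark_pfn_error() and
dmem_mark_dpage_used() are hypothetical stand-ins for the region lookup and
bitmap updates that mm/dmem.c implements in the diff that follows.

/* Illustrative sketch only; the real implementation is in mm/dmem.c below. */
bool dmem_memory_failure_sketch(unsigned long pfn, int flags)
{
	/* hypothetical helper: map the pfn to its dmem region, if any */
	struct dmem_region *dregion = dmem_lookup_region(pfn);

	if (!dregion)
		return false;	/* not dmem memory; memory_failure() continues */

	/* step 2: remember the bad pfn in the region's error bitmap */
	dmem_mark_pfn_error(dregion, pfn);

	/*
	 * step 3: if the containing dpage was free, mark it busy in the
	 * allocation bitmap so the allocator can never hand it out again
	 */
	dmem_mark_dpage_used(dregion, pfn);

	return true;	/* step 1: handled here; generic MCE handling is skipped */
}

The call site in memory_failure() (last hunk of the diff) consults dmem first
and returns early when dmem reports the pfn as handled.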
Signed-off-by: Haiwei Li
Signed-off-by: Yulei Zhang
---
 include/linux/dmem.h        |   6 +++
 include/trace/events/dmem.h |  17 ++++++++
 mm/dmem.c                   | 103 +++++++++++++++++++++++++++++++-------------
 mm/memory-failure.c         |   6 +++
 4 files changed, 102 insertions(+), 30 deletions(-)

diff --git a/include/linux/dmem.h b/include/linux/dmem.h
index 59d3ef14..cd17a91 100644
--- a/include/linux/dmem.h
+++ b/include/linux/dmem.h
@@ -21,6 +21,8 @@ void dmem_free_pages(phys_addr_t addr, unsigned int dpages_nr);
 bool is_dmem_pfn(unsigned long pfn);
 #define dmem_free_page(addr)	dmem_free_pages(addr, 1)
+
+bool dmem_memory_failure(unsigned long pfn, int flags);
 
 #else
 static inline int dmem_reserve_init(void)
 {
@@ -32,5 +34,9 @@ static inline bool is_dmem_pfn(unsigned long pfn)
 	return 0;
 }
 
+static inline bool dmem_memory_failure(unsigned long pfn, int flags)
+{
+	return false;
+}
 #endif
 #endif /* _LINUX_DMEM_H */
diff --git a/include/trace/events/dmem.h b/include/trace/events/dmem.h
index 10d1b90..f8eeb3c 100644
--- a/include/trace/events/dmem.h
+++ b/include/trace/events/dmem.h
@@ -62,6 +62,23 @@
 	TP_printk("addr %#lx dpages_nr %d", (unsigned long)__entry->addr,
 		  __entry->dpages_nr)
 );
+
+TRACE_EVENT(dmem_memory_failure,
+	TP_PROTO(unsigned long pfn, bool used),
+	TP_ARGS(pfn, used),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, pfn)
+		__field(bool, used)
+	),
+
+	TP_fast_assign(
+		__entry->pfn = pfn;
+		__entry->used = used;
+	),
+
+	TP_printk("pfn=%#lx used=%d", __entry->pfn, __entry->used)
+);
 #endif
 
 /* This part must be outside protection */
diff --git a/mm/dmem.c b/mm/dmem.c
index 50cdff9..16438db 100644
--- a/mm/dmem.c
+++ b/mm/dmem.c
@@ -431,6 +431,41 @@ static void __init dmem_uinit(void)
 	dmem_pool.registered_pages = 0;
 }
 
+/* set or clear corresponding bit on allocation bitmap based on error bitmap */
+static unsigned long dregion_alloc_bitmap_set_clear(struct dmem_region *dregion,
+						    bool set)
+{
+	unsigned long pos_pfn, pos_offset;
+	unsigned long valid_pages, mce_dpages = 0;
+	phys_addr_t dpage, reserved_start_pfn;
+
+	reserved_start_pfn = __phys_to_pfn(dregion->reserved_start_addr);
+
+	valid_pages = dpage_to_pfn(dregion->dpage_end_pfn) - reserved_start_pfn;
+	pos_offset = dpage_to_pfn(dregion->dpage_start_pfn)
+		- reserved_start_pfn;
+try_set:
+	pos_pfn = find_next_bit(dregion->error_bitmap, valid_pages, pos_offset);
+
+	if (pos_pfn >= valid_pages)
+		return mce_dpages;
+	mce_dpages++;
+	dpage = pfn_to_dpage(pos_pfn + reserved_start_pfn);
+	if (set)
+		WARN_ON(__test_and_set_bit(dpage - dregion->dpage_start_pfn,
+					   dregion->bitmap));
+	else
+		WARN_ON(!__test_and_clear_bit(dpage - dregion->dpage_start_pfn,
+					      dregion->bitmap));
+	pos_offset = dpage_to_pfn(dpage + 1) - reserved_start_pfn;
+	goto try_set;
+}
+
+static unsigned long dmem_region_mark_mce_dpages(struct dmem_region *dregion)
+{
+	return dregion_alloc_bitmap_set_clear(dregion, true);
+}
+
 static int __init dmem_region_init(struct dmem_region *dregion)
 {
 	unsigned long *bitmap, nr_pages;
@@ -514,6 +549,8 @@ static int dmem_alloc_region_init(struct dmem_region *dregion,
 	dregion->dpage_start_pfn = start;
 	dregion->dpage_end_pfn = end;
 
+	*dpages -= dmem_region_mark_mce_dpages(dregion);
+
 	dmem_pool.unaligned_pages += __phys_to_pfn((dpage_to_phys(start)
 		- dregion->reserved_start_addr));
 	dmem_pool.unaligned_pages += __phys_to_pfn(dregion->reserved_end_addr
@@ -558,36 +595,6 @@ static bool dmem_dpage_is_error(struct dmem_region *dregion, phys_addr_t dpage)
 	return err_num;
 }
 
-/* set or clear corresponding bit on allocation bitmap based on error bitmap */
-static unsigned long dregion_alloc_bitmap_set_clear(struct dmem_region *dregion,
-						    bool set)
-{
-	unsigned long pos_pfn, pos_offset;
-	unsigned long valid_pages, mce_dpages = 0;
-	phys_addr_t dpage, reserved_start_pfn;
-
-	reserved_start_pfn = __phys_to_pfn(dregion->reserved_start_addr);
-
-	valid_pages = dpage_to_pfn(dregion->dpage_end_pfn) - reserved_start_pfn;
-	pos_offset = dpage_to_pfn(dregion->dpage_start_pfn)
-		- reserved_start_pfn;
-try_set:
-	pos_pfn = find_next_bit(dregion->error_bitmap, valid_pages, pos_offset);
-
-	if (pos_pfn >= valid_pages)
-		return mce_dpages;
-	mce_dpages++;
-	dpage = pfn_to_dpage(pos_pfn + reserved_start_pfn);
-	if (set)
-		WARN_ON(__test_and_set_bit(dpage - dregion->dpage_start_pfn,
-					   dregion->bitmap));
-	else
-		WARN_ON(!__test_and_clear_bit(dpage - dregion->dpage_start_pfn,
-					      dregion->bitmap));
-	pos_offset = dpage_to_pfn(dpage + 1) - reserved_start_pfn;
-	goto try_set;
-}
-
 static void dmem_uinit_check_alloc_bitmap(struct dmem_region *dregion)
 {
 	unsigned long dpages, size;
@@ -989,6 +996,42 @@ void dmem_free_pages(phys_addr_t addr, unsigned int dpages_nr)
 }
 EXPORT_SYMBOL(dmem_free_pages);
 
+bool dmem_memory_failure(unsigned long pfn, int flags)
+{
+	struct dmem_region *dregion;
+	struct dmem_node *pdnode = NULL;
+	u64 pos;
+	phys_addr_t addr = __pfn_to_phys(pfn);
+	bool used = false;
+
+	dregion = find_dmem_region(addr, &pdnode);
+	if (!dregion)
+		return false;
+
+	WARN_ON(!pdnode || !dregion->error_bitmap);
+
+	mutex_lock(&dmem_pool.lock);
+	pos = pfn - __phys_to_pfn(dregion->reserved_start_addr);
+	if (__test_and_set_bit(pos, dregion->error_bitmap))
+		goto out;
+
+	if (!dregion->bitmap || pfn < dpage_to_pfn(dregion->dpage_start_pfn) ||
+	    pfn >= dpage_to_pfn(dregion->dpage_end_pfn))
+		goto out;
+
+	pos = phys_to_dpage(addr) - dregion->dpage_start_pfn;
+	if (__test_and_set_bit(pos, dregion->bitmap)) {
+		used = true;
+	} else {
+		pr_info("MCE: free dpage, mark %#lx disabled in dmem\n", pfn);
+		dnode_count_free_dpages(pdnode, -1);
+	}
+out:
+	trace_dmem_memory_failure(pfn, used);
+	mutex_unlock(&dmem_pool.lock);
+	return true;
+}
+
 bool is_dmem_pfn(unsigned long pfn)
 {
 	struct dmem_node *dnode;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5d880d4..dda45d2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -35,6 +35,7 @@
  */
 #include
 #include
+#include <linux/dmem.h>
 #include
 #include
 #include
@@ -1323,6 +1324,11 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!sysctl_memory_failure_recovery)
 		panic("Memory failure on page %lx", pfn);
 
+	if (dmem_memory_failure(pfn, flags)) {
+		pr_info("MCE %#lx: handled by dmem\n", pfn);
+		return 0;
+	}
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
 		if (pfn_valid(pfn)) {
-- 
1.8.3.1