From: Matthew Wilcox <firstname.lastname@example.org>
To: "Luck, Tony" <email@example.com>
Cc: Borislav Petkov <firstname.lastname@example.org>, Naoya Horiguchi <email@example.com>, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, "Darrick J. Wong" <email@example.com>, Jane Chu <firstname.lastname@example.org>
Subject: Re: [RFC] Make the memory failure blast radius more precise
Date: Tue, 23 Jun 2020 23:40:27 +0100
Message-ID: <20200623224027.GI21350@casper.infradead.org>
In-Reply-To: <20200623222658.GA21817@agluck-desk2.amr.corp.intel.com>

On Tue, Jun 23, 2020 at 03:26:58PM -0700, Luck, Tony wrote:
> On Tue, Jun 23, 2020 at 11:17:41PM +0100, Matthew Wilcox wrote:
> > It might also be nice to have an madvise() MADV_ZERO option so the
> > application doesn't have to look up the fd associated with that memory
> > range, but we haven't floated that idea with the customer yet; I just
> > thought of it now.
>
> So the conversation between OS and kernel goes like this?
>
> 1) machine check
> 2) Kernel unmaps the 4K page surrounding the poison and sends
>    SIGBUS to the application to say that one cache line is gone
> 3) App says madvise(MADV_ZERO, that cache line)
> 4) Kernel says ... "oh, you know how to deal with this" and allocates
>    a new page, copying the 63 good cache lines from the old page and
>    zeroing the missing one. New page is mapped to user.

That could be one way of implementing it. My understanding is that
pmem devices will reallocate bad cachelines on writes, so a better
implementation would be:

1) Kernel receives machine check
2) Kernel sends SIGBUS to the application
3) App sends madvise(MADV_ZERO, addr, 1 << granularity)
4) Kernel does special writes to ensure the cacheline is zeroed
5) App does whatever it needs to recover (reconstructs the data or marks
   it as gone)

> Do you have folks lined up to use that? I don't know that many
> folks are even catching the SIGBUS :-(

Had a 75 minute meeting with some people who want to use pmem this
afternoon ...
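
For illustration only, the application side of steps 2, 3 and 5 above might look roughly like the sketch below. It assumes Linux with glibc, where a memory-failure SIGBUS carries si_code BUS_MCEERR_AR or BUS_MCEERR_AO and reports the granularity in si_addr_lsb. MADV_ZERO is only an idea floated in this thread, so the value defined here is a made-up placeholder, and poison_range_from(), recover_range() and install_sigbus_handler() are hypothetical names, not existing interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* HYPOTHETICAL: MADV_ZERO is only proposed in this thread. The value
 * below is a placeholder; a current kernel should reject it. */
#ifndef MADV_ZERO
#define MADV_ZERO 0x7a65
#endif

struct poison_range {
    uintptr_t start;   /* first poisoned byte, aligned to len */
    size_t len;        /* 1 << granularity */
};

/* Round the faulting address down to the reported granularity. */
struct poison_range poison_range_from(uintptr_t addr, unsigned lsb)
{
    struct poison_range r;

    assert(lsb < 8 * sizeof(uintptr_t));
    r.len = (size_t)1 << lsb;
    r.start = addr & ~((uintptr_t)r.len - 1);
    return r;
}

/* HYPOTHETICAL step 5: application-specific recovery (reconstruct the
 * data or mark it as gone). Left empty in this sketch. */
void recover_range(struct poison_range r)
{
    (void)r;
}

/* Step 2 arrives here: the kernel has sent SIGBUS for poisoned memory. */
void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
    struct poison_range r;

    (void)sig;
    (void)ctx;
    if (si->si_code != BUS_MCEERR_AR && si->si_code != BUS_MCEERR_AO)
        abort();       /* not a memory-failure SIGBUS */

    r = poison_range_from((uintptr_t)si->si_addr, si->si_addr_lsb);

    /* Step 3: ask the kernel to zero the range. The real argument
     * order is madvise(addr, length, advice). */
    if (madvise((void *)r.start, r.len, MADV_ZERO) != 0)
        abort();

    recover_range(r);
}

/* Register the handler with SA_SIGINFO so si_addr/si_addr_lsb are set. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

With a granularity of 6 the range is one 64-byte cache line; with 12 it is the whole 4K page, i.e. the 64 cache lines discussed in the quoted text. An application would call install_sigbus_handler() once at startup.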
Linux-EDAC Archive on lore.kernel.org