From: David Laight <David.Laight@ACULAB.COM>
To: 'Linus Torvalds' <torvalds@linux-foundation.org>, Dan Williams <dan.j.williams@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>, Andy Lutomirski <luto@kernel.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>, Borislav Petkov <bp@alien8.de>, stable <stable@vger.kernel.org>, the arch/x86 maintainers <x86@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>, Paul Mackerras <paulus@samba.org>, "Benjamin Herrenschmidt" <benh@kernel.crashing.org>, Erwin Tsaur <erwin.tsaur@intel.com>, Michael Ellerman <mpe@ellerman.id.au>, "Arnaldo Carvalho de Melo" <acme@kernel.org>, linux-nvdimm <linux-nvdimm@lists.01.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v2 0/2] Replace and improve "mcsafe" with copy_safe()
Date: Sun, 3 May 2020 12:57:30 +0000
Message-ID: <a4aabe6f2ca649779a772a5f0365af6f@AcuMS.aculab.com>
In-Reply-To: <CAHk-=wiPkwF2+y6wZd=VD9BooKxHRWhSVW8dr+WSeeSPkJk7kQ@mail.gmail.com>

From: Linus Torvalds
> Sent: 01 May 2020 19:29
...
> And as DavidL pointed out - if you ever have "iomem" as a source or
> destination, you need yet another case. Not because they can take
> another kind of fault (although on some platforms you have the machine
> checks for that too), but because they have *very* different
> performance profiles (and the ERMS "rep movsb" sucks baby donkeys
> through a straw).

I was actually thinking that the nvdimm accesses need to be treated much more like (cached) memory-mapped I/O space than like normal system memory. So treating them the same as "iomem", and then having access functions that report access failures (which the current readq() doesn't), might make sense.

If you are using memory that 'might fail' for kernel code or data, you really get what you deserve.

OTOH, the system response to PCIe errors is currently rather problematic. Mostly, reads time out and return ~0u (all ones). That can be checked for and, if all ones could also be valid data, a second location read to tell a real value from a failed access.

However, we have an x86 server box (I've forgotten whether it is HP or Dell) that generates an NMI whenever a PCIe link goes down. (The 'platform' takes the AER interrupt and uses an NMI to pass it to the kernel - whose bright idea was it to use an NMI???) This happens even after we've done an 'echo 1 >remove'. The system is supposed to be NEBS compliant (I think that is the term), which is supposed to mean it is suitable for telephony work (including emergency calls), but any PCIe failure crashes the box!

I've another system here that sometimes fails to bring the PCIe link back up. I guess these code paths don't get regular testing. In my case the PCIe slave is an FPGA; reloading the FPGA image (either over JTAG or after rewriting the EEPROM) doesn't always work.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
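The PCIe check mentioned in the message (a timed-out read returns all ones, which might also be real data, so a second location is read to confirm) can be sketched in plain C. This is a minimal userspace model under stated assumptions, not kernel code: `struct sim_bar`, `sim_readq()` and `checked_readq()` are hypothetical names, an ordinary array stands in for an `__iomem` mapping read with readq(), and it assumes the device has some register (for example an ID register) that never legitimately reads as all ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a PCIe BAR. In the kernel this would be an
 * __iomem mapping accessed with readq(); here it is a plain array so
 * the sketch runs in userspace. 'link_up' simulates the link dropping,
 * after which every read completes with all ones instead of faulting.
 */
struct sim_bar {
	const uint64_t *regs;	/* register contents while the link is up */
	size_t nregs;
	bool link_up;
};

/* Hypothetical stand-in for readq(): a failed read just returns ~0. */
static uint64_t sim_readq(const struct sim_bar *bar, size_t idx)
{
	assert(idx < bar->nregs);
	return bar->link_up ? bar->regs[idx] : ~(uint64_t)0;
}

/*
 * Hypothetical checked read: returns 0 and stores the value on success,
 * or -1 if the device appears to have gone away. An all-ones value is
 * only suspicious, since it may be legitimate data, so it is confirmed
 * by reading 'probe_idx', a register assumed never to read as all ones.
 */
static int checked_readq(const struct sim_bar *bar, size_t idx,
			 size_t probe_idx, uint64_t *val)
{
	uint64_t v = sim_readq(bar, idx);

	if (v == ~(uint64_t)0 && sim_readq(bar, probe_idx) == ~(uint64_t)0)
		return -1;	/* both reads all ones: treat as access failure */

	*val = v;
	return 0;
}
```

The point of the design is the one made in the message: the accessor reports a failure to the caller rather than silently handing back ~0, which readq() does not do. It does nothing for a platform that escalates the link failure to an NMI, as on the server described above.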