* [RESEND 0/3] Add support for memcpy_mcsafe
@ 2018-04-04 23:19 ` Balbir Singh
  0 siblings, 0 replies; 36+ messages in thread
From: Balbir Singh @ 2018-04-04 23:19 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, npiggin, linux-nvdimm

memcpy_mcsafe() is an API currently used by the pmem subsystem to
convert errors encountered while doing a memcpy (machine check
exceptions) into a return value.

This patchset consists of three patches:

1. The first patch is a bug fix to handle machine check errors
   correctly while walking the page tables in kernel mode, due to
   huge pmd/pud sizes.
2. The second patch adds memcpy_mcsafe() support; this is largely
   derived from existing code.
3. The third patch registers for callbacks on machine check
   exceptions and in them uses specialized knowledge of the type of
   page to decide whether to handle the MCE as is or to return to a
   fixup address present in memcpy_mcsafe(). If a fixup address is
   used, then we return an error value of -EFAULT to the caller.

Testing

A large part of the testing was done under a simulator by selectively
inserting machine check exceptions in a test driver doing
memcpy_mcsafe via ioctls.

Balbir Singh (3):
  powerpc/mce: Bug fixes for MCE handling in kernel space
  powerpc/memcpy: Add memcpy_mcsafe for pmem
  powerpc/mce: Handle memcpy_mcsafe

 arch/powerpc/include/asm/mce.h      |   3 +-
 arch/powerpc/include/asm/string.h   |   2 +
 arch/powerpc/kernel/mce.c           |  76 +++++++++++-
 arch/powerpc/kernel/mce_power.c     |  17 +--
 arch/powerpc/lib/Makefile           |   2 +-
 arch/powerpc/lib/memcpy_mcsafe_64.S | 225 ++++++++++++++++++++++++++++++++++++
 6 files changed, 314 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S

-- 
2.13.6

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 36+ messages in thread
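The error-reporting contract the cover letter describes — a memcpy that turns a machine check exception into a return value of 0 or -EFAULT — can be sketched in plain C. This is only a simulation of the contract for illustration: the `fault_at` parameter is invented to stand in for a poisoned byte, and none of this is the real assembler routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the memcpy_mcsafe() contract: 0 on success, -EFAULT
 * if a machine check fires mid-copy.  'fault_at' simulates the byte
 * offset at which an MCE would hit; pass (size_t)-1 for no fault. */
static int memcpy_mcsafe_sim(void *dst, const void *src, size_t n,
                             size_t fault_at)
{
	if (fault_at < n) {
		/* Partial copy up to the "poisoned" byte, then bail. */
		memcpy(dst, src, fault_at);
		return -EFAULT;
	}
	memcpy(dst, src, n);
	return 0;
}
```

A caller such as the pmem layer would check the return value and report a media error instead of taking the machine check, which is the behavior being tested under the simulator via ioctls.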
* [RESEND 1/3] powerpc/mce: Bug fixes for MCE handling in kernel space
  2018-04-04 23:19 ` Balbir Singh
@ 2018-04-04 23:19 ` Balbir Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Balbir Singh @ 2018-04-04 23:19 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, npiggin, linux-nvdimm

The code currently assumes PAGE_SHIFT as the shift value of
the pfn, this works correctly (mostly) for user space pages,
but the correct thing to do is

1. Extrace the shift value returned via the pte-walk API's
2. Use the shift value to access the instruction address.

Note, the final physical address still use PAGE_SHIFT for
computation. handle_ierror() is not modified and handle_derror()
is modified just for extracting the correct instruction
address.

Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/kernel/mce_power.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index fe6fc63251fe..69c8cc1e8e4f 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -36,7 +36,8 @@
  * Convert an address related to an mm to a PFN. NOTE: we are in real
  * mode, we could potentially race with page table updates.
  */
-static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
+static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr,
+				unsigned int *shift)
 {
 	pte_t *ptep;
 	unsigned long flags;
@@ -49,9 +50,9 @@ static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
 
 	local_irq_save(flags);
 	if (mm == current->mm)
-		ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL);
+		ptep = find_current_mm_pte(mm->pgd, addr, NULL, shift);
 	else
-		ptep = find_init_mm_pte(addr, NULL);
+		ptep = find_init_mm_pte(addr, shift);
 	local_irq_restore(flags);
 	if (!ptep || pte_special(*ptep))
 		return ULONG_MAX;
@@ -353,13 +354,14 @@ static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr,
 	unsigned long pfn, instr_addr;
 	struct instruction_op op;
 	struct pt_regs tmp = *regs;
+	unsigned int shift;
 
-	pfn = addr_to_pfn(regs, regs->nip);
+	pfn = addr_to_pfn(regs, regs->nip, &shift);
 	if (pfn != ULONG_MAX) {
-		instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK);
+		instr_addr = (pfn << shift) + (regs->nip & ((1 << shift) - 1));
 		instr = *(unsigned int *)(instr_addr);
 		if (!analyse_instr(&op, &tmp, instr)) {
-			pfn = addr_to_pfn(regs, op.ea);
+			pfn = addr_to_pfn(regs, op.ea, &shift);
 			*addr = op.ea;
 			*phys_addr = (pfn << PAGE_SHIFT);
 			return 0;
@@ -437,7 +439,8 @@ static int mce_handle_ierror(struct pt_regs *regs,
 		unsigned long pfn;
 
 		if (get_paca()->in_mce < MAX_MCE_DEPTH) {
-			pfn = addr_to_pfn(regs, regs->nip);
+			pfn = addr_to_pfn(regs, regs->nip,
+					NULL);
 			if (pfn != ULONG_MAX) {
 				*phys_addr =
 					(pfn << PAGE_SHIFT);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread
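The heart of the fix above is the in-page offset computation: with a huge pmd/pud mapping the offset within the mapping is `addr & ((1 << shift) - 1)`, not `addr & ~PAGE_MASK`. A small arithmetic sketch of just that masking (the addresses below are made up, and we use `1ULL` so the mask is safe for any shift up to 63):

```c
#include <assert.h>
#include <stdint.h>

/* Offset of 'addr' within a mapping of size (1 << shift) bytes.
 * With shift == PAGE_SHIFT this matches addr & ~PAGE_MASK; with a
 * huge mapping (e.g. shift 24 for 16M) the mask is much wider. */
static uint64_t mapping_offset(uint64_t addr, unsigned int shift)
{
	return addr & ((1ULL << shift) - 1);
}
```

For a kernel text address on a 64K-page config (shift 16) the offset keeps 16 low bits; assuming PAGE_SHIFT alone (shift 12 here, for contrast) would silently drop part of the offset on larger mappings.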
* Re: [RESEND 1/3] powerpc/mce: Bug fixes for MCE handling in kernel space
  2018-04-04 23:19 ` Balbir Singh
@ 2018-04-04 23:49 ` Nicholas Piggin
  -1 siblings, 0 replies; 36+ messages in thread
From: Nicholas Piggin @ 2018-04-04 23:49 UTC (permalink / raw)
  To: Balbir Singh; +Cc: mpe, linuxppc-dev, linux-nvdimm

On Thu, 5 Apr 2018 09:19:41 +1000
Balbir Singh <bsingharora@gmail.com> wrote:

> The code currently assumes PAGE_SHIFT as the shift value of
> the pfn, this works correctly (mostly) for user space pages,
> but the correct thing to do is

It would be good to actually explain the problem in the
changelog. I would have thought pte_pfn returns a
PAGE_SIZE based pfn value?

> 
> 1. Extrace the shift value returned via the pte-walk API's

     ^^^ extract?

> 2. Use the shift value to access the instruction address.
> 
> Note, the final physical address still use PAGE_SHIFT for
> computation. handle_ierror() is not modified and handle_derror()
> is modified just for extracting the correct instruction
> address.
> 
> Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  arch/powerpc/kernel/mce_power.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
> index fe6fc63251fe..69c8cc1e8e4f 100644
> --- a/arch/powerpc/kernel/mce_power.c
> +++ b/arch/powerpc/kernel/mce_power.c
> @@ -36,7 +36,8 @@
>   * Convert an address related to an mm to a PFN. NOTE: we are in real
>   * mode, we could potentially race with page table updates.
>   */
> -static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
> +static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr,
> +				unsigned int *shift)
>  {
>  	pte_t *ptep;
>  	unsigned long flags;
> @@ -49,9 +50,9 @@ static unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
> 
>  	local_irq_save(flags);
>  	if (mm == current->mm)
> -		ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL);
> +		ptep = find_current_mm_pte(mm->pgd, addr, NULL, shift);
>  	else
> -		ptep = find_init_mm_pte(addr, NULL);
> +		ptep = find_init_mm_pte(addr, shift);
>  	local_irq_restore(flags);
>  	if (!ptep || pte_special(*ptep))
>  		return ULONG_MAX;
> @@ -353,13 +354,14 @@ static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr,
>  	unsigned long pfn, instr_addr;
>  	struct instruction_op op;
>  	struct pt_regs tmp = *regs;
> +	unsigned int shift;
> 
> -	pfn = addr_to_pfn(regs, regs->nip);
> +	pfn = addr_to_pfn(regs, regs->nip, &shift);
>  	if (pfn != ULONG_MAX) {
> -		instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK);
> +		instr_addr = (pfn << shift) + (regs->nip & ((1 << shift) - 1));
>  		instr = *(unsigned int *)(instr_addr);
>  		if (!analyse_instr(&op, &tmp, instr)) {
> -			pfn = addr_to_pfn(regs, op.ea);
> +			pfn = addr_to_pfn(regs, op.ea, &shift);
>  			*addr = op.ea;
>  			*phys_addr = (pfn << PAGE_SHIFT);
>  			return 0;
> @@ -437,7 +439,8 @@ static int mce_handle_ierror(struct pt_regs *regs,
>  		unsigned long pfn;
> 
>  		if (get_paca()->in_mce < MAX_MCE_DEPTH) {
> -			pfn = addr_to_pfn(regs, regs->nip);
> +			pfn = addr_to_pfn(regs, regs->nip,
> +					NULL);
>  			if (pfn != ULONG_MAX) {
>  				*phys_addr =
>  					(pfn << PAGE_SHIFT);

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [RESEND 1/3] powerpc/mce: Bug fixes for MCE handling in kernel space
  2018-04-04 23:49 ` Nicholas Piggin
@ 2018-04-05  1:11 ` Balbir Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Balbir Singh @ 2018-04-05  1:11 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: mpe, linuxppc-dev, linux-nvdimm

On Thu, 5 Apr 2018 09:49:00 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Thu, 5 Apr 2018 09:19:41 +1000
> Balbir Singh <bsingharora@gmail.com> wrote:
> 
> > The code currently assumes PAGE_SHIFT as the shift value of
> > the pfn, this works correctly (mostly) for user space pages,
> > but the correct thing to do is
> 
> It would be good to actually explain the problem in the
> changelog. I would have thought pte_pfn returns a
> PAGE_SIZE based pfn value?
> 

The issue is hidden inside of hugepte_offset() as invoked by
__find_linux_pte(). I will send a new version because the code
needs to do << (shift - PAGE_SHIFT) for instruction address.

> > 
> > 1. Extrace the shift value returned via the pte-walk API's
> 
> ^^^ extract?

Thanks, yes, typo!

Balbir Singh.

^ permalink raw reply	[flat|nested] 36+ messages in thread
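The exchange above boils down to units: if the walker hands back a PAGE_SIZE-based pfn even for a huge mapping of size `(1 << shift)`, the physical instruction address is the pfn scaled by PAGE_SHIFT plus the offset within the huge mapping — not `pfn << shift` as in the posted patch. A hedged numeric sketch of that observation (the PAGE_SHIFT value and all pfn/address values below are invented for the demonstration; this is not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_DEMO 16	/* 64K base pages, common on ppc64 configs */

/* 'pfn' is assumed to be in PAGE_SIZE units and to point at the start
 * of the mapping; 'shift' is the mapping size (PAGE_SHIFT for a normal
 * page, larger for a huge pmd/pud mapping). */
static uint64_t phys_addr_demo(uint64_t pfn, uint64_t ea, unsigned int shift)
{
	uint64_t offset = ea & ((1ULL << shift) - 1);

	return (pfn << PAGE_SHIFT_DEMO) + offset;
}
```

With a base page (`shift == PAGE_SHIFT`) this degenerates to the original `(pfn << PAGE_SHIFT) + (ea & ~PAGE_MASK)` form, which is why the bug only shows up on huge mappings.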
* [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
  2018-04-04 23:19 ` Balbir Singh
@ 2018-04-04 23:19 ` Balbir Singh
  -1 siblings, 0 replies; 36+ messages in thread
From: Balbir Singh @ 2018-04-04 23:19 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, npiggin, linux-nvdimm

The pmem infrastructure uses memcpy_mcsafe in the pmem layer so as to
convert machine check excpetions into a return value on failure in case
a machine check exception is encoutered during the memcpy.

This patch largely borrows from the copyuser_power7 logic and does not
add the VMX optimizations, largely to keep the patch simple. If needed
those optimizations can be folded in.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/include/asm/string.h   |   2 +
 arch/powerpc/lib/Makefile           |   2 +-
 arch/powerpc/lib/memcpy_mcsafe_64.S | 212 ++++++++++++++++++++++++++++++++++++
 3 files changed, 215 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S

diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index 9b8cedf618f4..b7e872a64726 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -30,7 +30,9 @@ extern void * memcpy_flushcache(void *,const void *,__kernel_size_t);
 #ifdef CONFIG_PPC64
 #define __HAVE_ARCH_MEMSET32
 #define __HAVE_ARCH_MEMSET64
+#define __HAVE_ARCH_MEMCPY_MCSAFE
 
+extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz);
 extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t);
 extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
 extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 3c29c9009bbf..048afee9f518 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -24,7 +24,7 @@ endif
 
 obj64-y	+= copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
 	   copyuser_power7.o string_64.o copypage_power7.o memcpy_power7.o \
-	   memcpy_64.o memcmp_64.o pmem.o
+	   memcpy_64.o memcmp_64.o pmem.o memcpy_mcsafe_64.o
 
 obj64-$(CONFIG_SMP)	+= locks.o
 obj64-$(CONFIG_ALTIVEC)	+= vmx-helper.o
diff --git a/arch/powerpc/lib/memcpy_mcsafe_64.S b/arch/powerpc/lib/memcpy_mcsafe_64.S
new file mode 100644
index 000000000000..e7eaa9b6cded
--- /dev/null
+++ b/arch/powerpc/lib/memcpy_mcsafe_64.S
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) IBM Corporation, 2011
+ * Derived from copyuser_power7.s by Anton Blanchard <anton@au.ibm.com>
+ * Author - Balbir Singh <bsingharora@gmail.com>
+ */
+#include <asm/ppc_asm.h>
+#include <asm/errno.h>
+
+	.macro err1
+100:
+	EX_TABLE(100b,.Ldo_err1)
+	.endm
+
+	.macro err2
+200:
+	EX_TABLE(200b,.Ldo_err2)
+	.endm
+
+.Ldo_err2:
+	ld	r22,STK_REG(R22)(r1)
+	ld	r21,STK_REG(R21)(r1)
+	ld	r20,STK_REG(R20)(r1)
+	ld	r19,STK_REG(R19)(r1)
+	ld	r18,STK_REG(R18)(r1)
+	ld	r17,STK_REG(R17)(r1)
+	ld	r16,STK_REG(R16)(r1)
+	ld	r15,STK_REG(R15)(r1)
+	ld	r14,STK_REG(R14)(r1)
+	addi	r1,r1,STACKFRAMESIZE
+.Ldo_err1:
+	li	r3,-EFAULT
+	blr
+
+
+_GLOBAL(memcpy_mcsafe)
+	cmpldi	r5,16
+	blt	.Lshort_copy
+
+.Lcopy:
+	/* Get the source 8B aligned */
+	neg	r6,r4
+	mtocrf	0x01,r6
+	clrldi	r6,r6,(64-3)
+
+	bf	cr7*4+3,1f
+err1;	lbz	r0,0(r4)
+	addi	r4,r4,1
+err1;	stb	r0,0(r3)
+	addi	r3,r3,1
+
+1:	bf	cr7*4+2,2f
+err1;	lhz	r0,0(r4)
+	addi	r4,r4,2
+err1;	sth	r0,0(r3)
+	addi	r3,r3,2
+
+2:	bf	cr7*4+1,3f
+err1;	lwz	r0,0(r4)
+	addi	r4,r4,4
+err1;	stw	r0,0(r3)
+	addi	r3,r3,4
+
+3:	sub	r5,r5,r6
+	cmpldi	r5,128
+	blt	5f
+
+	mflr	r0
+	stdu	r1,-STACKFRAMESIZE(r1)
+	std	r14,STK_REG(R14)(r1)
+	std	r15,STK_REG(R15)(r1)
+	std	r16,STK_REG(R16)(r1)
+	std	r17,STK_REG(R17)(r1)
+	std	r18,STK_REG(R18)(r1)
+	std	r19,STK_REG(R19)(r1)
+	std	r20,STK_REG(R20)(r1)
+	std	r21,STK_REG(R21)(r1)
+	std	r22,STK_REG(R22)(r1)
+	std	r0,STACKFRAMESIZE+16(r1)
+
+	srdi	r6,r5,7
+	mtctr	r6
+
+	/* Now do cacheline (128B) sized loads and stores. */
+	.align	5
+4:
+err2;	ld	r0,0(r4)
+err2;	ld	r6,8(r4)
+err2;	ld	r7,16(r4)
+err2;	ld	r8,24(r4)
+err2;	ld	r9,32(r4)
+err2;	ld	r10,40(r4)
+err2;	ld	r11,48(r4)
+err2;	ld	r12,56(r4)
+err2;	ld	r14,64(r4)
+err2;	ld	r15,72(r4)
+err2;	ld	r16,80(r4)
+err2;	ld	r17,88(r4)
+err2;	ld	r18,96(r4)
+err2;	ld	r19,104(r4)
+err2;	ld	r20,112(r4)
+err2;	ld	r21,120(r4)
+	addi	r4,r4,128
+err2;	std	r0,0(r3)
+err2;	std	r6,8(r3)
+err2;	std	r7,16(r3)
+err2;	std	r8,24(r3)
+err2;	std	r9,32(r3)
+err2;	std	r10,40(r3)
+err2;	std	r11,48(r3)
+err2;	std	r12,56(r3)
+err2;	std	r14,64(r3)
+err2;	std	r15,72(r3)
+err2;	std	r16,80(r3)
+err2;	std	r17,88(r3)
+err2;	std	r18,96(r3)
+err2;	std	r19,104(r3)
+err2;	std	r20,112(r3)
+err2;	std	r21,120(r3)
+	addi	r3,r3,128
+	bdnz	4b
+
+	clrldi	r5,r5,(64-7)
+
+	ld	r14,STK_REG(R14)(r1)
+	ld	r15,STK_REG(R15)(r1)
+	ld	r16,STK_REG(R16)(r1)
+	ld	r17,STK_REG(R17)(r1)
+	ld	r18,STK_REG(R18)(r1)
+	ld	r19,STK_REG(R19)(r1)
+	ld	r20,STK_REG(R20)(r1)
+	ld	r21,STK_REG(R21)(r1)
+	ld	r22,STK_REG(R22)(r1)
+	addi	r1,r1,STACKFRAMESIZE
+
+	/* Up to 127B to go */
+5:	srdi	r6,r5,4
+	mtocrf	0x01,r6
+
+6:	bf	cr7*4+1,7f
+err1;	ld	r0,0(r4)
+err1;	ld	r6,8(r4)
+err1;	ld	r7,16(r4)
+err1;	ld	r8,24(r4)
+err1;	ld	r9,32(r4)
+err1;	ld	r10,40(r4)
+err1;	ld	r11,48(r4)
+err1;	ld	r12,56(r4)
+	addi	r4,r4,64
+err1;	std	r0,0(r3)
+err1;	std	r6,8(r3)
+err1;	std	r7,16(r3)
+err1;	std	r8,24(r3)
+err1;	std	r9,32(r3)
+err1;	std	r10,40(r3)
+err1;	std	r11,48(r3)
+err1;	std	r12,56(r3)
+	addi	r3,r3,64
+
+	/* Up to 63B to go */
+7:	bf	cr7*4+2,8f
+err1;	ld	r0,0(r4)
+err1;	ld	r6,8(r4)
+err1;	ld	r7,16(r4)
+err1;	ld	r8,24(r4)
+	addi	r4,r4,32
+err1;	std	r0,0(r3)
+err1;	std	r6,8(r3)
+err1;	std	r7,16(r3)
+err1;	std	r8,24(r3)
+	addi	r3,r3,32
+
+	/* Up to 31B to go */
+8:	bf	cr7*4+3,9f
+err1;	ld	r0,0(r4)
+err1;	ld	r6,8(r4)
+	addi	r4,r4,16
+err1;	std	r0,0(r3)
+err1;	std	r6,8(r3)
+	addi	r3,r3,16
+
+9:	clrldi	r5,r5,(64-4)
+
+	/* Up to 15B to go */
+.Lshort_copy:
+	mtocrf	0x01,r5
+	bf	cr7*4+0,12f
+err1;	lwz	r0,0(r4)	/* Less chance of a reject with word ops */
+err1;	lwz	r6,4(r4)
+	addi	r4,r4,8
+err1;	stw	r0,0(r3)
+err1;	stw	r6,4(r3)
+	addi	r3,r3,8
+
+12:	bf	cr7*4+1,13f
+err1;	lwz	r0,0(r4)
+	addi	r4,r4,4
+err1;	stw	r0,0(r3)
+	addi	r3,r3,4
+
+13:	bf	cr7*4+2,14f
+err1;	lhz	r0,0(r4)
+	addi	r4,r4,2
+err1;	sth	r0,0(r3)
+	addi	r3,r3,2
+
+14:	bf	cr7*4+3,15f
+err1;	lbz	r0,0(r4)
+err1;	stb	r0,0(r3)
+
+15:	li	r3,0
+	blr
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread
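The prologue of the memcpy_mcsafe assembler above first copies up to 7 leading bytes so the source becomes 8-byte aligned before the cacheline loop; the `neg r6,r4` / `clrldi r6,r6,(64-3)` pair is the asm spelling of `(-src) & 7`. A C restatement of that alignment arithmetic (the function name is ours, for illustration only):

```c
#include <assert.h>
#include <stdint.h>

/* Number of leading bytes (0..7) to copy one at a time so that
 * address 'src' becomes 8-byte aligned -- i.e. (-src) mod 8, which
 * is what neg + clrldi(…,64-3) computes on the source pointer. */
static unsigned int bytes_to_8b_align(uint64_t src)
{
	return (unsigned int)(-src & 7);
}
```

The same value is then moved into a CR field with `mtocrf` so the 1/2/4-byte prologue steps can be taken or skipped with `bf` on individual bits, avoiding a general-purpose compare per step.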
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
  2018-04-04 23:19 ` Balbir Singh
@ 2018-04-04 23:57 ` Nicholas Piggin
  -1 siblings, 0 replies; 36+ messages in thread
From: Nicholas Piggin @ 2018-04-04 23:57 UTC (permalink / raw)
  To: Balbir Singh; +Cc: mpe, linuxppc-dev, linux-nvdimm

On Thu, 5 Apr 2018 09:19:42 +1000
Balbir Singh <bsingharora@gmail.com> wrote:

> The pmem infrastructure uses memcpy_mcsafe in the pmem
> layer so as to convert machine check excpetions into
> a return value on failure in case a machine check
> exception is encoutered during the memcpy.
> 
> This patch largely borrows from the copyuser_power7
> logic and does not add the VMX optimizations, largely
> to keep the patch simple. If needed those optimizations
> can be folded in.

So memcpy_mcsafe doesn't return number of bytes copied?
Huh, well that makes it simple.

Would be nice if there was an easy way to share this with
the regular memcpy code... that's probably for another day
though, probably better to let this settle down first.

I didn't review exact instructions, but the approach looks
right to me.

Acked-by: Nicholas Piggin <npiggin@gmail.com>

> 
> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  arch/powerpc/include/asm/string.h   |   2 +
>  arch/powerpc/lib/Makefile           |   2 +-
>  arch/powerpc/lib/memcpy_mcsafe_64.S | 212 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 215 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S
> 
> diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
> index 9b8cedf618f4..b7e872a64726 100644
> --- a/arch/powerpc/include/asm/string.h
> +++ b/arch/powerpc/include/asm/string.h
> @@ -30,7 +30,9 @@ extern void * memcpy_flushcache(void *,const void *,__kernel_size_t);
>  #ifdef CONFIG_PPC64
>  #define __HAVE_ARCH_MEMSET32
>  #define __HAVE_ARCH_MEMSET64
> +#define __HAVE_ARCH_MEMCPY_MCSAFE
> 
> +extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz);
>  extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t);
>  extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
>  extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index 3c29c9009bbf..048afee9f518 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -24,7 +24,7 @@ endif
> 
>  obj64-y	+= copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
>  	   copyuser_power7.o string_64.o copypage_power7.o memcpy_power7.o \
> -	   memcpy_64.o memcmp_64.o pmem.o
> +	   memcpy_64.o memcmp_64.o pmem.o memcpy_mcsafe_64.o
> 
>  obj64-$(CONFIG_SMP)	+= locks.o
>  obj64-$(CONFIG_ALTIVEC)	+= vmx-helper.o
> diff --git a/arch/powerpc/lib/memcpy_mcsafe_64.S b/arch/powerpc/lib/memcpy_mcsafe_64.S
> new file mode 100644
> index 000000000000..e7eaa9b6cded
> --- /dev/null
> +++ b/arch/powerpc/lib/memcpy_mcsafe_64.S
> @@ -0,0 +1,212 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) IBM Corporation, 2011
> + * Derived from copyuser_power7.s by Anton Blanchard <anton@au.ibm.com>
> + * Author - Balbir Singh <bsingharora@gmail.com>
> + */
> +#include <asm/ppc_asm.h>
> +#include <asm/errno.h>
> +
> +	.macro err1
> +100:
> +	EX_TABLE(100b,.Ldo_err1)
> +	.endm
> +
> +	.macro err2
> +200:
> +	EX_TABLE(200b,.Ldo_err2)
> +	.endm
> +
> +.Ldo_err2:
> +	ld	r22,STK_REG(R22)(r1)
> +	ld	r21,STK_REG(R21)(r1)
> +	ld	r20,STK_REG(R20)(r1)
> +	ld	r19,STK_REG(R19)(r1)
> +	ld	r18,STK_REG(R18)(r1)
> +	ld	r17,STK_REG(R17)(r1)
> +	ld	r16,STK_REG(R16)(r1)
> +	ld	r15,STK_REG(R15)(r1)
> +	ld	r14,STK_REG(R14)(r1)
> +	addi	r1,r1,STACKFRAMESIZE
> +.Ldo_err1:
> +	li	r3,-EFAULT
> +	blr
> +
> +
> +_GLOBAL(memcpy_mcsafe)
> +	cmpldi	r5,16
> +	blt	.Lshort_copy
> +
> +.Lcopy:
> +	/* Get the source 8B aligned */
> +	neg	r6,r4
> +	mtocrf	0x01,r6
> +	clrldi	r6,r6,(64-3)
> +
> +	bf	cr7*4+3,1f
> +err1;	lbz	r0,0(r4)
> +	addi	r4,r4,1
> +err1;	stb	r0,0(r3)
> +	addi	r3,r3,1
> +
> +1:	bf	cr7*4+2,2f
> +err1;	lhz	r0,0(r4)
> +	addi	r4,r4,2
> +err1;	sth	r0,0(r3)
> +	addi	r3,r3,2
> +
> +2:	bf	cr7*4+1,3f
> +err1;	lwz	r0,0(r4)
> +	addi	r4,r4,4
> +err1;	stw	r0,0(r3)
> +	addi	r3,r3,4
> +
> +3:	sub	r5,r5,r6
> +	cmpldi	r5,128
> +	blt	5f
> +
> +	mflr	r0
> +	stdu	r1,-STACKFRAMESIZE(r1)
> +	std	r14,STK_REG(R14)(r1)
> +	std	r15,STK_REG(R15)(r1)
> +	std	r16,STK_REG(R16)(r1)
> +	std	r17,STK_REG(R17)(r1)
> +	std	r18,STK_REG(R18)(r1)
> +	std	r19,STK_REG(R19)(r1)
> +	std	r20,STK_REG(R20)(r1)
> +	std	r21,STK_REG(R21)(r1)
> +	std	r22,STK_REG(R22)(r1)
> +	std	r0,STACKFRAMESIZE+16(r1)
> +
> +	srdi	r6,r5,7
> +	mtctr	r6
> +
> +	/* Now do cacheline (128B) sized loads and stores.
*/ > + .align 5 > +4: > +err2; ld r0,0(r4) > +err2; ld r6,8(r4) > +err2; ld r7,16(r4) > +err2; ld r8,24(r4) > +err2; ld r9,32(r4) > +err2; ld r10,40(r4) > +err2; ld r11,48(r4) > +err2; ld r12,56(r4) > +err2; ld r14,64(r4) > +err2; ld r15,72(r4) > +err2; ld r16,80(r4) > +err2; ld r17,88(r4) > +err2; ld r18,96(r4) > +err2; ld r19,104(r4) > +err2; ld r20,112(r4) > +err2; ld r21,120(r4) > + addi r4,r4,128 > +err2; std r0,0(r3) > +err2; std r6,8(r3) > +err2; std r7,16(r3) > +err2; std r8,24(r3) > +err2; std r9,32(r3) > +err2; std r10,40(r3) > +err2; std r11,48(r3) > +err2; std r12,56(r3) > +err2; std r14,64(r3) > +err2; std r15,72(r3) > +err2; std r16,80(r3) > +err2; std r17,88(r3) > +err2; std r18,96(r3) > +err2; std r19,104(r3) > +err2; std r20,112(r3) > +err2; std r21,120(r3) > + addi r3,r3,128 > + bdnz 4b > + > + clrldi r5,r5,(64-7) > + > + ld r14,STK_REG(R14)(r1) > + ld r15,STK_REG(R15)(r1) > + ld r16,STK_REG(R16)(r1) > + ld r17,STK_REG(R17)(r1) > + ld r18,STK_REG(R18)(r1) > + ld r19,STK_REG(R19)(r1) > + ld r20,STK_REG(R20)(r1) > + ld r21,STK_REG(R21)(r1) > + ld r22,STK_REG(R22)(r1) > + addi r1,r1,STACKFRAMESIZE > + > + /* Up to 127B to go */ > +5: srdi r6,r5,4 > + mtocrf 0x01,r6 > + > +6: bf cr7*4+1,7f > +err1; ld r0,0(r4) > +err1; ld r6,8(r4) > +err1; ld r7,16(r4) > +err1; ld r8,24(r4) > +err1; ld r9,32(r4) > +err1; ld r10,40(r4) > +err1; ld r11,48(r4) > +err1; ld r12,56(r4) > + addi r4,r4,64 > +err1; std r0,0(r3) > +err1; std r6,8(r3) > +err1; std r7,16(r3) > +err1; std r8,24(r3) > +err1; std r9,32(r3) > +err1; std r10,40(r3) > +err1; std r11,48(r3) > +err1; std r12,56(r3) > + addi r3,r3,64 > + > + /* Up to 63B to go */ > +7: bf cr7*4+2,8f > +err1; ld r0,0(r4) > +err1; ld r6,8(r4) > +err1; ld r7,16(r4) > +err1; ld r8,24(r4) > + addi r4,r4,32 > +err1; std r0,0(r3) > +err1; std r6,8(r3) > +err1; std r7,16(r3) > +err1; std r8,24(r3) > + addi r3,r3,32 > + > + /* Up to 31B to go */ > +8: bf cr7*4+3,9f > +err1; ld r0,0(r4) > +err1; ld r6,8(r4) > + addi r4,r4,16 > 
+err1; std r0,0(r3) > +err1; std r6,8(r3) > + addi r3,r3,16 > + > +9: clrldi r5,r5,(64-4) > + > + /* Up to 15B to go */ > +.Lshort_copy: > + mtocrf 0x01,r5 > + bf cr7*4+0,12f > +err1; lwz r0,0(r4) /* Less chance of a reject with word ops */ > +err1; lwz r6,4(r4) > + addi r4,r4,8 > +err1; stw r0,0(r3) > +err1; stw r6,4(r3) > + addi r3,r3,8 > + > +12: bf cr7*4+1,13f > +err1; lwz r0,0(r4) > + addi r4,r4,4 > +err1; stw r0,0(r3) > + addi r3,r3,4 > + > +13: bf cr7*4+2,14f > +err1; lhz r0,0(r4) > + addi r4,r4,2 > +err1; sth r0,0(r3) > + addi r3,r3,2 > + > +14: bf cr7*4+3,15f > +err1; lbz r0,0(r4) > +err1; stb r0,0(r3) > + > +15: li r3,0 > + blr _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm ^ permalink raw reply [flat|nested] 36+ messages in thread
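[Editor's note: the interface being acked above can be sketched in plain C. This is an illustrative sketch only — `memcpy_mcsafe_sim()` and `read_from_pmem()` are hypothetical stand-ins, and the machine check is simulated by a flag rather than raised by hardware and caught via the exception table as in the real routine.]

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for the interface in this patch: copy sz bytes and return
 * 0 on success or -EFAULT if a machine check fires mid-copy. Here the
 * "machine check" is just a flag; the real routine reaches its
 * .Ldo_err1/.Ldo_err2 fixup labels via the exception table instead.
 */
static int memcpy_mcsafe_sim(void *dst, const void *src, size_t sz,
			     int simulate_mce)
{
	if (simulate_mce)
		return -EFAULT;	/* no indication of how many bytes landed */
	memcpy(dst, src, sz);
	return 0;
}

/* Caller-side pattern as in the pmem layer: map the failure to -EIO. */
static int read_from_pmem(void *buf, const void *pmem, size_t len,
			  int simulate_mce)
{
	if (memcpy_mcsafe_sim(buf, pmem, len, simulate_mce))
		return -EIO;	/* media error */
	return 0;
}
```

The limitation discussed downthread is visible here: on failure the caller learns only that the copy failed, not how far it got.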
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem 2018-04-04 23:57 ` Nicholas Piggin @ 2018-04-05 3:00 ` Dan Williams -1 siblings, 0 replies; 36+ messages in thread From: Dan Williams @ 2018-04-05 3:00 UTC (permalink / raw) To: Nicholas Piggin Cc: Luck, Tony, Matthew Wilcox, Michael Ellerman, linux-nvdimm, linuxppc-dev, Christoph Hellwig [ adding Matthew, Christoph, and Tony ] On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin <npiggin@gmail.com> wrote: > On Thu, 5 Apr 2018 09:19:42 +1000 > Balbir Singh <bsingharora@gmail.com> wrote: > >> The pmem infrastructure uses memcpy_mcsafe in the pmem >> layer so as to convert machine check exceptions into >> a return value on failure in case a machine check >> exception is encountered during the memcpy. >> >> This patch largely borrows from the copyuser_power7 >> logic and does not add the VMX optimizations, largely >> to keep the patch simple. If needed those optimizations >> can be folded in. > > So memcpy_mcsafe doesn't return number of bytes copied? > Huh, well that makes it simple. Well, not in current kernels, but we need to add that support or remove the direct call to copy_to_iter() in fs/dax.c. I'm looking right now to add "bytes remaining" support to the x86 memcpy_mcsafe(), but for copy_to_user we also need to handle bytes remaining for write faults. That fix is hopefully something that can land in an early 4.17-rc, but it won't be ready for -rc1. ^ permalink raw reply [flat|nested] 36+ messages in thread
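[Editor's note: the "bytes remaining" semantics Dan describes can be sketched as below. The signature is an assumption for illustration — the return value counts bytes *not* copied, so 0 still means complete success, and callers such as copy_to_iter() can report partial progress. The fault is simulated by an explicit offset.]

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical "bytes remaining" variant: fault_at marks the offset of
 * a simulated poisoned byte. Everything before it is copied, and the
 * return value is the number of bytes left uncopied (0 == complete).
 */
static size_t memcpy_mcsafe_remaining(void *dst, const void *src,
				      size_t sz, size_t fault_at)
{
	size_t done = fault_at < sz ? fault_at : sz;

	memcpy(dst, src, done);
	return sz - done;
}
```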
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem 2018-04-05 3:00 ` Dan Williams @ 2018-04-05 5:04 ` Nicholas Piggin -1 siblings, 0 replies; 36+ messages in thread From: Nicholas Piggin @ 2018-04-05 5:04 UTC (permalink / raw) To: Dan Williams Cc: Luck, Tony, Matthew Wilcox, Michael Ellerman, linux-nvdimm, linuxppc-dev, Christoph Hellwig On Wed, 4 Apr 2018 20:00:52 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > [ adding Matthew, Christoph, and Tony ] > > On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin <npiggin@gmail.com> wrote: > > On Thu, 5 Apr 2018 09:19:42 +1000 > > Balbir Singh <bsingharora@gmail.com> wrote: > > > >> The pmem infrastructure uses memcpy_mcsafe in the pmem > >> layer so as to convert machine check excpetions into > >> a return value on failure in case a machine check > >> exception is encoutered during the memcpy. > >> > >> This patch largely borrows from the copyuser_power7 > >> logic and does not add the VMX optimizations, largely > >> to keep the patch simple. If needed those optimizations > >> can be folded in. > > > > So memcpy_mcsafe doesn't return number of bytes copied? > > Huh, well that makes it simple. > > Well, not in current kernels, but we need to add that support or > remove the direct call to copy_to_iter() in fs/dax.c. I'm looking > right now to add "bytes remaining" support to the x86 memcpy_mcsafe(), > but for copy_to_user we also need to handle bytes remaining for write > faults. That fix is hopefully something that can land in an early > 4.17-rc, but it won't be ready for -rc1. I wonder if the powerpc implementation should just go straight to counting bytes. Backporting to this interface would be trivial, but it would just mean there's only one variant of the code to support. That's up to Balbir though. 
Thanks, Nick ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem 2018-04-05 5:04 ` Nicholas Piggin @ 2018-04-05 5:53 ` Balbir Singh -1 siblings, 0 replies; 36+ messages in thread From: Balbir Singh @ 2018-04-05 5:53 UTC (permalink / raw) To: Nicholas Piggin Cc: Luck, Tony, Matthew Wilcox, Michael Ellerman, linux-nvdimm, linuxppc-dev, Christoph Hellwig On Thu, 5 Apr 2018 15:04:05 +1000 Nicholas Piggin <npiggin@gmail.com> wrote: > On Wed, 4 Apr 2018 20:00:52 -0700 > Dan Williams <dan.j.williams@intel.com> wrote: > > > [ adding Matthew, Christoph, and Tony ] > > > > On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin <npiggin@gmail.com> wrote: > > > On Thu, 5 Apr 2018 09:19:42 +1000 > > > Balbir Singh <bsingharora@gmail.com> wrote: > > > > > >> The pmem infrastructure uses memcpy_mcsafe in the pmem > > >> layer so as to convert machine check excpetions into > > >> a return value on failure in case a machine check > > >> exception is encoutered during the memcpy. > > >> > > >> This patch largely borrows from the copyuser_power7 > > >> logic and does not add the VMX optimizations, largely > > >> to keep the patch simple. If needed those optimizations > > >> can be folded in. > > > > > > So memcpy_mcsafe doesn't return number of bytes copied? > > > Huh, well that makes it simple. > > > > Well, not in current kernels, but we need to add that support or > > remove the direct call to copy_to_iter() in fs/dax.c. I'm looking > > right now to add "bytes remaining" support to the x86 memcpy_mcsafe(), > > but for copy_to_user we also need to handle bytes remaining for write > > faults. That fix is hopefully something that can land in an early > > 4.17-rc, but it won't be ready for -rc1. > > I wonder if the powerpc implementation should just go straight to > counting bytes. Backporting to this interface would be trivial, but > it would just mean there's only one variant of the code to support. > That's up to Balbir though. 
> I'm thinking about it; I wonder what "bytes remaining" means for pmem in the context of a machine check exception. Also, do we want to be byte accurate or cache-line accurate for the bytes remaining? The former is much easier than the latter :) I'd rather implement the existing interface and port/support the new interface as it becomes available. Balbir Singh. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem 2018-04-05 5:53 ` Balbir Singh @ 2018-04-05 6:45 ` Nicholas Piggin -1 siblings, 0 replies; 36+ messages in thread From: Nicholas Piggin @ 2018-04-05 6:45 UTC (permalink / raw) To: Balbir Singh Cc: Luck, Tony, Matthew Wilcox, Michael Ellerman, linux-nvdimm, linuxppc-dev, Christoph Hellwig On Thu, 5 Apr 2018 15:53:07 +1000 Balbir Singh <bsingharora@gmail.com> wrote: > On Thu, 5 Apr 2018 15:04:05 +1000 > Nicholas Piggin <npiggin@gmail.com> wrote: > > > On Wed, 4 Apr 2018 20:00:52 -0700 > > Dan Williams <dan.j.williams@intel.com> wrote: > > > > > [ adding Matthew, Christoph, and Tony ] > > > > > > On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin <npiggin@gmail.com> wrote: > > > > On Thu, 5 Apr 2018 09:19:42 +1000 > > > > Balbir Singh <bsingharora@gmail.com> wrote: > > > > > > > >> The pmem infrastructure uses memcpy_mcsafe in the pmem > > > >> layer so as to convert machine check excpetions into > > > >> a return value on failure in case a machine check > > > >> exception is encoutered during the memcpy. > > > >> > > > >> This patch largely borrows from the copyuser_power7 > > > >> logic and does not add the VMX optimizations, largely > > > >> to keep the patch simple. If needed those optimizations > > > >> can be folded in. > > > > > > > > So memcpy_mcsafe doesn't return number of bytes copied? > > > > Huh, well that makes it simple. > > > > > > Well, not in current kernels, but we need to add that support or > > > remove the direct call to copy_to_iter() in fs/dax.c. I'm looking > > > right now to add "bytes remaining" support to the x86 memcpy_mcsafe(), > > > but for copy_to_user we also need to handle bytes remaining for write > > > faults. That fix is hopefully something that can land in an early > > > 4.17-rc, but it won't be ready for -rc1. > > > > I wonder if the powerpc implementation should just go straight to > > counting bytes. 
Backporting to this interface would be trivial, but > > it would just mean there's only one variant of the code to support. > > That's up to Balbir though. > > > > I'm thinking about it, I wonder what "bytes remaining" mean in pmem context > in the context of a machine check exception. Also, do we want to be byte > accurate or cache-line accurate for the bytes remaining? The former is much > easier than the latter :) The ideal would be a linear measure of how much of your copy reached (or can reach) non-volatile storage with nothing further copied. You may have to allow for some relaxing of the semantics depending on what the architecture can support. What's the problem with just counting bytes copied like usercopy -- why is that harder than cacheline accuracy? > I'd rather implement the existing interface and port/support the new interface > as it becomes available Fair enough. Thanks, Nick ^ permalink raw reply [flat|nested] 36+ messages in thread
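[Editor's note: one conservative answer to the byte-accurate vs cache-line-accurate question above is to round the progress report down to the start of the faulting cacheline, so the caller never over-counts what reached stable storage. A sketch, assuming the 128-byte line size used by the copy loop in the patch; the helper name is hypothetical.]

```c
#include <stddef.h>

#define CACHELINE_BYTES 128	/* matches the 128B loop in the patch */

/*
 * Given the byte offset at which the machine check hit, report
 * progress rounded down to a cacheline boundary: bytes before that
 * point are known good, and the faulting line is written off whole.
 */
static size_t mcsafe_progress_cacheline(size_t fault_offset)
{
	return fault_offset & ~(size_t)(CACHELINE_BYTES - 1);
}
```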
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
  2018-04-05 6:45 ` Nicholas Piggin
@ 2018-04-05 15:00 ` Dan Williams
  0 siblings, 0 replies; 36+ messages in thread
From: Dan Williams @ 2018-04-05 15:00 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Luck, Tony, Matthew Wilcox, Michael Ellerman, linux-nvdimm, linuxppc-dev, Christoph Hellwig

On Wed, Apr 4, 2018 at 11:45 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> On Thu, 5 Apr 2018 15:53:07 +1000
> Balbir Singh <bsingharora@gmail.com> wrote:
>
>> On Thu, 5 Apr 2018 15:04:05 +1000
>> Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> > On Wed, 4 Apr 2018 20:00:52 -0700
>> > Dan Williams <dan.j.williams@intel.com> wrote:
>> >
>> > > [ adding Matthew, Christoph, and Tony ]
>> > >
>> > > On Wed, Apr 4, 2018 at 4:57 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>> > > > On Thu, 5 Apr 2018 09:19:42 +1000
>> > > > Balbir Singh <bsingharora@gmail.com> wrote:
>> > > >
>> > > >> The pmem infrastructure uses memcpy_mcsafe in the pmem
>> > > >> layer so as to convert machine check exceptions into
>> > > >> a return value on failure in case a machine check
>> > > >> exception is encountered during the memcpy.
>> > > >>
>> > > >> This patch largely borrows from the copyuser_power7
>> > > >> logic and does not add the VMX optimizations, largely
>> > > >> to keep the patch simple. If needed those optimizations
>> > > >> can be folded in.
>> > > >
>> > > > So memcpy_mcsafe doesn't return number of bytes copied?
>> > > > Huh, well that makes it simple.
>> > >
>> > > Well, not in current kernels, but we need to add that support or
>> > > remove the direct call to copy_to_iter() in fs/dax.c. I'm looking
>> > > right now to add "bytes remaining" support to the x86 memcpy_mcsafe(),
>> > > but for copy_to_user we also need to handle bytes remaining for write
>> > > faults. That fix is hopefully something that can land in an early
>> > > 4.17-rc, but it won't be ready for -rc1.
>> >
>> > I wonder if the powerpc implementation should just go straight to
>> > counting bytes. Backporting to this interface would be trivial, but
>> > it would just mean there's only one variant of the code to support.
>> > That's up to Balbir though.
>> >
>>
>> I'm thinking about it; I wonder what "bytes remaining" means for pmem
>> in the context of a machine check exception. Also, do we want to be byte
>> accurate or cache-line accurate for the bytes remaining? The former is much
>> easier than the latter :)
>
> The ideal would be a linear measure of how much of your copy reached
> (or can reach) non-volatile storage with nothing further copied. You
> may have to allow for some relaxing of the semantics depending on
> what the architecture can support.
>
> What's the problem with just counting bytes copied like usercopy --
> why is that harder than cacheline accuracy?
>
>> I'd rather implement the existing interface and port/support the new interface
>> as it becomes available
>
> Fair enough.

I have patches already in progress to change the interface. My
preference is to hold off on adding a new implementation that will
need to be immediately reworked. When I say "immediate" I mean that
I should be able to post what I have for review within the next few
days.

Whether this is all too late for 4.17 is another question...

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
  2018-04-05 15:00 ` Dan Williams
@ 2018-05-01 20:57 ` Dan Williams
  0 siblings, 0 replies; 36+ messages in thread
From: Dan Williams @ 2018-05-01 20:57 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Luck, Tony, Matthew Wilcox, Michael Ellerman, linux-nvdimm, linuxppc-dev, Christoph Hellwig

On Thu, Apr 5, 2018 at 8:00 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Wed, Apr 4, 2018 at 11:45 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
[...]
>> What's the problem with just counting bytes copied like usercopy --
>> why is that harder than cacheline accuracy?
>>
>>> I'd rather implement the existing interface and port/support the new interface
>>> as it becomes available
>>
>> Fair enough.
>
> I have patches already in progress to change the interface. My
> preference is to hold off on adding a new implementation that will
> need to be immediately reworked. When I say "immediate" I mean that
> I should be able to post what I have for review within the next few
> days.
>
> Whether this is all too late for 4.17 is another question...

Here is the x86 version of a 'bytes remaining' memcpy_mcsafe() implementation:

https://lists.01.org/pipermail/linux-nvdimm/2018-May/015548.html

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
  2018-05-01 20:57 ` Dan Williams
@ 2018-05-02 12:36 ` Balbir Singh
  0 siblings, 0 replies; 36+ messages in thread
From: Balbir Singh @ 2018-05-02 12:36 UTC (permalink / raw)
To: Dan Williams
Cc: Luck, Tony, Matthew Wilcox, Michael Ellerman, linux-nvdimm, Nicholas Piggin, linuxppc-dev, Christoph Hellwig

On Wed, May 2, 2018 at 6:57 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Thu, Apr 5, 2018 at 8:00 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Wed, Apr 4, 2018 at 11:45 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> [...]
>>> What's the problem with just counting bytes copied like usercopy --
>>> why is that harder than cacheline accuracy?
>>>
>>>> I'd rather implement the existing interface and port/support the new interface
>>>> as it becomes available
>>>
>>> Fair enough.
>>
>> I have patches already in progress to change the interface. My
>> preference is to hold off on adding a new implementation that will
>> need to be immediately reworked. When I say "immediate" I mean that
>> I should be able to post what I have for review within the next few
>> days.
>>
>> Whether this is all too late for 4.17 is another question...
>
> Here is the x86 version of a 'bytes remaining' memcpy_mcsafe() implementation:
>
> https://lists.01.org/pipermail/linux-nvdimm/2018-May/015548.html

Thanks for the heads up! I'll work on the implementation for powerpc.

Balbir Singh.

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
  2018-04-05 6:45 ` Nicholas Piggin
@ 2018-04-05 20:40 ` Jeff Moyer
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff Moyer @ 2018-04-05 20:40 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Luck, Tony, linux-nvdimm, Michael Ellerman, Matthew Wilcox, linuxppc-dev, Christoph Hellwig

Nicholas Piggin <npiggin@gmail.com> writes:

> On Thu, 5 Apr 2018 15:53:07 +1000
> Balbir Singh <bsingharora@gmail.com> wrote:
>> I'm thinking about it; I wonder what "bytes remaining" means for pmem
>> in the context of a machine check exception. Also, do we want to be byte
>> accurate or cache-line accurate for the bytes remaining? The former is much
>> easier than the latter :)
>
> The ideal would be a linear measure of how much of your copy reached
> (or can reach) non-volatile storage with nothing further copied. You
> may have to allow for some relaxing of the semantics depending on
> what the architecture can support.

I think you've got that backwards. memcpy_mcsafe is used to copy *from*
persistent memory. The idea is to catch errors when reading pmem, not
writing to it.

> What's the problem with just counting bytes copied like usercopy --
> why is that harder than cacheline accuracy?

He said the former (i.e. bytes) is easier. So, I think you're on the
same page. :)

Cheers,
Jeff

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
  2018-04-05 20:40 ` Jeff Moyer
@ 2018-04-06 1:26 ` Nicholas Piggin
  0 siblings, 0 replies; 36+ messages in thread
From: Nicholas Piggin @ 2018-04-06 1:26 UTC (permalink / raw)
To: Jeff Moyer
Cc: Luck, Tony, linux-nvdimm, Michael Ellerman, Matthew Wilcox, linuxppc-dev, Christoph Hellwig

On Thu, 05 Apr 2018 16:40:26 -0400
Jeff Moyer <jmoyer@redhat.com> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
>
> > On Thu, 5 Apr 2018 15:53:07 +1000
> > Balbir Singh <bsingharora@gmail.com> wrote:
> >> I'm thinking about it; I wonder what "bytes remaining" means for pmem
> >> in the context of a machine check exception. Also, do we want to be byte
> >> accurate or cache-line accurate for the bytes remaining? The former is much
> >> easier than the latter :)
> >
> > The ideal would be a linear measure of how much of your copy reached
> > (or can reach) non-volatile storage with nothing further copied. You
> > may have to allow for some relaxing of the semantics depending on
> > what the architecture can support.
>
> I think you've got that backwards. memcpy_mcsafe is used to copy *from*
> persistent memory. The idea is to catch errors when reading pmem, not
> writing to it.
>
> > What's the problem with just counting bytes copied like usercopy --
> > why is that harder than cacheline accuracy?
>
> He said the former (i.e. bytes) is easier. So, I think you're on the
> same page. :)

Oh well, that makes a lot more sense in my mind now, thanks :)

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
  2018-04-06 1:26 ` Nicholas Piggin
@ 2018-04-06 9:25 ` Balbir Singh
  0 siblings, 0 replies; 36+ messages in thread
From: Balbir Singh @ 2018-04-06 9:25 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Luck, Tony, linux-nvdimm, Michael Ellerman, Matthew Wilcox, linuxppc-dev, Christoph Hellwig

On Fri, Apr 6, 2018 at 11:26 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> On Thu, 05 Apr 2018 16:40:26 -0400
> Jeff Moyer <jmoyer@redhat.com> wrote:
>
>> Nicholas Piggin <npiggin@gmail.com> writes:
>>
>> > On Thu, 5 Apr 2018 15:53:07 +1000
>> > Balbir Singh <bsingharora@gmail.com> wrote:
>> >> I'm thinking about it; I wonder what "bytes remaining" means for pmem
>> >> in the context of a machine check exception. Also, do we want to be byte
>> >> accurate or cache-line accurate for the bytes remaining? The former is much
>> >> easier than the latter :)
>> >
>> > The ideal would be a linear measure of how much of your copy reached
>> > (or can reach) non-volatile storage with nothing further copied. You
>> > may have to allow for some relaxing of the semantics depending on
>> > what the architecture can support.
>>
>> I think you've got that backwards. memcpy_mcsafe is used to copy *from*
>> persistent memory. The idea is to catch errors when reading pmem, not
>> writing to it.
>>

I know the comment in x86 says posted writes and cares only about loads,
but I don't see why both sides should not be handled.

>> > What's the problem with just counting bytes copied like usercopy --
>> > why is that harder than cacheline accuracy?
>>
>> He said the former (i.e. bytes) is easier. So, I think you're on the
>> same page. :)
>
> Oh well that makes a lot more sense in my mind now, thanks :)

I thought cache-aligned accounting might make sense, since usually we'd
expect the failure to be at a cache-line level, but our copy_tofrom_user
does accurate accounting.

Balbir Singh.

^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem
  2018-04-06 9:25 ` Balbir Singh
@ 2018-04-06 15:46 ` Luck, Tony
  0 siblings, 0 replies; 36+ messages in thread
From: Luck, Tony @ 2018-04-06 15:46 UTC (permalink / raw)
To: Balbir Singh, Nicholas Piggin
Cc: linux-nvdimm, Michael Ellerman, Matthew Wilcox, linuxppc-dev, Christoph Hellwig

> I thought cache-aligned accounting might make sense, since usually we'd
> expect the failure to be at a cache-line level, but our copy_tofrom_user
> does accurate accounting

That's one of the wrinkles in the current x86 memcpy_mcsafe(). It starts by
checking alignment of the source address, and moves a byte at a time until
it is 8-byte aligned. We do this because current x86 implementations do not
gracefully handle an unaligned read that spans from a good cache-line into
a poisoned one.

This is different from copy_tofrom_user, which aligns the destination for speed
reasons (unaligned reads have a lower penalty than unaligned writes).

-Tony

^ permalink raw reply [flat|nested] 36+ messages in thread
* [RESEND 3/3] powerpc/mce: Handle memcpy_mcsafe
  2018-04-04 23:19 ` Balbir Singh
@ 2018-04-04 23:19 ` Balbir Singh
  0 siblings, 0 replies; 36+ messages in thread
From: Balbir Singh @ 2018-04-04 23:19 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mpe, npiggin, linux-nvdimm

Add a blocking notifier callback to be called in real-mode on machine
check exceptions for UE (ld/st) errors only. The patch registers a
callback on boot to be notified of machine check exceptions and returns
NOTIFY_STOP when a page of interest is seen as the source of the machine
check exception. This page of interest is a ZONE_DEVICE page; hence for
now, for memcpy_mcsafe to work, the page needs to belong to ZONE_DEVICE
and memcpy_mcsafe should be used to access the memory.

The patch also modifies the NIP of the exception context to go back to
the fixup handler (in memcpy_mcsafe) and does not print any error
message, since the error is reported through the return value and
handled there.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/include/asm/mce.h |  3 +-
 arch/powerpc/kernel/mce.c      | 77 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 3a1226e9b465..a76638e3e47e 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -125,7 +125,8 @@ struct machine_check_event {
 			enum MCE_UeErrorType ue_error_type:8;
 			uint8_t		effective_address_provided;
 			uint8_t		physical_address_provided;
-			uint8_t		reserved_1[5];
+			uint8_t		error_return;
+			uint8_t		reserved_1[4];
 			uint64_t	effective_address;
 			uint64_t	physical_address;
 			uint8_t		reserved_2[8];
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index efdd16a79075..b9e4881fa8c5 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -28,7 +28,9 @@
 #include <linux/percpu.h>
 #include <linux/export.h>
 #include <linux/irq_work.h>
+#include <linux/extable.h>

+#include <asm/extable.h>
 #include <asm/machdep.h>
 #include <asm/mce.h>

@@ -54,6 +56,52 @@ static struct irq_work mce_event_process_work = {

 DECLARE_WORK(mce_ue_event_work, machine_process_ue_event);

+static BLOCKING_NOTIFIER_HEAD(mce_notifier_list);
+
+int register_mce_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&mce_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(register_mce_notifier);
+
+int unregister_mce_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&mce_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(unregister_mce_notifier);
+
+
+static int check_memcpy_mcsafe(struct notifier_block *nb,
+			unsigned long val, void *data)
+{
+	/*
+	 * val contains the physical_address of the bad address
+	 */
+	unsigned long pfn = val >> PAGE_SHIFT;
+	struct page *page = realmode_pfn_to_page(pfn);
+	int rc = NOTIFY_DONE;
+
+	if (!page)
+		goto out;
+
+	if (is_zone_device_page(page))	/* for HMM and PMEM */
+		rc = NOTIFY_STOP;
+out:
+	return rc;
+}
+
+struct notifier_block memcpy_mcsafe_nb = {
+	.priority = 0,
+	.notifier_call = check_memcpy_mcsafe,
+};
+
+int mce_mcsafe_register(void)
+{
+	register_mce_notifier(&memcpy_mcsafe_nb);
+	return 0;
+}
+arch_initcall(mce_mcsafe_register);
+
 static void mce_set_error_info(struct machine_check_event *mce,
 			       struct mce_error_info *mce_err)
 {
@@ -151,9 +199,31 @@ void save_mce_event(struct pt_regs *regs, long handled,
 			mce->u.ue_error.effective_address_provided = true;
 			mce->u.ue_error.effective_address = addr;
 			if (phys_addr != ULONG_MAX) {
+				int rc;
+				const struct exception_table_entry *entry;
+
+				/*
+				 * Once we have the physical address, we check to
+				 * see if the current nip has a fixup entry.
+				 * Having a fixup entry plus the notifier stating
+				 * that it can handle the exception is an indication
+				 * that we should return to the fixup entry and
+				 * return an error from there
+				 */
 				mce->u.ue_error.physical_address_provided = true;
 				mce->u.ue_error.physical_address = phys_addr;
-				machine_check_ue_event(mce);
+
+				rc = blocking_notifier_call_chain(&mce_notifier_list,
+						phys_addr, NULL);
+				if (rc & NOTIFY_STOP_MASK) {
+					entry = search_exception_tables(regs->nip);
+					if (entry != NULL) {
+						mce->u.ue_error.error_return = 1;
+						regs->nip = extable_fixup(entry);
+					} else
+						machine_check_ue_event(mce);
+				} else
+					machine_check_ue_event(mce);
 			}
 		}
 		return;
@@ -208,7 +278,6 @@ void release_mce_event(void)
 	get_mce_event(NULL, true);
 }

-
 /*
  * Queue up the MCE event which then can be handled later.
  */
@@ -239,6 +308,10 @@ void machine_check_queue_event(void)
 	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
 		return;

+	if (evt.error_type == MCE_ERROR_TYPE_UE &&
+	    evt.u.ue_error.error_return == 1)
+		return;
+
 	index = __this_cpu_inc_return(mce_queue_count) - 1;
 	/* If queue is full, just return for now. */
 	if (index >= MAX_MC_EVT) {
--
2.13.6

^ permalink raw reply related [flat|nested] 36+ messages in thread
Thread overview: 36+ messages (newest: 2018-05-02 12:36 UTC)
2018-04-04 23:19 [RESEND 0/3] Add support for memcpy_mcsafe -- Balbir Singh
2018-04-04 23:19 ` [RESEND 1/3] powerpc/mce: Bug fixes for MCE handling in kernel space -- Balbir Singh
2018-04-04 23:49   ` Nicholas Piggin
2018-04-05  1:11     ` Balbir Singh
2018-04-04 23:19 ` [RESEND 2/3] powerpc/memcpy: Add memcpy_mcsafe for pmem -- Balbir Singh
2018-04-04 23:57   ` Nicholas Piggin
2018-04-05  3:00     ` Dan Williams
2018-04-05  5:04       ` Nicholas Piggin
2018-04-05  5:53         ` Balbir Singh
2018-04-05  6:45           ` Nicholas Piggin
2018-04-05 15:00             ` Dan Williams
2018-05-01 20:57               ` Dan Williams
2018-05-02 12:36                 ` Balbir Singh
2018-04-05 20:40   ` Jeff Moyer
2018-04-06  1:26     ` Nicholas Piggin
2018-04-06  9:25       ` Balbir Singh
2018-04-06 15:46         ` Luck, Tony
2018-04-04 23:19 ` [RESEND 3/3] powerpc/mce: Handle memcpy_mcsafe -- Balbir Singh