On Thu, Mar 29, 2018 at 10:28:24AM +0100, Matt Redfearn wrote:
> The __clear_user function is defined to return the number of bytes that
> could not be cleared. From the underlying memset / bzero implementation
> this means setting register a2 to that number on return. Currently if a
> page fault is triggered within the memset_partial block, the value
> loaded into a2 on return is meaningless.
>
> The label .Lpartial_fixup\@ is jumped to on page fault. Currently it
> masks the remaining count of bytes (a2) with STORMASK, meaning that the
> least significant 2 (32bit) or 3 (64bit) bits of the remaining count are
> always clear.

Are you sure about that. It seems to do that *to ensure those bits are
set correctly*...

> Secondly, .Lpartial_fixup\@ expects t1 to contain the end address of the
> copy. This is set up by the initial block:
>         PTR_ADDU        t1, a0                  /* end address */
> However, the .Lmemset_partial\@ block then reuses register t1 to
> calculate a jump through a block of word copies. This leaves it no
> longer containing the end address of the copy operation if a page fault
> occurs, and the remaining bytes calculation is incorrect.
>
> Fix these issues by removing the and of a2 with STORMASK, and replace t1
> with register t2 in the .Lmemset_partial\@ block.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Cc: stable@vger.kernel.org
> Signed-off-by: Matt Redfearn
> ---
>
>  arch/mips/lib/memset.S | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/mips/lib/memset.S b/arch/mips/lib/memset.S
> index 90bcdf1224ee..3257dca58cad 100644
> --- a/arch/mips/lib/memset.S
> +++ b/arch/mips/lib/memset.S
> @@ -161,19 +161,19 @@
>
>  .Lmemset_partial\@:
>          R10KCBARRIER(0(ra))
> -        PTR_LA          t1, 2f                  /* where to start */
> +        PTR_LA          t2, 2f                  /* where to start */
>  #ifdef CONFIG_CPU_MICROMIPS
>          LONG_SRL        t7, t0, 1

Hmm, on microMIPS t7 isn't on the clobber list for __bzero, and nor is
t8...
>  #endif
>  #if LONGSIZE == 4
> -        PTR_SUBU        t1, FILLPTRG
> +        PTR_SUBU        t2, FILLPTRG
>  #else
>          .set            noat
>          LONG_SRL        AT, FILLPTRG, 1
> -        PTR_SUBU        t1, AT
> +        PTR_SUBU        t2, AT
>          .set            at
>  #endif
> -        jr              t1
> +        jr              t2
>          PTR_ADDU        a0, t0                  /* dest ptr */
                           ^^^
note this...

>
>          .set            push
> @@ -250,7 +250,6 @@
>
>  .Lpartial_fixup\@:
>          PTR_L           t0, TI_TASK($28)
> -        andi            a2, STORMASK

... this isn't right. If I read correctly, t1 (after the above change
stops clobbering it) is the end of the full 64-byte blocks, i.e. the
start address of the final partial block.

The .Lfwd_fixup calculation (for full blocks) appears to be:

  a2 = ((len & 0x3f) + start_of_partial) - badvaddr

which is spot on. (len & 0x3f) is the partial block and remaining bytes
that haven't been set yet, add start_of_partial to get end of the full
range, subtract bad address to find how much didn't copy.

The calculation for .Lpartial_fixup however appears to (currently) do:

  a2 = ((len & STORMASK) + start_of_partial) - badvaddr

Which might make sense if start_of_partial (t1) was replaced with
end_of_partial, which does seem to be calculated as noted above, and put
in a0 ready for the final few bytes to be set.

>          LONG_L          t0, THREAD_BUADDR(t0)
>          LONG_ADDU       a2, t1
                               ^^
So I think either it needs to just s/t1/a0/ here and not bother
preserving t1 above (smaller change and probably the original intent),
or preserve t1 and mask 0x3f instead of STORMASK like .Lfwd_fixup does
(which would work but seems needlessly complicated to me).

Does that make any sense or have I misunderstood some subtlety?

Cheers
James

>          jr              ra
> --
> 2.7.4
>
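
P.S. FWIW, the arithmetic above can be sanity-checked with a small
userspace C sketch. This is not the kernel code: the function names are
made up for illustration, it assumes a 64-bit kernel (so STORSIZE is 8
and STORMASK is 7), and it assumes the destination is already long
aligned when the full-block loop starts.

```c
#include <stdio.h>

/* Assumption: 64-bit kernel, so a "long" store is 8 bytes wide. */
#define STORSIZE 8UL
#define STORMASK (STORSIZE - 1)

/* End of the full 64-byte blocks, i.e. what t1 would hold if preserved. */
unsigned long start_of_partial(unsigned long start, unsigned long len)
{
	return start + (len & ~0x3fUL);
}

/* End of the whole-long stores in the partial block, i.e. a0 after
 * "PTR_ADDU a0, t0" with t0 = len & (0x40 - STORSIZE). */
unsigned long end_of_partial(unsigned long start, unsigned long len)
{
	return start_of_partial(start, len) + (len & (0x40UL - STORSIZE));
}

/* Ground truth: bytes from the faulting address to the end of the range. */
unsigned long expected_left(unsigned long start, unsigned long len,
			    unsigned long bad)
{
	return start + len - bad;
}

/* What .Lpartial_fixup appears to compute with t1 preserved as is. */
unsigned long fixup_current(unsigned long start, unsigned long len,
			    unsigned long bad)
{
	return (len & STORMASK) + start_of_partial(start, len) - bad;
}

/* Option 1: keep the STORMASK mask but add a0 instead of t1. */
unsigned long fixup_use_a0(unsigned long start, unsigned long len,
			   unsigned long bad)
{
	return (len & STORMASK) + end_of_partial(start, len) - bad;
}

/* Option 2: keep t1 but mask with 0x3f, as .Lfwd_fixup does. */
unsigned long fixup_mask_3f(unsigned long start, unsigned long len,
			    unsigned long bad)
{
	return (len & 0x3fUL) + start_of_partial(start, len) - bad;
}

/* Print all variants side by side for one hypothetical fault address. */
void show(unsigned long start, unsigned long len, unsigned long bad)
{
	printf("expected %lu, current %lu, use a0 %lu, mask 0x3f %lu\n",
	       expected_left(start, len, bad),
	       fixup_current(start, len, bad),
	       fixup_use_a0(start, len, bad),
	       fixup_mask_3f(start, len, bad));
}
```

For example, show(0x1000, 0x9d, 0x1088) models a fill of two full
blocks, three longs and five trailing bytes that faults on the second
long of the partial block. Both alternatives give 21, matching the
expected value, while the STORMASK-plus-start_of_partial form goes
negative and wraps around.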