On 2019-10-10, Michael Ellerman wrote:
> Aleksa Sarai writes:
> > A common pattern for syscall extensions is increasing the size of a
> > struct passed from userspace, such that the zero-value of the new fields
> > result in the old kernel behaviour (allowing for a mix of userspace and
> > kernel vintages to operate on one another in most cases).
> >
> > While this interface exists for communication in both directions, only
> > one interface is straightforward to have reasonable semantics for
> > (userspace passing a struct to the kernel). For kernel returns to
> > userspace, what the correct semantics are (whether there should be an
> > error if userspace is unaware of a new extension) is very
> > syscall-dependent and thus probably cannot be unified between syscalls
> > (a good example of this problem is [1]).
> >
> > Previously there was no common lib/ function that implemented
> > the necessary extension-checking semantics (and different syscalls
> > implemented them slightly differently or incompletely[2]). Future
> > patches replace common uses of this pattern to make use of
> > copy_struct_from_user().
> >
> > Some in-kernel selftests that insure that the handling of alignment and
> > various byte patterns are all handled identically to memchr_inv() usage.
> ...
> > diff --git a/lib/test_user_copy.c b/lib/test_user_copy.c
> > index 67bcd5dfd847..950ee88cd6ac 100644
> > --- a/lib/test_user_copy.c
> > +++ b/lib/test_user_copy.c
> > @@ -31,14 +31,133 @@
> ...
> > +static int test_check_nonzero_user(char *kmem, char __user *umem, size_t size)
> > +{
> > +        int ret = 0;
> > +        size_t start, end, i;
> > +        size_t zero_start = size / 4;
> > +        size_t zero_end = size - zero_start;
> > +
> > +        /*
> > +         * We conduct a series of check_nonzero_user() tests on a block of memory
> > +         * with the following byte-pattern (trying every possible [start,end]
> > +         * pair):
> > +         *
> > +         *   [ 00 ff 00 ff ... 00 00 00 00 ... ff 00 ff 00 ]
> > +         *
> > +         * And we verify that check_nonzero_user() acts identically to memchr_inv().
> > +         */
> > +
> > +        memset(kmem, 0x0, size);
> > +        for (i = 1; i < zero_start; i += 2)
> > +                kmem[i] = 0xff;
> > +        for (i = zero_end; i < size; i += 2)
> > +                kmem[i] = 0xff;
> > +
> > +        ret |= test(copy_to_user(umem, kmem, size),
> > +                    "legitimate copy_to_user failed");
> > +
> > +        for (start = 0; start <= size; start++) {
> > +                for (end = start; end <= size; end++) {
> > +                        size_t len = end - start;
> > +                        int retval = check_zeroed_user(umem + start, len);
> > +                        int expected = is_zeroed(kmem + start, len);
> > +
> > +                        ret |= test(retval != expected,
> > +                                    "check_nonzero_user(=%d) != memchr_inv(=%d) mismatch (start=%zu, end=%zu)",
> > +                                    retval, expected, start, end);
> > +                }
> > +        }
>
> This is causing soft lockups for me on powerpc, eg:
>
> [ 188.208315] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [modprobe:611]
> [ 188.208782] Modules linked in: test_user_copy(+) vmx_crypto gf128mul crc32c_vpmsum virtio_balloon ip_tables x_tables autofs4
> [ 188.209594] CPU: 4 PID: 611 Comm: modprobe Tainted: G L    5.4.0-rc1-gcc-8.2.0-00001-gf5a1a536fa14-dirty #1151
> [ 188.210392] NIP: c000000000173650 LR: c000000000379cb0 CTR: c0000000007b20d0
> [ 188.210612] REGS: c0000000ec213560 TRAP: 0901  Tainted: G L  (5.4.0-rc1-gcc-8.2.0-00001-gf5a1a536fa14-dirty)
> [ 188.210876] MSR: 8000000000009033 CR: 28222422 XER: 20000000
> [ 188.211060] CFAR: c000000000379cac IRQMASK: 0
> [ 188.211060] GPR00: c000000000379cb0 c0000000ec2137f0 c0000000013bbb00 c000000000f527f0
> [ 188.211060] GPR04: 000000000000004b 0000000000000000 00000000000085f5 c00000000fffb780
> [ 188.211060] GPR08: 0000000000000000 0000000000000000 c0000000fb9a3080 c008000000411478
> [ 188.211060] GPR12: c0000000007b20d0 c00000000fffb780
> [ 188.211802] NIP [c000000000173650] __might_sleep+0x20/0xc0
> [ 188.211924] LR [c000000000379cb0] __might_fault+0x40/0x60
> [ 188.212037] Call Trace:
> [ 188.212101] [c0000000ec2137f0] [c0000000001b99b4] vprintk_func+0xc4/0x230 (unreliable)
> [ 188.212274] [c0000000ec213810] [c0000000007b21fc] check_zeroed_user+0x12c/0x200
> [ 188.212478] [c0000000ec213860] [c0080000004106cc] test_user_copy_init+0x67c/0x1210 [test_user_copy]
> [ 188.212681] [c0000000ec2139a0] [c000000000010440] do_one_initcall+0x60/0x340
> [ 188.212859] [c0000000ec213a70] [c000000000213d4c] do_init_module+0x7c/0x2f0
> [ 188.213004] [c0000000ec213b00] [c000000000216f24] load_module+0x2d94/0x30e0
> [ 188.213150] [c0000000ec213d00] [c000000000217578] __do_sys_finit_module+0xc8/0x150
> [ 188.213350] [c0000000ec213e20] [c00000000000b5d8] system_call+0x5c/0x68
> [ 188.213494] Instruction dump:
> [ 188.213587] 409efec0 4e800020 60000000 60000000 3c4c0125 384284d0 7c0802a6 60000000
> [ 188.213767] fba1ffe8 fbc1fff0 fbe1fff8 7c9e2378 7c7f1b78 7cbd2b78 e94d0958
>
> I think it's partly because I have DEBUG_ATOMIC_SLEEP enabled, which
> means each unsafe_get_user() calls __might_fault() etc.
>
> But even turning that off, it still takes forever.
>
> > @@ -106,6 +225,11 @@ static int __init test_user_copy_init(void)
> >  #endif
> >  #undef test_legit
> >
> > +        /* Test usage of check_nonzero_user(). */
> > +        ret |= test_check_nonzero_user(kmem, usermem, 2 * PAGE_SIZE);
>
> I suspect it's just that PAGE_SIZE for me is 64K, and so the nested loop
> above gets too big too fast.
>
> If my math is right it's doing about 500 million iterations, vs ~2
> million on a 4K kernel.
>
> If I do the change below the entire test_user_copy module loads and runs
> all the tests in about 10 seconds.
>
> diff --git a/lib/test_user_copy.c b/lib/test_user_copy.c
> index 950ee88cd6ac..03b617a36144 100644
> --- a/lib/test_user_copy.c
> +++ b/lib/test_user_copy.c
> @@ -226,7 +226,7 @@ static int __init test_user_copy_init(void)
>  #undef test_legit
>
>          /* Test usage of check_nonzero_user(). */
> -        ret |= test_check_nonzero_user(kmem, usermem, 2 * PAGE_SIZE);
> +        ret |= test_check_nonzero_user(kmem, usermem, 2 * 4096);
>          /* Test usage of copy_struct_from_user(). */
>          ret |= test_copy_struct_from_user(kmem, usermem, 2 * PAGE_SIZE);
>
> How long does it take on your systems? Is 10s in the ball park, or is
> there something else pathological happening on my machine, and shrinking
> it to 4096 is just papering over it?

Yeah, it takes about 5-10s on my laptop.
We could switch it to just testing everything within a 4K block, but the
main reason for testing with 2*PAGE_SIZE is to make sure that
check_nonzero_user() works across page boundaries. Though we could restrict
the check_nonzero_user() sweep to the region around the page boundary
(maybe i ∈ (PAGE_SIZE-512, PAGE_SIZE+512]?).

Making a single test run for ~40min doesn't seem like that good of an idea
in retrospect. :P

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
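For illustration, here is a rough, untested sketch of the boundary-window idea
above. The helper name, the 512-byte BOUNDARY_WINDOW constant, and the re-laid
byte pattern are assumptions for the sketch, not part of the posted patch;
test(), is_zeroed() and check_zeroed_user() are the helpers the test module
already uses.

#define BOUNDARY_WINDOW 512

static int test_check_nonzero_user_boundary(char *kmem, char __user *umem,
                                            size_t size)
{
        int ret = 0;
        size_t start, end, i;
        /* Only sweep offsets in (PAGE_SIZE - 512, PAGE_SIZE + 512]. */
        size_t lo = PAGE_SIZE - BOUNDARY_WINDOW + 1;
        size_t hi = min_t(size_t, size, PAGE_SIZE + BOUNDARY_WINDOW);

        /*
         * Lay the 00/ff pattern just below the page boundary so the sweep
         * still sees both zeroed and non-zeroed ranges:
         *
         *   [ ... 00 ff 00 ff | 00 00 00 00 ... ]
         *                     ^ page boundary
         */
        memset(kmem, 0x0, size);
        for (i = lo; i < PAGE_SIZE; i += 2)
                kmem[i] = 0xff;

        ret |= test(copy_to_user(umem, kmem, size),
                    "legitimate copy_to_user failed");

        for (start = lo; start <= hi; start++) {
                for (end = start; end <= hi; end++) {
                        size_t len = end - start;
                        int retval = check_zeroed_user(umem + start, len);
                        int expected = is_zeroed(kmem + start, len);

                        ret |= test(retval != expected,
                                    "check_nonzero_user(=%d) != memchr_inv(=%d) mismatch (start=%zu, end=%zu)",
                                    retval, expected, start, end);
                }
        }
        return ret;
}

With a 512-byte window on each side that is roughly 1024 * 1025 / 2, i.e. about
half a million (start, end) pairs regardless of PAGE_SIZE, while the sweep still
covers ranges that straddle the kmem/umem page boundary.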