From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753077AbdC2F5b (ORCPT ); Wed, 29 Mar 2017 01:57:31 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:54488 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752750AbdC2F52 (ORCPT ); Wed, 29 Mar 2017 01:57:28 -0400 Date: Wed, 29 Mar 2017 06:57:06 +0100 From: Al Viro To: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Linus Torvalds , Richard Henderson , Russell King , Will Deacon , Haavard Skinnemoen , Vineet Gupta , Steven Miao , Jesper Nilsson , Mark Salter , Yoshinori Sato , Richard Kuo , Tony Luck , Geert Uytterhoeven , James Hogan , Michal Simek , David Howells , Ley Foon Tan , Jonas Bonn , Helge Deller , Martin Schwidefsky , Ralf Baechle , Benjamin Herrenschmidt , Chen Liqin , "David S. Miller" , Chris Metcalf , Richard Weinberger , Guan Xuetao , Thomas Gleixner , Chris Zankel Subject: [RFC][CFT][PATCHSET v1] uaccess unification Message-ID: <20170329055706.GH29622@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.7.1 (2016-10-04) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org We have several primitives for bulk kernel<->userland copying. That stuff lives in various asm/uaccess.h, with serious code duplication _and_ seriously inconsistent semantics. That code has grown a lot of cruft and more than a few bugs. Some got caught and fixed last year, but some fairly unpleasant ones still remain. A large part of problem was that a lot of code used to include directly, so we had no single place to work with. That got finally fixed in 4.10-rc1, when everything had been forcibly switched to including . At that point it became possible to start getting rid of boilerplate; I hoped to deal with that by 4.11-rc1, but the things didn't work out and that work has slipped to this cycle. The patchset currently in vfs.git#work.uaccess is the result; there's more work to do, but it takes care of a large part of the problems. About 2.8KLoc removed, a lot of cruft is gone and semantics is hopefully in sync now. All but two architectures (ia64 and metag) had been switched to new mechanism; for these two I'm afraid that I'll need serious help from maintainers. Currently we have 8 primitives - 6 on every architecture and 2 more on biarch ones. All of them have the same calling conventions: arguments are the same as for memcpy() (void *to, const void *from, unsigned long size) and the same rules for return value. If all loads and stores succeed, everything is obvious - the 'size' bytes starting at 'to' become equal to 'size' bytes starting at 'from' and zero is returned. If some loads or stores fail, non-zero value should be returned. If any of those primitives returns a positive value N, * N should be no greater than size * the values fetched out of from[0..size-N-1] should be stored into the corresponding bytes of to[0..size-N-1] * N should not be equal to size unless not a single byte could have been fetched or stored. As long as that restriction is satisfied, these primitives are not required to squeeze every possible byte in case some loads or stores fail. 1) copy_from_user() - 'to' points to kernel memory, 'from' is normally a userland pointer. This is used for copying structures from userland in all kinds of ioctls, etc. No faults on access to destination are allowed, faults on access to source lead to zero-padding the rest of destination. Note that for architectures with the same address space split between the kernel and userland (i.e. the ones that have non-trivial access_ok()) passing a kernel address instead of a userland one should be treated as 'every access would fail'. In such cases the entire destination should be zeroed (failure to do so was a fairly common bug). Note that all these functions, including copy_from_user(), are affected by set_fs() - when called under set_fs(KERNEL_DS), they expect kernel pointers where normally a userland one would be given. 2) copy_to_user() - 'from' points to kernel memory, 'to' is a userland pointer (subject to set_fs() effects, as usual). Again. this is used by all kinds of code in all kinds of drivers, syscalls, etc. No faults on access to source, fault on access to destination terminates copying. No zero-padding, of course - the faults are going to be on store here. Does not assume that access_ok() had been checked by caller; given 'to'/'size' that fails access_ok() returns "nothing copied". 3) copy_in_user() - both 'from' and 'to' are in userland. Used only by compat code that needs to repack 32bit data structures into native 64bit counterparts. As the result, provided only by biarch architectures. Subject to set_fs(), but really should not be (and AFAICS isn't) used that way. Some architectures tried to zero-pad, but did it inconsistently and it's pointless anyway - destination is in userland memory, so no infoleaks would happen. 4) __copy_from_user_inatomic() - similar to copy_from_user(), except that * the caller is presumed to have verified that the source range passes access_ok() [note that this is does not guarantee the lack of faults] * most importantly, zero-padding SHOULD NOT happen on short copy. If implementation fails to guarantee that, it's a bug and potentially bad one[1]. * it may be called with pagefaults disabled (of course, in that case any pagefault results in a short copy). That's what 'inatomic' in the name refers to. Note that actually disabling pagefaults is up to the caller; blindly calling it e.g. from under a spinlock will just get you a deadlock. Even more precautions are needed to call it from something like an interrupt handler - you must do that under set_fs(), etc. It's not "this variant is safe to call from atomic contexts", it's "I know what I'm doing, don't worry if you see it in an atomic context". 5) __copy_to_user_inatomic(). A counterpart of __copy_from_user_inatomic(), except for the direction of copying. 6) __copy_from_user(). Essentially the only difference from __copy_from_user_inatomic() is that one isn't supposed to call it from atomic contexts. It may be marginally faster than copy_from_user() (due to skipped access_ok()), but these days the main costs are not in doing fairly light arithmetics. In theory, you might do a single access_ok() covering a large structure and then proceed to call __copy_from_user() on various parts of that. In practice doing many calls of that thing on small chunks of data is going to cost a lot on current x86 boxen due to STAC/CLAC pair inside each call. Has fewer call sites than copy_from_user() - copy_from_user() is in thousands, while this one has only 40 callers outside of arch/, some fairly dubious. In arch there's about 170 callers total, mostly in sigreturn instances. 7) __copy_to_user(). A counterpart of __copy_from_user(), with pretty much the same considerations applied. 8) __copy_in_user(). Basically, copy_in_user() sans access_ok(). Biarch-only, with the grand total of 6 callers... What this series does is: * convert architectures to fewer primitives (raw_copy_{to,from,in}_user(), the last one only on biarch ones), switching to generic implementations of the 8 primitives aboves via raw_... ones. Those generic implementations are in linux/uaccess.h (and lib/usercopy.c). Architecture provides raw_... ones, selects ARCH_HAS_RAW_COPY_USER and it's done. * all object size check, kasan, etc. instrumentation is taken care of in linux/uaccess.h; no need to touch it in arch/* * consistent semantics wrt zero-padding - none of the raw_... do any of that, copy_from_user() does (outside of fast path). At the moment I have that conversion done for everything except ia64 and metag. Once everything is converted, I'll remove ARCH_HAS_RAW_COPY_USER and make generic stuff unconditional; at the same point HAVE_ARCH_HARDENED_USERCOPY will be gone (becoming unconditionally true). The series lives in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git in #work.uaccess. It's based at 4.11-rc1. Infrastructure is in #uaccess.stem, then it splits into per-architecture branches (uaccess.), eventually merged into #work.uaccess. Some stuff (including a cherry-picked mips build fix) is in #uaccess.misc, also merged into the final. I hope that infrastructure part is stable enough to put it into never-rebased state. Some of per-architecture branches might be even done right; however, most of them got no testing whatsoever, so any help with testing (as well as "Al, for fuck sake, dump that garbage of yours, here's the correct patch" from maintainers) would be very welcome. So would the review, of course. In particular, the fix in uaccess.parisc should be replaced with the stuff Helge posted on parisc list, probably along with the get_user/put_user patches. I've put my variant of fix there as a stopgap; switch of pa_memcpy() to assembler is clearly the right way to solve it and I'll be happy to switch to that as soon as parisc folks settle on the final version of that stuff. For most of the oddball architectures I have no way to test that stuff, so please treat the asm-affecting patches in there as a starting point for doing it right. Some might even work as is - stranger things had happened, but don't count ont it. And again, metag and ia64 parts are simply not there - both architectures zero-pad in __copy_from_user_inatomic() and that really needs fixing. In case of metag there's __copy_to_user() breakage as well, AFAICS, and I've been unable to find any documentation describing the architecture wrt exceptions, and that part is apparently fairly weird. In case of ia64... I can test mckinley side of things, but not the generic __copy_user() and ia64 is about as weird as it gets. With no reliable emulator, at that... So these two are up to respective maintainers. Other things not there: * unification of strncpy_from_user() and friends. Probably next cycle. * anything to do with uaccess_begin/unsafe accesses/uaccess_end stuff. Definitely next cycle. I'm not sure if mailbombing linux-arch would be a good idea; there are 90 patches in that pile, with total size nearly half a megabyte. If anyone wants that posted, I'll do so, but it might be more convenient to just use git. Comments, review, testing, replacement patches, etc. are very welcome. Al "hates assembers, dozens of them" Viro [1] Nick Piggin has spotted that bug back in early 2000s, fixed it for i386 and hadn't bothered to do anything about other architectures (including amd64, for crying out loud!). Since then we had inconsistent behaviour between the architectures. Results of those bugs range from transient bogus values observed in mmap() if you get memory pressure combined with bad timing to outright fs corruption, if the timing is *really* bad. All architectures used to have it, hopefully this series will take care of the last stragglers.