linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Dave Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <keescook@chromium.org>, Peter Anvin <hpa@zytor.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Linux-MM <linux-mm@kvack.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	intel-gfx@lists.freedesktop.org,
	Russell King <linux@armlinux.org.uk>
Subject: Re: [PATCH v2 5/5] uaccess: Rename user_access_begin/end() to user_full_access_begin/end()
Date: Tue, 21 Apr 2020 03:49:19 +0100	[thread overview]
Message-ID: <20200421024919.GA23230@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20200403205205.GK23230@ZenIV.linux.org.uk>

[rmk Cc'd]
On Fri, Apr 03, 2020 at 09:52:05PM +0100, Al Viro wrote:

> I can do a 5.7-rc1-based branch with that; depending upon what we end
> up doing for arm and s390 we can always change the calling conventions
> come next cycle ;-/
> 
> My impressions after digging through arm side of things:
> 
> 1) the only instance of nesting I'd found there (so far) is a mistake.
> The rule should be "no fucking nesting, TYVM".

OK, after quite a bit of digging:
	1) everything outside of arm is quite happy with not passing
anything to user_access_end().  s390 is a red herring in that respect.
	2) on arm we definitely can get rid of nesting.  However,
there are some unpleasant sides of the logics in there.  What we have
is an MMU register; everything except for two 2bit fields in it is
constant.  One of those fields is a function of get_fs(), another might
serve an analogue of x86 EFLAGS.AC.  Rules:

	DACR.USER is 0 if CONFIG_SW_DOMAIN_PAN is enabled and we are
*not* in uaccess section; otherwise it's 1.
	DACR.KERNEL is 3 if CONFIG_USE_DOMAINS is enabled and we are
under KERNEL_DS; otherwise it's 1.

[USE_DOMAINS is forced to "yes" on v5 and earlier, configurable on v6+]
[SW_DOMAIN_PAN is forced to "no" on v7 if we want support of huge physical
space, configurable with default to "yes" otherwise]

	On entry into kernel we get into USER_DS state before we get
out of asm glue.  Original settings are restored on return.  That goes
both for ->addr_limit (get_fs() value) and for DACR.KERNEL contents.
DACR.USER ("uaccess allowed") is switched to "disabled" state before
we reach C code and restored on return from kernel.

	The costs are interesting; setting the register is costly, in
the same manner STAC/CLAC is.  Reading it... hell knows; I don't see any
explicit information about that.  As it is, both set_fs() and starting
uaccess block (uaccess_save_and_enable() - the thing that would've
gone into user_access_begin()) do both read and write to register; with
minimal massage we could get rid of reading the damn thing in set_fs().
user_access_end() candidate does a plain write to register, with value
kept around since the beginning of uaccess block.

	*IF* read from that register is cheap, we can trivially get rid
of passing the cookie there - it's a matter of reading the register
and clearing one bit in it before writing it back.  If that is costly,
though...  We can easily calculate it from ->addr_limit, which we already
have in cache at that point, or will need shortly anyway.  In that
case it would probably make sense to do the same to user_access_begin()
and set_fs().  Note that I'm not suggesting to do anything of that sort
in switch_to() - existing mechanism doesn't need any changes, and neither
does the asm glue in entry*.S.

	The only source I'd been able to find speeks of >= 60 cycles
(and possibly much more) for non-pipelined coprocessor instructions;
the list of such does contain loads and stores to a bunch of registers.
However, the register in question (p15/c3) has only store mentioned there,
so loads might be cheap; no obvious reasons for those to be slow.
That's a question to arm folks, I'm afraid...  rmk?

	Note that we can keep the current variant (i.e.
user_access_begin() being just the check for access_ok(), user_access_end()
being empty and uaccess_save_and_enable()/uaccess_restore() done manually
inside the primitives); after all, a lot of architectures don't _have_
anything of that sort.  It's just that decisions regarding the calling
conventions for these primitives will be much harder to change later on...

	Again, arm (32bit one) is the only architectures that has something
of that sort and needs to pass cookie from beginning to the end of uaccess
blocks.  Everything else splits into several classes:

1) has MMU, shared address space for kernel/userland, no stac analogues.
        alpha, arc, csky, hexagon, itanic, nds32, nios32, openrisc, sh,
        sparc32, unicore32, xtensa/MMU, microblaze/MMU, mips/MMU,
        m68k/MMU/COLDFIRE.
No way to do anything other than plain access_ok() for user_access_begin().

2) has MMU, shared address space for kernel/userland, has stac analogue,
possibly with separate "for read" and "for write" variants.  Can live
without passing any cookies.
        arm64, powerpc, riscv, x86
Current variant with changes in this patchset covers those.

3) non-MMU, uses memcpy() for everything, or at least ought to:
        c6x, h8300, m68k/!MMU, xtensa/!MMU(?), microblaze/!MMU(?), mips/!MMU(?),
        arm/!MMU
No memory protection of any sort...

4) sparc-like: MMU, separate address spaces for userland and kernel,
has explicit insns for uaccess + some register(s) to choose what those
insns actually hit.
        sparc64, parisc, m68k/MMU/!COLDFIRE
No stac/clac analogue would make sense.

5) s390: weird one - there is an stac analogue as far the hardware is
        concerned, but it can't be separated from inline asm where
        actual uaccess insns are.  From the kernel POV it's sparc-like.
Nothing that would reasonably map to user_access_begin/user_access_end

6) um: no uaccess, in a sense of dereferencing non-kernel pointers.  What
it does is simulation of page table walk + explicit call of #PF handler
on missing pages + kmap_atomic() to get a kernel alias.


  reply	other threads:[~2020-04-21  2:49 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-03  7:20 [PATCH v2 1/5] uaccess: Add user_read_access_begin/end and user_write_access_begin/end Christophe Leroy
2020-04-03  7:20 ` [PATCH v2 2/5] uaccess: Selectively open read or write user access Christophe Leroy
2020-05-29  4:20   ` Michael Ellerman
2020-04-03  7:20 ` [PATCH v2 3/5] drm/i915/gem: Replace user_access_begin by user_write_access_begin Christophe Leroy
2020-05-29  4:20   ` Michael Ellerman
2020-04-03  7:20 ` [PATCH v2 4/5] powerpc/uaccess: Implement user_read_access_begin and user_write_access_begin Christophe Leroy
2020-05-29  4:24   ` Michael Ellerman
2020-04-03  7:20 ` [PATCH v2 5/5] uaccess: Rename user_access_begin/end() to user_full_access_begin/end() Christophe Leroy
2020-04-03 18:01   ` Linus Torvalds
2020-04-03 20:52     ` Al Viro
2020-04-21  2:49       ` Al Viro [this message]
2020-04-21  9:12         ` Russell King - ARM Linux admin
2020-04-21 18:34         ` Linus Torvalds
2020-04-05 18:47     ` Christophe Leroy
2020-04-04  6:20   ` kbuild test robot
2020-04-04  7:17   ` kbuild test robot
2020-05-29  4:20 ` [PATCH v2 1/5] uaccess: Add user_read_access_begin/end and user_write_access_begin/end Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200421024919.GA23230@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=airlied@linux.ie \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=christophe.leroy@c-s.fr \
    --cc=daniel@ffwll.ch \
    --cc=hpa@zytor.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=keescook@chromium.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).