From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Christoph Hellwig <hch@lst.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Michael Ellerman <mpe@ellerman.id.au>,
	x86@kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Kees Cook <keescook@chromium.org>,
	linux-kernel@vger.kernel.org
Subject: Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v2
Date: Tue, 1 Sep 2020 19:13:00 +0200
Message-ID: <a8bb0319-0928-4687-9e9c-777c5860dbdd@csgroup.eu>
In-Reply-To: <20200827150030.282762-1-hch@lst.de>

Hi Christoph,

On 27/08/2020 at 17:00, Christoph Hellwig wrote:
> Hi all,
> 
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.
> 
> The file system part has been posted a few times, and the read/write side
> has been pretty much unchanged.  For splice this series drops the
> conversion of the seq_file and sysctl code to the iter ops, and thus loses
> the splice support for them.  The reason for that is that it caused a lot
> of churn for not much use - splice for these small files really isn't much
> of a win, even if existing userspace uses it.  All callers I found do the
> proper fallback, but if this turns out to be an issue the conversion can
> be resurrected.
> 
> Besides x86 and powerpc I plan to eventually convert all other
> architectures, although this will be a slow process, starting with the
> easier ones once the infrastructure is merged.  The process to convert
> architectures is roughly:
> 
>   (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
>   (2) implement __get_kernel_nofault and __put_kernel_nofault
>   (3) remove the arch specific address limitation functionality
> 
> Changes since v1:
>   - drop the patch to remove the non-iter ops for /dev/zero and
>     /dev/null as they caused a performance regression
>   - don't enable user access in __get_kernel on powerpc
>   - xfail the set_fs() based lkdtm tests
> 
> Diffstat:
> 


I'm still sceptical about the results I get.

With 5.9-rc2:

root@vgoippro:~# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 5.585880 seconds, 91.7MB/s
real    0m 5.59s
user    0m 1.40s
sys     0m 4.19s


With your series:

root@vgoippro:/tmp# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 7.780540 seconds, 65.8MB/s
real    0m 7.79s
user    0m 2.12s
sys     0m 5.66s
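
To put a number on the difference between the two runs, here is a minimal sketch of the arithmetic. The byte count and timings are copied from the dd output above; the variable and function names are only illustrative, and the "MB/s" printed by dd is assumed to mean MiB/s (1024 * 1024 bytes), which is what reproduces the reported figures.

```python
# Figures copied from the two dd runs above.
BYTES_COPIED = 536_870_912      # 1048576 records of 512 bytes each
RECORDS = 1_048_576
BASELINE_SECONDS = 5.585880     # 5.9-rc2
SERIES_SECONDS = 7.780540       # with the series applied

MIB = 1024 * 1024               # assumption: dd's "MB" is really MiB


def throughput_mib_s(nbytes: int, seconds: float) -> float:
    """Return the copy rate in MiB per second."""
    return nbytes / MIB / seconds


baseline_rate = throughput_mib_s(BYTES_COPIED, BASELINE_SECONDS)
series_rate = throughput_mib_s(BYTES_COPIED, SERIES_SECONDS)

# Relative increase in wall time, and the matching drop in throughput.
slowdown_pct = (SERIES_SECONDS / BASELINE_SECONDS - 1) * 100
throughput_drop_pct = (1 - series_rate / baseline_rate) * 100

# Extra time per record, i.e. per read()/write() pair issued by dd.
extra_us_per_record = (SERIES_SECONDS - BASELINE_SECONDS) / RECORDS * 1e6

print(f"baseline: {baseline_rate:.1f} MiB/s, series: {series_rate:.1f} MiB/s")
print(f"wall time: +{slowdown_pct:.1f}%, throughput: -{throughput_drop_pct:.1f}%")
print(f"extra cost per record: {extra_us_per_record:.2f} us")
```

With these inputs the series takes roughly 39% more wall time, which is a throughput loss of about 28%, or around 2 microseconds more per 512-byte record.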




Top of perf report of a standard perf record:

With 5.9-rc2:

     20.31%  dd       [kernel.kallsyms]  [k] __arch_clear_user
      8.37%  dd       [kernel.kallsyms]  [k] transfer_to_syscall
      7.37%  dd       [kernel.kallsyms]  [k] __fsnotify_parent
      6.95%  dd       [kernel.kallsyms]  [k] iov_iter_zero
      5.72%  dd       [kernel.kallsyms]  [k] new_sync_read
      4.87%  dd       [kernel.kallsyms]  [k] vfs_write
      4.47%  dd       [kernel.kallsyms]  [k] vfs_read
      3.07%  dd       [kernel.kallsyms]  [k] ksys_write
      2.77%  dd       [kernel.kallsyms]  [k] ksys_read
      2.65%  dd       [kernel.kallsyms]  [k] __fget_light
      2.37%  dd       [kernel.kallsyms]  [k] __fdget_pos
      2.35%  dd       [kernel.kallsyms]  [k] memset
      1.53%  dd       [kernel.kallsyms]  [k] rw_verify_area
      1.52%  dd       [kernel.kallsyms]  [k] read_iter_zero

With your series:
     19.60%  dd       [kernel.kallsyms]  [k] __arch_clear_user
     10.92%  dd       [kernel.kallsyms]  [k] iov_iter_zero
      9.50%  dd       [kernel.kallsyms]  [k] vfs_write
      8.97%  dd       [kernel.kallsyms]  [k] __fsnotify_parent
      5.46%  dd       [kernel.kallsyms]  [k] transfer_to_syscall
      5.42%  dd       [kernel.kallsyms]  [k] vfs_read
      3.58%  dd       [kernel.kallsyms]  [k] ksys_read
      2.84%  dd       [kernel.kallsyms]  [k] read_iter_zero
      2.24%  dd       [kernel.kallsyms]  [k] ksys_write
      1.80%  dd       [kernel.kallsyms]  [k] __fget_light
      1.34%  dd       [kernel.kallsyms]  [k] __fdget_pos
      0.91%  dd       [kernel.kallsyms]  [k] memset
      0.91%  dd       [kernel.kallsyms]  [k] rw_verify_area
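
The two listings are easier to compare side by side. The sketch below only re-tabulates the percentages shown above and sorts the symbols by how much their share changed; the dictionary and variable names are illustrative. Keep in mind that these are shares of each run's own samples, not absolute times, and that a symbol missing from one listing may simply have dropped below the part of the report shown here.

```python
# Percentages copied from the two perf report listings above.
baseline = {
    "__arch_clear_user": 20.31, "transfer_to_syscall": 8.37,
    "__fsnotify_parent": 7.37, "iov_iter_zero": 6.95,
    "new_sync_read": 5.72, "vfs_write": 4.87, "vfs_read": 4.47,
    "ksys_write": 3.07, "ksys_read": 2.77, "__fget_light": 2.65,
    "__fdget_pos": 2.37, "memset": 2.35, "rw_verify_area": 1.53,
    "read_iter_zero": 1.52,
}
series = {
    "__arch_clear_user": 19.60, "iov_iter_zero": 10.92, "vfs_write": 9.50,
    "__fsnotify_parent": 8.97, "transfer_to_syscall": 5.46, "vfs_read": 5.42,
    "ksys_read": 3.58, "read_iter_zero": 2.84, "ksys_write": 2.24,
    "__fget_light": 1.80, "__fdget_pos": 1.34, "memset": 0.91,
    "rw_verify_area": 0.91,
}

# Change in share (percentage points) for symbols present in both listings.
deltas = {
    sym: round(series[sym] - baseline[sym], 2)
    for sym in baseline
    if sym in series
}

# Symbols that appear in only one of the two listings.
only_in_baseline = sorted(set(baseline) - set(series))
only_in_series = sorted(set(series) - set(baseline))

# Largest increase in share first.
ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)

for sym, delta in ranked:
    print(f"{delta:+6.2f}  {sym}")
print("only in 5.9-rc2 listing:", only_in_baseline)
print("only in series listing: ", only_in_series)
```

On this data the largest increases in share are vfs_write and iov_iter_zero, and new_sync_read is the only symbol that appears in the 5.9-rc2 listing but not in the one taken with the series.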

Christophe
