* [GIT PULL] Please pull NFS client fixes for 4.12
@ 2017-05-10 16:47 Trond Myklebust
2017-05-10 20:06 ` Linus Torvalds
2017-05-11 7:53 ` Nikolay Borisov
0 siblings, 2 replies; 14+ messages in thread
From: Trond Myklebust @ 2017-05-10 16:47 UTC (permalink / raw)
To: torvalds; +Cc: linux-kernel, linux-nfs
Hi Linus,
The following changes since commit 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
are available in the git repository at:
git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-4.12-1
for you to fetch changes up to 76b2a303384e1d6299c3a0249f0f0ce2f8f96017:
pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled (2017-05-09 16:02:57 -0400)
----------------------------------------------------------------
NFS client updates for Linux 4.12
Highlights include:
Stable bugfixes:
- Fix use after free in write error path
- Use GFP_NOIO for two allocations in writeback
- Fix a hang in OPEN related to server reboot
- Check the result of nfs4_pnfs_ds_connect
- Fix an rcu lock leak
Features:
- Removal of the unmaintained and unused OSD pNFS layout
- Cleanup and removal of lots of unnecessary dprintk()s
- Cleanup and removal of some memory failure paths now that
GFP_NOFS is guaranteed to never fail.
- Remove the v3-only data server limitation on pNFS/flexfiles
Bugfixes:
- RPC/RDMA connection handling bugfixes
- Copy offload: fixes to ensure the copied data is COMMITed to disk.
- Readdir: switch back to using the ->iterate VFS interface
- File locking fixes from Ben Coddington
- Various use-after-free and deadlock issues in pNFS
- Write path bugfixes
----------------------------------------------------------------
Anna Schumaker (29):
NFS: Clean up do_callback_layoutrecall()
NFS: Clean up nfs4_callback_layoutrecall()
NFS: Remove extra dprintk()s from callback_proc.c
NFS: Clean up decode_getattr_args()
NFS: Clean up decode_recall_args()
NFS: Clean up decode_layoutrecall_args()
NFS: Clean up decode_cb_sequence_args()
NFS: Clean up decode_notify_lock_args()
NFS: Clean up encode_cb_sequence_res()
NFS: Remove extra dprintk()s from callback_xdr.c
NFS: Clean up nfs_init_client()
NFS: Clean up extra dprintk()s in client.c
NFS: Remove nfs_direct_readpage_release()
NFS: Clean up nfs_direct_commit_complete()
NFS: Remove extra dprintk()s from namespace.c
NFS: Clean up nfs42_layoutstat_done()
NFS: Clean up nfs4_match_clientids()
NFS: Clean up nfs4_check_serverowner_minor_id()
NFS: Create a common nfs4_match_client() function
NFS: Clean up nfs4_check_serverowner_major_id()
NFS: Clean up nfs4_check_server_scope()
NFS: Clean up nfs4_set_client()
NFS: Clean up nfs4_init_server()
NFS: Remove extra dprintk()s from nfs4client.c
NFS: Clean up nfs4_get_rootfh()
NFS: Remove extra dprintk()s from nfs4namespace.c
NFS: Clean up nfs4_proc_bind_one_conn_to_session()
NFS: Clean up _nfs4_proc_exchange_id()
NFS: Clean up nfs4_proc_get_lease_time()
Artem Savkov (1):
nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()
Benjamin Coddington (11):
NFS: switch back to to ->iterate()
NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete()
NFS: Use GFP_NOIO for two allocations in writeback
NFS: move nfs_pgarray_set() to open code
NFS: move rw_mode to nfs_pageio_header
NFS4: remove a redundant lock range check
NFS: Move the flock open mode check into nfs_flock()
locks: Set FL_CLOSE when removing flock locks on close()
NFS: Add an iocounter wait function for async RPC tasks
lockd: Introduce nlmclnt_operations
NFS: Always wait for I/O completion before unlock
Christoph Hellwig (1):
nfs: remove the objlayout driver
Chuck Lever (13):
xprtrdma: Cancel refresh worker during buffer shutdown
sunrpc: Export xprt_force_disconnect()
xprtrdma: Detect unreachable NFS/RDMA servers more reliably
xprtrdma: Refactor rpcrdma_ia_open()
xprtrdma: Use same device when mapping or syncing DMA buffers
xprtrdma: Support unplugging an HCA from under an NFS mount
xprtrdma: Refactor rpcrdma_ep_connect
xprtrdma: Restore transport after device removal
xprtrdma: Revert commit d0f36c46deea
xprtrdma: Annotate receive workqueue
xprtrdma: Squelch ENOBUFS warnings
sunrpc: Fix xdr_init_decode_pages() documenting comment
xprtrdma: Remove rpcrdma_buffer::rb_pool
Dave Wysochanski (1):
Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew
Fabian Frederick (1):
nfs: use kmap/kunmap directly
Fred Isaman (2):
NFS: Fix use after free in write error path
pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commits
Hou Tao (1):
NFS: always treat the invocation of nfs_getattr as cache hit when noac is on
NeilBrown (2):
NFS: fix usage of mempools.
sunrpc: don't check for failure from mempool_alloc()
Olga Kornievskaia (2):
NFS4.1 handle interrupted slot reuse from ERR_DELAY
NFS append COMMIT after synchronous COPY
Pan Bian (1):
NFSv4: check return value of xdr_inline_decode
Tigran Mkrtchyan (1):
nfs: flexfilelayout: remove v3-only data server limitation
Trond Myklebust (24):
NFSv4: Fix a hang in OPEN related to server reboot
pNFS/flexfiles: Check the result of nfs4_pnfs_ds_connect
pNFS: Remove unused layout driver callbacks
pNFS: Unexport pnfs_put_lseg_locked and _pnfs_return_layout
pNFS: unexport nfs4_pnfs_v3_ds_connect_unload
pNFS: Ensure we check layout segment validity in the pg_init() callback
pNFS: Fix use after free issues in pnfs_do_read()
NFS: Don't write back further requests if there is a pending write error
NFSv3: nfs3_nlm_alloc_call should be declared static
Merge tag 'nfs-rdma-4.12-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma
NFS: Add a few more fatal I/O errors to nfs_error_is_fatal()
NFSv4: Don't special case "launder"
pNFS: Ensure we check layout validity before marking it for return
pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path
pNFS: Don't send COMMITs to the DSes if the server invalidated our layout
pNFS: Ensure we commit the layout if it has been invalidated
pNFS: Don't clear the layout return info if there are segments to return
pNFS: Fix a deadlock when coalescing writes and returning the layout
pNFS: Fix a typo in pnfs_generic_alloc_ds_commits
NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION
NFSv4: Fix an rcu lock leak
NFSv4: Fix exclusive create attributes encoding
NFSv4.1: Work around a Linux server bug...
pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled
Documentation/admin-guide/kernel-parameters.txt | 6 -
Documentation/filesystems/nfs/pnfs.txt | 37 --
fs/fuse/file.c | 2 +-
fs/lockd/clntlock.c | 1 +
fs/lockd/clntproc.c | 26 +-
fs/locks.c | 2 +-
fs/nfs/Kconfig | 5 -
fs/nfs/Makefile | 1 -
fs/nfs/callback_proc.c | 47 +-
fs/nfs/callback_xdr.c | 109 +---
fs/nfs/client.c | 67 +--
fs/nfs/dir.c | 104 +---
fs/nfs/direct.c | 21 +-
fs/nfs/file.c | 30 +-
fs/nfs/filelayout/filelayout.c | 8 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 24 +-
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 10 +-
fs/nfs/inode.c | 5 +-
fs/nfs/internal.h | 5 +-
fs/nfs/namespace.c | 34 +-
fs/nfs/nfs3proc.c | 54 +-
fs/nfs/nfs42proc.c | 24 +-
fs/nfs/nfs42xdr.c | 22 +-
fs/nfs/nfs4client.c | 283 +++-------
fs/nfs/nfs4getroot.c | 3 -
fs/nfs/nfs4namespace.c | 7 +-
fs/nfs/nfs4proc.c | 99 ++--
fs/nfs/nfs4state.c | 10 +-
fs/nfs/nfs4xdr.c | 94 ++--
fs/nfs/objlayout/Kbuild | 5 -
fs/nfs/objlayout/objio_osd.c | 675 ----------------------
fs/nfs/objlayout/objlayout.c | 706 ------------------------
fs/nfs/objlayout/objlayout.h | 183 ------
fs/nfs/objlayout/pnfs_osd_xdr_cli.c | 415 --------------
fs/nfs/pagelist.c | 77 ++-
fs/nfs/pnfs.c | 62 ++-
fs/nfs/pnfs.h | 6 +-
fs/nfs/pnfs_nfs.c | 24 +-
fs/nfs/proc.c | 2 +-
fs/nfs/read.c | 9 +-
fs/nfs/write.c | 121 ++--
include/linux/fs.h | 2 +
include/linux/lockd/bind.h | 24 +-
include/linux/lockd/lockd.h | 2 +
include/linux/nfs_fs.h | 17 +-
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfs_page.h | 5 +-
include/linux/nfs_xdr.h | 3 +
net/sunrpc/clnt.c | 8 -
net/sunrpc/sched.c | 5 -
net/sunrpc/xdr.c | 2 +-
net/sunrpc/xprt.c | 1 +
net/sunrpc/xprtrdma/rpc_rdma.c | 12 +-
net/sunrpc/xprtrdma/transport.c | 57 +-
net/sunrpc/xprtrdma/verbs.c | 323 +++++++----
net/sunrpc/xprtrdma/xprt_rdma.h | 22 +-
56 files changed, 949 insertions(+), 2960 deletions(-)
delete mode 100644 fs/nfs/objlayout/Kbuild
delete mode 100644 fs/nfs/objlayout/objio_osd.c
delete mode 100644 fs/nfs/objlayout/objlayout.c
delete mode 100644 fs/nfs/objlayout/objlayout.h
delete mode 100644 fs/nfs/objlayout/pnfs_osd_xdr_cli.c
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-10 16:47 [GIT PULL] Please pull NFS client fixes for 4.12 Trond Myklebust
@ 2017-05-10 20:06 ` Linus Torvalds
2017-05-11 7:53 ` Nikolay Borisov
1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2017-05-10 20:06 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-kernel, linux-nfs
On Wed, May 10, 2017 at 9:47 AM, Trond Myklebust
<trondmy@primarydata.com> wrote:
>
> Features:
> - Removal of the unmaintained and unused OSD pNFS layout
> - Cleanup and removal of lots of unnecessary dprintk()s
> - Cleanup and removal of some memory failure paths now that
> ..
> 56 files changed, 949 insertions(+), 2960 deletions(-)
Lovely. Thanks,
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-10 16:47 [GIT PULL] Please pull NFS client fixes for 4.12 Trond Myklebust
2017-05-10 20:06 ` Linus Torvalds
@ 2017-05-11 7:53 ` Nikolay Borisov
2017-05-11 7:59 ` Michal Hocko
2017-05-11 13:41 ` Trond Myklebust
1 sibling, 2 replies; 14+ messages in thread
From: Nikolay Borisov @ 2017-05-11 7:53 UTC (permalink / raw)
To: Trond Myklebust, torvalds; +Cc: linux-kernel, linux-nfs, Michal Hocko
On 10.05.2017 19:47, Trond Myklebust wrote:
> Hi Linus,
>
> The following changes since commit 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
>
> Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
>
> are available in the git repository at:
>
> git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-4.12-1
>
> for you to fetch changes up to 76b2a303384e1d6299c3a0249f0f0ce2f8f96017:
>
> pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled (2017-05-09 16:02:57 -0400)
>
> ----------------------------------------------------------------
> NFS client updates for Linux 4.12
>
> Highlights include:
>
> Stable bugfixes:
> - Fix use after free in write error path
> - Use GFP_NOIO for two allocations in writeback
> - Fix a hang in OPEN related to server reboot
> - Check the result of nfs4_pnfs_ds_connect
> - Fix an rcu lock leak
>
> Features:
> - Removal of the unmaintained and unused OSD pNFS layout
> - Cleanup and removal of lots of unnecessary dprintk()s
> - Cleanup and removal of some memory failure paths now that
> GFP_NOFS is guaranteed to never fail.
What guarantees that? Since if this is the case then this can result in
a lot of opportunities for cleanup across the whole kernel tree. After
discussing with mhocko (cc'ed) it seems that in practice everything
below COSTLY_ORDER which are not GFP_NORETRY will never fail. But this
semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees that
this will stay like that in the future?
[omitted for brevity]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 7:53 ` Nikolay Borisov
@ 2017-05-11 7:59 ` Michal Hocko
2017-05-11 12:16 ` Trond Myklebust
2017-05-11 13:41 ` Trond Myklebust
1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2017-05-11 7:59 UTC (permalink / raw)
To: Nikolay Borisov; +Cc: Trond Myklebust, torvalds, linux-kernel, linux-nfs
On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
>
>
> On 10.05.2017 19:47, Trond Myklebust wrote:
[...]
> > - Cleanup and removal of some memory failure paths now that
> > GFP_NOFS is guaranteed to never fail.
>
> What guarantees that? Since if this is the case then this can result in
> a lot of opportunities for cleanup across the whole kernel tree. After
> discussing with mhocko (cc'ed) it seems that in practice everything
> below COSTLY_ORDER which are not GFP_NORETRY will never fail. But this
> semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees that
> this will stay like that in the future?
In practice it is hard to change the semantic of small allocations never
fail _practically_. But this is absolutely not guaranteed! They can fail
e.g. when the allocation context is the oom victim. Removing error paths
for allocation failures is just wrong.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 7:59 ` Michal Hocko
@ 2017-05-11 12:16 ` Trond Myklebust
2017-05-11 12:26 ` Michal Hocko
0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2017-05-11 12:16 UTC (permalink / raw)
To: mhocko, n.borisov.lkml; +Cc: torvalds, linux-kernel, linux-nfs
On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> >
> >
> > On 10.05.2017 19:47, Trond Myklebust wrote:
>
> [...]
> > > - Cleanup and removal of some memory failure paths now that
> > > GFP_NOFS is guaranteed to never fail.
> >
> > What guarantees that? Since if this is the case then this can
> > result in
> > a lot of opportunities for cleanup across the whole kernel tree.
> > After
> > discussing with mhocko (cc'ed) it seems that in practice everything
> > below COSTLY_ORDER which are not GFP_NORETRY will never fail. But
> > this
> > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees
> > that
> > this will stay like that in the future?
>
> In practice it is hard to change the semantic of small allocations
> never
> fail _practically_. But this is absolutely not guaranteed! They can
> fail
> e.g. when the allocation context is the oom victim. Removing error
> paths
> for allocation failures is just wrong.
OK, this makes no fucking sense at all.
Either allocations can fail or they can't.
1) If they can't fail, then we don't need the checks.
2) If they can fail, then we do need them, and this hand wringing in
the MM community about GFP_* semantics and how we need to prevent
failure is fucking pointless.
So which is it? (1) or (2)?
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 12:16 ` Trond Myklebust
@ 2017-05-11 12:26 ` Michal Hocko
2017-05-11 12:45 ` Trond Myklebust
0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2017-05-11 12:26 UTC (permalink / raw)
To: Trond Myklebust; +Cc: n.borisov.lkml, torvalds, linux-kernel, linux-nfs
On Thu 11-05-17 12:16:37, Trond Myklebust wrote:
> On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> > On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> > >
> > >
> > > On 10.05.2017 19:47, Trond Myklebust wrote:
> >
> > [...]
> > > > - Cleanup and removal of some memory failure paths now that
> > > > GFP_NOFS is guaranteed to never fail.
> > >
> > > What guarantees that? Since if this is the case then this can
> > > result in
> > > a lot of opportunities for cleanup across the whole kernel tree.
> > > After
> > > discussing with mhocko (cc'ed) it seems that in practice everything
> > > below COSTLY_ORDER which are not GFP_NORETRY will never fail. But
> > > this
> > > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees
> > > that
> > > this will stay like that in the future?
> >
> > In practice it is hard to change the semantic of small allocations
> > never
> > fail _practically_. But this is absolutely not guaranteed! They can
> > fail
> > e.g. when the allocation context is the oom victim. Removing error
> > paths
> > for allocation failures is just wrong.
>
> OK, this makes no fucking sense at all.
>
> Either allocations can fail or they can't.
> 1) If they can't fail, then we don't need the checks.
> 2) If they can fail, then we do need them, and this hand wringing in
> the MM community about GFP_* semantics and how we need to prevent
> failure is fucking pointless.
everything which is not __GFP_NOFAIL might fail. We try hard not to fail
small allocations requests as much as we can in general but you _have_ to
check for failures. There is simply no way to guarantee "never fail"
semantic for all allocation requests. This has been like that basically
since years. And even this try-to-be-nofailing for small allocations has
been PITA for some corner cases.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 12:26 ` Michal Hocko
@ 2017-05-11 12:45 ` Trond Myklebust
2017-05-11 12:56 ` Michal Hocko
0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2017-05-11 12:45 UTC (permalink / raw)
To: mhocko; +Cc: torvalds, linux-kernel, linux-nfs, n.borisov.lkml
On Thu, 2017-05-11 at 14:26 +0200, Michal Hocko wrote:
> On Thu 11-05-17 12:16:37, Trond Myklebust wrote:
> > On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> > > On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> > > >
> > > >
> > > > On 10.05.2017 19:47, Trond Myklebust wrote:
> > >
> > > [...]
> > > > > - Cleanup and removal of some memory failure paths now that
> > > > > GFP_NOFS is guaranteed to never fail.
> > > >
> > > > What guarantees that? Since if this is the case then this can
> > > > result in
> > > > a lot of opportunities for cleanup across the whole kernel
> > > > tree.
> > > > After
> > > > discussing with mhocko (cc'ed) it seems that in practice
> > > > everything
> > > > below COSTLY_ORDER which are not GFP_NORETRY will never fail.
> > > > But
> > > > this
> > > > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees
> > > > that
> > > > this will stay like that in the future?
> > >
> > > In practice it is hard to change the semantic of small
> > > allocations
> > > never
> > > fail _practically_. But this is absolutely not guaranteed! They
> > > can
> > > fail
> > > e.g. when the allocation context is the oom victim. Removing
> > > error
> > > paths
> > > for allocation failures is just wrong.
> >
> > OK, this makes no fucking sense at all.
> >
> > Either allocations can fail or they can't.
> > 1) If they can't fail, then we don't need the checks.
> > 2) If they can fail, then we do need them, and this hand wringing
> > in
> > the MM community about GFP_* semantics and how we need to prevent
> > failure is fucking pointless.
>
> everything which is not __GFP_NOFAIL might fail. We try hard not to
> fail
> small allocations requests as much as we can in general but you
> _have_ to
> check for failures. There is simply no way to guarantee "never fail"
> semantic for all allocation requests. This has been like that
> basically
> since years. And even this try-to-be-nofailing for small allocations
> has
> been PITA for some corner cases.
I'll take that as a vote for (2), then.
I know that failures could occur in the past. That's why those code
paths were there. The problem is that the MM community has been making
lots of noise on mailing lists, conferences and LWN articles about how
we must not fail small allocations because the MM community believes
that nobody expects it. This is confusing everyone... It confused Neil
Brown, who contributed these patches, and it confused me and all the
other reviewers of these patches on the linux-nfs mailing list.
So if indeed (2) is correct, then please can we have a clear statement
_when discussing improvements to memory allocation semantics_ that
GFP_* still can fail, still will fail, and that callers should assume
it will fail and should test their code paths assuming the failure
case.
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 12:45 ` Trond Myklebust
@ 2017-05-11 12:56 ` Michal Hocko
2017-05-11 13:10 ` Trond Myklebust
0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2017-05-11 12:56 UTC (permalink / raw)
To: Trond Myklebust; +Cc: torvalds, linux-kernel, linux-nfs, n.borisov.lkml
On Thu 11-05-17 12:45:00, Trond Myklebust wrote:
> On Thu, 2017-05-11 at 14:26 +0200, Michal Hocko wrote:
> > On Thu 11-05-17 12:16:37, Trond Myklebust wrote:
> > > On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> > > > On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> > > > >
> > > > >
> > > > > On 10.05.2017 19:47, Trond Myklebust wrote:
> > > >
> > > > [...]
> > > > > > - Cleanup and removal of some memory failure paths now that
> > > > > > GFP_NOFS is guaranteed to never fail.
> > > > >
> > > > > What guarantees that? Since if this is the case then this can
> > > > > result in
> > > > > a lot of opportunities for cleanup across the whole kernel
> > > > > tree.
> > > > > After
> > > > > discussing with mhocko (cc'ed) it seems that in practice
> > > > > everything
> > > > > below COSTLY_ORDER which are not GFP_NORETRY will never fail.
> > > > > But
> > > > > this
> > > > > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees
> > > > > that
> > > > > this will stay like that in the future?
> > > >
> > > > In practice it is hard to change the semantic of small
> > > > allocations
> > > > never
> > > > fail _practically_. But this is absolutely not guaranteed! They
> > > > can
> > > > fail
> > > > e.g. when the allocation context is the oom victim. Removing
> > > > error
> > > > paths
> > > > for allocation failures is just wrong.
> > >
> > > OK, this makes no fucking sense at all.
> > >
> > > Either allocations can fail or they can't.
> > > 1) If they can't fail, then we don't need the checks.
> > > 2) If they can fail, then we do need them, and this hand wringing
> > > in
> > > the MM community about GFP_* semantics and how we need to prevent
> > > failure is fucking pointless.
> >
> > everything which is not __GFP_NOFAIL might fail. We try hard not to
> > fail
> > small allocations requests as much as we can in general but you
> > _have_ to
> > check for failures. There is simply no way to guarantee "never fail"
> > semantic for all allocation requests. This has been like that
> > basically
> > since years. And even this try-to-be-nofailing for small allocations
> > has
> > been PITA for some corner cases.
>
> I'll take that as a vote for (2), then.
>
> I know that failures could occur in the past. That's why those code
> paths were there. The problem is that the MM community has been making
> lots of noise on mailing lists, conferences and LWN articles about how
> we must not fail small allocations because the MM community believes
> that nobody expects it. This is confusing everyone...
It was exactly other way around. We would like to _get_rid_of_ this do
not fail behavior because it is causing a major headaches in out of
memory corner cases. Just take GFP_NOFS as an example. It is a weak
reclaim context because we cannot reclaim fs metadata and that might be
a lot of memory so we cannot trigger the OOM killer and have to rely on
a different allocation context or kswapd to make a progress on our
behalf. We would really like to fail those requests instead. I've tried
that in the past but it was deemed to dangerous because _all_ kernel
paths would have to be checked for a sane failure behavior. So we are
keeping status quo instead.
> It confused Neil Brown, who contributed these patches, and it confused
> me and all the other reviewers of these patches on the linux-nfs
> mailing list.
>
> So if indeed (2) is correct, then please can we have a clear statement
> _when discussing improvements to memory allocation semantics_ that
> GFP_* still can fail, still will fail, and that callers should assume
> it will fail and should test their code paths assuming the failure
> case.
I do not see any explicit documentation which would encourage users to
not check for the allocation failure. Only __GFP_NOFAIL is documented it
_must_ retry for ever. Of course I am open for any documentation
improvements.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 12:56 ` Michal Hocko
@ 2017-05-11 13:10 ` Trond Myklebust
2017-05-11 13:27 ` Michal Hocko
2017-05-16 15:15 ` Jonathan Corbet
0 siblings, 2 replies; 14+ messages in thread
From: Trond Myklebust @ 2017-05-11 13:10 UTC (permalink / raw)
To: mhocko; +Cc: torvalds, linux-kernel, linux-nfs, n.borisov.lkml
On Thu, 2017-05-11 at 14:56 +0200, Michal Hocko wrote:
> On Thu 11-05-17 12:45:00, Trond Myklebust wrote:
> > On Thu, 2017-05-11 at 14:26 +0200, Michal Hocko wrote:
> > > On Thu 11-05-17 12:16:37, Trond Myklebust wrote:
> > > > On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> > > > > On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> > > > > >
> > > > > >
> > > > > > On 10.05.2017 19:47, Trond Myklebust wrote:
> > > > >
> > > > > [...]
> > > > > > > - Cleanup and removal of some memory failure paths now
> > > > > > > that
> > > > > > > GFP_NOFS is guaranteed to never fail.
> > > > > >
> > > > > > What guarantees that? Since if this is the case then this
> > > > > > can
> > > > > > result in
> > > > > > a lot of opportunities for cleanup across the whole kernel
> > > > > > tree.
> > > > > > After
> > > > > > discussing with mhocko (cc'ed) it seems that in practice
> > > > > > everything
> > > > > > below COSTLY_ORDER which are not GFP_NORETRY will never
> > > > > > fail.
> > > > > > But
> > > > > > this
> > > > > > semantic is not the same as GFP_NOFAIL. E.g. nothing
> > > > > > guarantees
> > > > > > that
> > > > > > this will stay like that in the future?
> > > > >
> > > > > In practice it is hard to change the semantic of small
> > > > > allocations
> > > > > never
> > > > > fail _practically_. But this is absolutely not guaranteed!
> > > > > They
> > > > > can
> > > > > fail
> > > > > e.g. when the allocation context is the oom victim. Removing
> > > > > error
> > > > > paths
> > > > > for allocation failures is just wrong.
> > > >
> > > > OK, this makes no fucking sense at all.
> > > >
> > > > Either allocations can fail or they can't.
> > > > 1) If they can't fail, then we don't need the checks.
> > > > 2) If they can fail, then we do need them, and this hand
> > > > wringing
> > > > in
> > > > the MM community about GFP_* semantics and how we need to
> > > > prevent
> > > > failure is fucking pointless.
> > >
> > > everything which is not __GFP_NOFAIL might fail. We try hard not
> > > to
> > > fail
> > > small allocations requests as much as we can in general but you
> > > _have_ to
> > > check for failures. There is simply no way to guarantee "never
> > > fail"
> > > semantic for all allocation requests. This has been like that
> > > basically
> > > since years. And even this try-to-be-nofailing for small
> > > allocations
> > > has
> > > been PITA for some corner cases.
> >
> > I'll take that as a vote for (2), then.
> >
> > I know that failures could occur in the past. That's why those code
> > paths were there. The problem is that the MM community has been
> > making
> > lots of noise on mailing lists, conferences and LWN articles about
> > how
> > we must not fail small allocations because the MM community
> > believes
> > that nobody expects it. This is confusing everyone...
>
> It was exactly other way around. We would like to _get_rid_of_ this
> do
> not fail behavior because it is causing a major headaches in out of
> memory corner cases. Just take GFP_NOFS as an example. It is a weak
> reclaim context because we cannot reclaim fs metadata and that might
> be
> a lot of memory so we cannot trigger the OOM killer and have to rely
> on
> a different allocation context or kswapd to make a progress on our
> behalf. We would really like to fail those requests instead. I've
> tried
> that in the past but it was deemed to dangerous because _all_ kernel
> paths would have to be checked for a sane failure behavior. So we are
> keeping status quo instead.
If we suspect the existence of a load of potential time bombs in the
kernel due to missing checks, then the status quo is not good enough.
We should be working on tools to identify these code paths.
Quite frankly, I'd love to see a fuzzer-like tool that can randomly
fail allocations. I can easily make one for the NFS code, but if there
is a general problem identifying buggy code, then perhaps it should be
solved at the MM layer itself.
> > It confused Neil Brown, who contributed these patches, and it
> > confused
> > me and all the other reviewers of these patches on the linux-nfs
> > mailing list.
> >
> > So if indeed (2) is correct, then please can we have a clear
> > statement
> > _when discussing improvements to memory allocation semantics_ that
> > GFP_* still can fail, still will fail, and that callers should
> > assume
> > it will fail and should test their code paths assuming the failure
> > case.
>
> I do not see any explicit documentation which would encourage users
> to
> not check for the allocation failure. Only __GFP_NOFAIL is documented
> it
> _must_ retry for ever. Of course I am open for any documentation
> improvements.
As I said, the problem has been the discussion, and how it focusses on
"must not fail".
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 13:10 ` Trond Myklebust
@ 2017-05-11 13:27 ` Michal Hocko
2017-05-16 15:15 ` Jonathan Corbet
1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-05-11 13:27 UTC (permalink / raw)
To: Trond Myklebust; +Cc: torvalds, linux-kernel, linux-nfs, n.borisov.lkml
On Thu 11-05-17 13:10:03, Trond Myklebust wrote:
> On Thu, 2017-05-11 at 14:56 +0200, Michal Hocko wrote:
[...]
> > I do not see any explicit documentation which would encourage users
> > to not check for the allocation failure. Only __GFP_NOFAIL is
> > documented it _must_ retry for ever. Of course I am open for any
> > documentation improvements.
>
> As I said, the problem has been the discussion, and how it focusses on
> "must not fail".
I would strongly suggest cc linux-mm ML when doing this kind of changes.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 7:53 ` Nikolay Borisov
2017-05-11 7:59 ` Michal Hocko
@ 2017-05-11 13:41 ` Trond Myklebust
2017-05-11 13:54 ` Michal Hocko
1 sibling, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2017-05-11 13:41 UTC (permalink / raw)
To: torvalds, n.borisov.lkml; +Cc: linux-kernel, linux-nfs, mhocko
On Thu, 2017-05-11 at 10:53 +0300, Nikolay Borisov wrote:
>
> On 10.05.2017 19:47, Trond Myklebust wrote:
> > Hi Linus,
> >
> > The following changes since commit
> > 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
> >
> > Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
> >
> > are available in the git repository at:
> >
> > git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-
> > for-4.12-1
> >
> > for you to fetch changes up to
> > 76b2a303384e1d6299c3a0249f0f0ce2f8f96017:
> >
> > pNFS/flexfiles: Always attempt to call layoutstats when flexfiles
> > is enabled (2017-05-09 16:02:57 -0400)
> >
> > ----------------------------------------------------------------
> > NFS client updates for Linux 4.12
> >
> > Highlights include:
> >
> > Stable bugfixes:
> > - Fix use after free in write error path
> > - Use GFP_NOIO for two allocations in writeback
> > - Fix a hang in OPEN related to server reboot
> > - Check the result of nfs4_pnfs_ds_connect
> > - Fix an rcu lock leak
> >
> > Features:
> > - Removal of the unmaintained and unused OSD pNFS layout
> > - Cleanup and removal of lots of unnecessary dprintk()s
> > - Cleanup and removal of some memory failure paths now that
> > GFP_NOFS is guaranteed to never fail.
>
> What guarantees that? Since if this is the case then this can result
> in
> a lot of opportunities for cleanup across the whole kernel tree.
> After
> discussing with mhocko (cc'ed) it seems that in practice everything
> below COSTLY_ORDER which are not GFP_NORETRY will never fail. But
> this
> semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees that
> this will stay like that in the future?
>
Actually, going back to the code with coffee: it's the fact we have
mempools, with direct reclaim that guarantee this.
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 13:41 ` Trond Myklebust
@ 2017-05-11 13:54 ` Michal Hocko
0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-05-11 13:54 UTC (permalink / raw)
To: Trond Myklebust; +Cc: torvalds, n.borisov.lkml, linux-kernel, linux-nfs
On Thu 11-05-17 13:41:12, Trond Myklebust wrote:
> On Thu, 2017-05-11 at 10:53 +0300, Nikolay Borisov wrote:
> >
> > On 10.05.2017 19:47, Trond Myklebust wrote:
> > > Hi Linus,
> > >
> > > The following changes since commit
> > > 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
> > >
> > > Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
> > >
> > > are available in the git repository at:
> > >
> > > git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-
> > > for-4.12-1
> > >
> > > for you to fetch changes up to
> > > 76b2a303384e1d6299c3a0249f0f0ce2f8f96017:
> > >
> > > pNFS/flexfiles: Always attempt to call layoutstats when flexfiles
> > > is enabled (2017-05-09 16:02:57 -0400)
> > >
> > > ----------------------------------------------------------------
> > > NFS client updates for Linux 4.12
> > >
> > > Highlights include:
> > >
> > > Stable bugfixes:
> > > - Fix use after free in write error path
> > > - Use GFP_NOIO for two allocations in writeback
> > > - Fix a hang in OPEN related to server reboot
> > > - Check the result of nfs4_pnfs_ds_connect
> > > - Fix an rcu lock leak
> > >
> > > Features:
> > > - Removal of the unmaintained and unused OSD pNFS layout
> > > - Cleanup and removal of lots of unnecessary dprintk()s
> > > - Cleanup and removal of some memory failure paths now that
> > > GFP_NOFS is guaranteed to never fail.
> >
> > What guarantees that? Since if this is the case then this can result
> > in
> > a lot of opportunities for cleanup across the whole kernel tree.
> > After
> > discussing with mhocko (cc'ed) it seems that in practice everything
> > below COSTLY_ORDER which are not GFP_NORETRY will never fail. But
> > this
> > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees that
> > this will stay like that in the future?
> >
>
> Actually, going back to the code with coffee: it's the fact we have
> mempools, with direct reclaim that guarantee this.
Mempools are a different story, of course. They do the their own loop on
top of the underlying allocator.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [GIT PULL] Please pull NFS client fixes for 4.12
2017-05-11 13:10 ` Trond Myklebust
2017-05-11 13:27 ` Michal Hocko
@ 2017-05-16 15:15 ` Jonathan Corbet
1 sibling, 0 replies; 14+ messages in thread
From: Jonathan Corbet @ 2017-05-16 15:15 UTC (permalink / raw)
To: Trond Myklebust; +Cc: mhocko, torvalds, linux-kernel, linux-nfs, n.borisov.lkml
On Thu, 11 May 2017 13:10:03 +0000
Trond Myklebust <trondmy@primarydata.com> wrote:
> If we suspect the existence of a load of potential time bombs in the
> kernel due to missing checks, then the status quo is not good enough.
> We should be working on tools to identify these code paths.
>
> Quite frankly, I'd love to see a fuzzer-like tool that can randomly
> fail allocations.
The fault injection framework at least used to work quite nicely for this;
I used it to test out the failure paths in the OLPC drivers years ago.
Documentation/fault-injection/fault-injection.txt.
jon
^ permalink raw reply [flat|nested] 14+ messages in thread
* [GIT PULL] Please pull NFS client fixes for 4.12
@ 2017-06-28 14:19 Trond Myklebust
0 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2017-06-28 14:19 UTC (permalink / raw)
To: torvalds; +Cc: linux-kernel, linux-nfs
Hi Linus,
The following changes since commit 3c2993b8c6143d8a5793746a54eba8f86f95240f:
Linux 4.12-rc4 (2017-06-04 16:47:43 -0700)
are available in the git repository at:
git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-4.12-3
for you to fetch changes up to 2e31b4cb895ae78db31dffb860cd255d86c6561c:
NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete() (2017-06-27 22:26:23 -0400)
----------------------------------------------------------------
NFS client bugfixes for Linux 4.12
Bugfixes include:
- Stable fix for exclusive create if the server supports the umask attribute
- Trunking detection should handle ERESTARTSYS/EINTR
- Stable fix for a race in the LAYOUTGET function
- Stable fix to revert "nfs_rename() handle -ERESTARTSYS dentry left behind"
- nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()
----------------------------------------------------------------
Benjamin Coddington (2):
NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umask
Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind"
Trond Myklebust (3):
NFS: Trunking detection should handle ERESTARTSYS/EINTR
NFSv4.1: Fix a race in nfs4_proc_layoutget
NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()
fs/nfs/callback_xdr.c | 1 -
fs/nfs/dir.c | 51 ++++++++++++++++++++++++---------------------------
fs/nfs/nfs4proc.c | 5 +++--
fs/nfs/nfs4state.c | 2 ++
4 files changed, 29 insertions(+), 30 deletions(-)
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-06-28 14:19 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-10 16:47 [GIT PULL] Please pull NFS client fixes for 4.12 Trond Myklebust
2017-05-10 20:06 ` Linus Torvalds
2017-05-11 7:53 ` Nikolay Borisov
2017-05-11 7:59 ` Michal Hocko
2017-05-11 12:16 ` Trond Myklebust
2017-05-11 12:26 ` Michal Hocko
2017-05-11 12:45 ` Trond Myklebust
2017-05-11 12:56 ` Michal Hocko
2017-05-11 13:10 ` Trond Myklebust
2017-05-11 13:27 ` Michal Hocko
2017-05-16 15:15 ` Jonathan Corbet
2017-05-11 13:41 ` Trond Myklebust
2017-05-11 13:54 ` Michal Hocko
2017-06-28 14:19 Trond Myklebust
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).