linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] Please pull NFS client fixes for 4.12
@ 2017-05-10 16:47 Trond Myklebust
  2017-05-10 20:06 ` Linus Torvalds
  2017-05-11  7:53 ` Nikolay Borisov
  0 siblings, 2 replies; 14+ messages in thread
From: Trond Myklebust @ 2017-05-10 16:47 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, linux-nfs

Hi Linus,

The following changes since commit 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:

  Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-4.12-1

for you to fetch changes up to 76b2a303384e1d6299c3a0249f0f0ce2f8f96017:

  pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled (2017-05-09 16:02:57 -0400)

----------------------------------------------------------------
NFS client updates for Linux 4.12

Highlights include:

Stable bugfixes:
- Fix use after free in write error path
- Use GFP_NOIO for two allocations in writeback
- Fix a hang in OPEN related to server reboot
- Check the result of nfs4_pnfs_ds_connect
- Fix an rcu lock leak

Features:
- Removal of the unmaintained and unused OSD pNFS layout
- Cleanup and removal of lots of unnecessary dprintk()s
- Cleanup and removal of some memory failure paths now that
  GFP_NOFS is guaranteed to never fail.
- Remove the v3-only data server limitation on pNFS/flexfiles

Bugfixes:
- RPC/RDMA connection handling bugfixes
- Copy offload: fixes to ensure the copied data is COMMITed to disk.
- Readdir: switch back to using the ->iterate VFS interface
- File locking fixes from Ben Coddington
- Various use-after-free and deadlock issues in pNFS
- Write path bugfixes

----------------------------------------------------------------
Anna Schumaker (29):
      NFS: Clean up do_callback_layoutrecall()
      NFS: Clean up nfs4_callback_layoutrecall()
      NFS: Remove extra dprintk()s from callback_proc.c
      NFS: Clean up decode_getattr_args()
      NFS: Clean up decode_recall_args()
      NFS: Clean up decode_layoutrecall_args()
      NFS: Clean up decode_cb_sequence_args()
      NFS: Clean up decode_notify_lock_args()
      NFS: Clean up encode_cb_sequence_res()
      NFS: Remove extra dprintk()s from callback_xdr.c
      NFS: Clean up nfs_init_client()
      NFS: Clean up extra dprintk()s in client.c
      NFS: Remove nfs_direct_readpage_release()
      NFS: Clean up nfs_direct_commit_complete()
      NFS: Remove extra dprintk()s from namespace.c
      NFS: Clean up nfs42_layoutstat_done()
      NFS: Clean up nfs4_match_clientids()
      NFS: Clean up nfs4_check_serverowner_minor_id()
      NFS: Create a common nfs4_match_client() function
      NFS: Clean up nfs4_check_serverowner_major_id()
      NFS: Clean up nfs4_check_server_scope()
      NFS: Clean up nfs4_set_client()
      NFS: Clean up nfs4_init_server()
      NFS: Remove extra dprintk()s from nfs4client.c
      NFS: Clean up nfs4_get_rootfh()
      NFS: Remove extra dprintk()s from nfs4namespace.c
      NFS: Clean up nfs4_proc_bind_one_conn_to_session()
      NFS: Clean up _nfs4_proc_exchange_id()
      NFS: Clean up nfs4_proc_get_lease_time()

Artem Savkov (1):
      nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()

Benjamin Coddington (11):
      NFS: switch back to to ->iterate()
      NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete()
      NFS: Use GFP_NOIO for two allocations in writeback
      NFS: move nfs_pgarray_set() to open code
      NFS: move rw_mode to nfs_pageio_header
      NFS4: remove a redundant lock range check
      NFS: Move the flock open mode check into nfs_flock()
      locks: Set FL_CLOSE when removing flock locks on close()
      NFS: Add an iocounter wait function for async RPC tasks
      lockd: Introduce nlmclnt_operations
      NFS: Always wait for I/O completion before unlock

Christoph Hellwig (1):
      nfs: remove the objlayout driver

Chuck Lever (13):
      xprtrdma: Cancel refresh worker during buffer shutdown
      sunrpc: Export xprt_force_disconnect()
      xprtrdma: Detect unreachable NFS/RDMA servers more reliably
      xprtrdma: Refactor rpcrdma_ia_open()
      xprtrdma: Use same device when mapping or syncing DMA buffers
      xprtrdma: Support unplugging an HCA from under an NFS mount
      xprtrdma: Refactor rpcrdma_ep_connect
      xprtrdma: Restore transport after device removal
      xprtrdma: Revert commit d0f36c46deea
      xprtrdma: Annotate receive workqueue
      xprtrdma: Squelch ENOBUFS warnings
      sunrpc: Fix xdr_init_decode_pages() documenting comment
      xprtrdma: Remove rpcrdma_buffer::rb_pool

Dave Wysochanski (1):
      Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew

Fabian Frederick (1):
      nfs: use kmap/kunmap directly

Fred Isaman (2):
      NFS: Fix use after free in write error path
      pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commits

Hou Tao (1):
      NFS: always treat the invocation of nfs_getattr as cache hit when noac is on

NeilBrown (2):
      NFS: fix usage of mempools.
      sunrpc: don't check for failure from mempool_alloc()

Olga Kornievskaia (2):
      NFS4.1 handle interrupted slot reuse from ERR_DELAY
      NFS append COMMIT after synchronous COPY

Pan Bian (1):
      NFSv4: check return value of xdr_inline_decode

Tigran Mkrtchyan (1):
      nfs: flexfilelayout: remove v3-only data server limitation

Trond Myklebust (24):
      NFSv4: Fix a hang in OPEN related to server reboot
      pNFS/flexfiles: Check the result of nfs4_pnfs_ds_connect
      pNFS: Remove unused layout driver callbacks
      pNFS: Unexport pnfs_put_lseg_locked and _pnfs_return_layout
      pNFS: unexport nfs4_pnfs_v3_ds_connect_unload
      pNFS: Ensure we check layout segment validity in the pg_init() callback
      pNFS: Fix use after free issues in pnfs_do_read()
      NFS: Don't write back further requests if there is a pending write error
      NFSv3: nfs3_nlm_alloc_call should be declared static
      Merge tag 'nfs-rdma-4.12-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma
      NFS: Add a few more fatal I/O errors to nfs_error_is_fatal()
      NFSv4: Don't special case "launder"
      pNFS: Ensure we check layout validity before marking it for return
      pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path
      pNFS: Don't send COMMITs to the DSes if the server invalidated our layout
      pNFS: Ensure we commit the layout if it has been invalidated
      pNFS: Don't clear the layout return info if there are segments to return
      pNFS: Fix a deadlock when coalescing writes and returning the layout
      pNFS: Fix a typo in pnfs_generic_alloc_ds_commits
      NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION
      NFSv4: Fix an rcu lock leak
      NFSv4: Fix exclusive create attributes encoding
      NFSv4.1: Work around a Linux server bug...
      pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled

 Documentation/admin-guide/kernel-parameters.txt |   6 -
 Documentation/filesystems/nfs/pnfs.txt          |  37 --
 fs/fuse/file.c                                  |   2 +-
 fs/lockd/clntlock.c                             |   1 +
 fs/lockd/clntproc.c                             |  26 +-
 fs/locks.c                                      |   2 +-
 fs/nfs/Kconfig                                  |   5 -
 fs/nfs/Makefile                                 |   1 -
 fs/nfs/callback_proc.c                          |  47 +-
 fs/nfs/callback_xdr.c                           | 109 +---
 fs/nfs/client.c                                 |  67 +--
 fs/nfs/dir.c                                    | 104 +---
 fs/nfs/direct.c                                 |  21 +-
 fs/nfs/file.c                                   |  30 +-
 fs/nfs/filelayout/filelayout.c                  |   8 +-
 fs/nfs/flexfilelayout/flexfilelayout.c          |  24 +-
 fs/nfs/flexfilelayout/flexfilelayoutdev.c       |  10 +-
 fs/nfs/inode.c                                  |   5 +-
 fs/nfs/internal.h                               |   5 +-
 fs/nfs/namespace.c                              |  34 +-
 fs/nfs/nfs3proc.c                               |  54 +-
 fs/nfs/nfs42proc.c                              |  24 +-
 fs/nfs/nfs42xdr.c                               |  22 +-
 fs/nfs/nfs4client.c                             | 283 +++-------
 fs/nfs/nfs4getroot.c                            |   3 -
 fs/nfs/nfs4namespace.c                          |   7 +-
 fs/nfs/nfs4proc.c                               |  99 ++--
 fs/nfs/nfs4state.c                              |  10 +-
 fs/nfs/nfs4xdr.c                                |  94 ++--
 fs/nfs/objlayout/Kbuild                         |   5 -
 fs/nfs/objlayout/objio_osd.c                    | 675 ----------------------
 fs/nfs/objlayout/objlayout.c                    | 706 ------------------------
 fs/nfs/objlayout/objlayout.h                    | 183 ------
 fs/nfs/objlayout/pnfs_osd_xdr_cli.c             | 415 --------------
 fs/nfs/pagelist.c                               |  77 ++-
 fs/nfs/pnfs.c                                   |  62 ++-
 fs/nfs/pnfs.h                                   |   6 +-
 fs/nfs/pnfs_nfs.c                               |  24 +-
 fs/nfs/proc.c                                   |   2 +-
 fs/nfs/read.c                                   |   9 +-
 fs/nfs/write.c                                  | 121 ++--
 include/linux/fs.h                              |   2 +
 include/linux/lockd/bind.h                      |  24 +-
 include/linux/lockd/lockd.h                     |   2 +
 include/linux/nfs_fs.h                          |  17 +-
 include/linux/nfs_fs_sb.h                       |   1 +
 include/linux/nfs_page.h                        |   5 +-
 include/linux/nfs_xdr.h                         |   3 +
 net/sunrpc/clnt.c                               |   8 -
 net/sunrpc/sched.c                              |   5 -
 net/sunrpc/xdr.c                                |   2 +-
 net/sunrpc/xprt.c                               |   1 +
 net/sunrpc/xprtrdma/rpc_rdma.c                  |  12 +-
 net/sunrpc/xprtrdma/transport.c                 |  57 +-
 net/sunrpc/xprtrdma/verbs.c                     | 323 +++++++----
 net/sunrpc/xprtrdma/xprt_rdma.h                 |  22 +-
 56 files changed, 949 insertions(+), 2960 deletions(-)
 delete mode 100644 fs/nfs/objlayout/Kbuild
 delete mode 100644 fs/nfs/objlayout/objio_osd.c
 delete mode 100644 fs/nfs/objlayout/objlayout.c
 delete mode 100644 fs/nfs/objlayout/objlayout.h
 delete mode 100644 fs/nfs/objlayout/pnfs_osd_xdr_cli.c
-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-10 16:47 [GIT PULL] Please pull NFS client fixes for 4.12 Trond Myklebust
@ 2017-05-10 20:06 ` Linus Torvalds
  2017-05-11  7:53 ` Nikolay Borisov
  1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2017-05-10 20:06 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel, linux-nfs

On Wed, May 10, 2017 at 9:47 AM, Trond Myklebust
<trondmy@primarydata.com> wrote:
>
> Features:
> - Removal of the unmaintained and unused OSD pNFS layout
> - Cleanup and removal of lots of unnecessary dprintk()s
> - Cleanup and removal of some memory failure paths now that
> ..
>  56 files changed, 949 insertions(+), 2960 deletions(-)

Lovely. Thanks,

                    Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-10 16:47 [GIT PULL] Please pull NFS client fixes for 4.12 Trond Myklebust
  2017-05-10 20:06 ` Linus Torvalds
@ 2017-05-11  7:53 ` Nikolay Borisov
  2017-05-11  7:59   ` Michal Hocko
  2017-05-11 13:41   ` Trond Myklebust
  1 sibling, 2 replies; 14+ messages in thread
From: Nikolay Borisov @ 2017-05-11  7:53 UTC (permalink / raw)
  To: Trond Myklebust, torvalds; +Cc: linux-kernel, linux-nfs, Michal Hocko



On 10.05.2017 19:47, Trond Myklebust wrote:
> Hi Linus,
> 
> The following changes since commit 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
> 
>   Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
> 
> are available in the git repository at:
> 
>   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-4.12-1
> 
> for you to fetch changes up to 76b2a303384e1d6299c3a0249f0f0ce2f8f96017:
> 
>   pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled (2017-05-09 16:02:57 -0400)
> 
> ----------------------------------------------------------------
> NFS client updates for Linux 4.12
> 
> Highlights include:
> 
> Stable bugfixes:
> - Fix use after free in write error path
> - Use GFP_NOIO for two allocations in writeback
> - Fix a hang in OPEN related to server reboot
> - Check the result of nfs4_pnfs_ds_connect
> - Fix an rcu lock leak
> 
> Features:
> - Removal of the unmaintained and unused OSD pNFS layout
> - Cleanup and removal of lots of unnecessary dprintk()s
> - Cleanup and removal of some memory failure paths now that
>   GFP_NOFS is guaranteed to never fail.

What guarantees that? Since if this is the case then this can result in
a lot of opportunities for cleanup across the whole kernel tree. After
discussing with mhocko (cc'ed) it seems that in practice everything
below COSTLY_ORDER which are not GFP_NORETRY will never fail. But this
semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees that
this will stay like that in the future?



[omitted for brevity]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11  7:53 ` Nikolay Borisov
@ 2017-05-11  7:59   ` Michal Hocko
  2017-05-11 12:16     ` Trond Myklebust
  2017-05-11 13:41   ` Trond Myklebust
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2017-05-11  7:59 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Trond Myklebust, torvalds, linux-kernel, linux-nfs

On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> 
> 
> On 10.05.2017 19:47, Trond Myklebust wrote:
[...]
> > - Cleanup and removal of some memory failure paths now that
> >   GFP_NOFS is guaranteed to never fail.
> 
> What guarantees that? Since if this is the case then this can result in
> a lot of opportunities for cleanup across the whole kernel tree. After
> discussing with mhocko (cc'ed) it seems that in practice everything
> below COSTLY_ORDER which are not GFP_NORETRY will never fail. But this
> semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees that
> this will stay like that in the future?

In practice it is hard to change the semantic of small allocations never
fail _practically_. But this is absolutely not guaranteed! They can fail
e.g. when the allocation context is the oom victim. Removing error paths
for allocation failures is just wrong.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11  7:59   ` Michal Hocko
@ 2017-05-11 12:16     ` Trond Myklebust
  2017-05-11 12:26       ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2017-05-11 12:16 UTC (permalink / raw)
  To: mhocko, n.borisov.lkml; +Cc: torvalds, linux-kernel, linux-nfs

On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> > 
> > 
> > On 10.05.2017 19:47, Trond Myklebust wrote:
> 
> [...]
> > > - Cleanup and removal of some memory failure paths now that
> > >   GFP_NOFS is guaranteed to never fail.
> > 
> > What guarantees that? Since if this is the case then this can
> > result in
> > a lot of opportunities for cleanup across the whole kernel tree.
> > After
> > discussing with mhocko (cc'ed) it seems that in practice everything
> > below COSTLY_ORDER which are not GFP_NORETRY will never fail. But
> > this
> > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees
> > that
> > this will stay like that in the future?
> 
> In practice it is hard to change the semantic of small allocations
> never
> fail _practically_. But this is absolutely not guaranteed! They can
> fail
> e.g. when the allocation context is the oom victim. Removing error
> paths
> for allocation failures is just wrong.

OK, this makes no fucking sense at all.

Either allocations can fail or they can't.
1) If they can't fail, then we don't need the checks.
2) If they can fail, then we do need them, and this hand wringing in
the MM community about GFP_* semantics and how we need to prevent
failure is fucking pointless.

So which is it? (1) or (2)?




-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11 12:16     ` Trond Myklebust
@ 2017-05-11 12:26       ` Michal Hocko
  2017-05-11 12:45         ` Trond Myklebust
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2017-05-11 12:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: n.borisov.lkml, torvalds, linux-kernel, linux-nfs

On Thu 11-05-17 12:16:37, Trond Myklebust wrote:
> On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> > On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> > > 
> > > 
> > > On 10.05.2017 19:47, Trond Myklebust wrote:
> > 
> > [...]
> > > > - Cleanup and removal of some memory failure paths now that
> > > >   GFP_NOFS is guaranteed to never fail.
> > > 
> > > What guarantees that? Since if this is the case then this can
> > > result in
> > > a lot of opportunities for cleanup across the whole kernel tree.
> > > After
> > > discussing with mhocko (cc'ed) it seems that in practice everything
> > > below COSTLY_ORDER which are not GFP_NORETRY will never fail. But
> > > this
> > > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees
> > > that
> > > this will stay like that in the future?
> > 
> > In practice it is hard to change the semantic of small allocations
> > never
> > fail _practically_. But this is absolutely not guaranteed! They can
> > fail
> > e.g. when the allocation context is the oom victim. Removing error
> > paths
> > for allocation failures is just wrong.
> 
> OK, this makes no fucking sense at all.
> 
> Either allocations can fail or they can't.
> 1) If they can't fail, then we don't need the checks.
> 2) If they can fail, then we do need them, and this hand wringing in
> the MM community about GFP_* semantics and how we need to prevent
> failure is fucking pointless.

everything which is not __GFP_NOFAIL might fail. We try hard not to fail
small allocations requests as much as we can in general but you _have_ to
check for failures. There is simply no way to guarantee "never fail"
semantic for all allocation requests. This has been like that basically
since years. And even this try-to-be-nofailing for small allocations has
been PITA for some corner cases.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11 12:26       ` Michal Hocko
@ 2017-05-11 12:45         ` Trond Myklebust
  2017-05-11 12:56           ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2017-05-11 12:45 UTC (permalink / raw)
  To: mhocko; +Cc: torvalds, linux-kernel, linux-nfs, n.borisov.lkml

On Thu, 2017-05-11 at 14:26 +0200, Michal Hocko wrote:
> On Thu 11-05-17 12:16:37, Trond Myklebust wrote:
> > On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> > > On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> > > > 
> > > > 
> > > > On 10.05.2017 19:47, Trond Myklebust wrote:
> > > 
> > > [...]
> > > > > - Cleanup and removal of some memory failure paths now that
> > > > >   GFP_NOFS is guaranteed to never fail.
> > > > 
> > > > What guarantees that? Since if this is the case then this can
> > > > result in
> > > > a lot of opportunities for cleanup across the whole kernel
> > > > tree.
> > > > After
> > > > discussing with mhocko (cc'ed) it seems that in practice
> > > > everything
> > > > below COSTLY_ORDER which are not GFP_NORETRY will never fail.
> > > > But
> > > > this
> > > > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees
> > > > that
> > > > this will stay like that in the future?
> > > 
> > > In practice it is hard to change the semantic of small
> > > allocations
> > > never
> > > fail _practically_. But this is absolutely not guaranteed! They
> > > can
> > > fail
> > > e.g. when the allocation context is the oom victim. Removing
> > > error
> > > paths
> > > for allocation failures is just wrong.
> > 
> > OK, this makes no fucking sense at all.
> > 
> > Either allocations can fail or they can't.
> > 1) If they can't fail, then we don't need the checks.
> > 2) If they can fail, then we do need them, and this hand wringing
> > in
> > the MM community about GFP_* semantics and how we need to prevent
> > failure is fucking pointless.
> 
> everything which is not __GFP_NOFAIL might fail. We try hard not to
> fail
> small allocations requests as much as we can in general but you
> _have_ to
> check for failures. There is simply no way to guarantee "never fail"
> semantic for all allocation requests. This has been like that
> basically
> since years. And even this try-to-be-nofailing for small allocations
> has
> been PITA for some corner cases.

I'll take that as a vote for (2), then.

I know that failures could occur in the past. That's why those code
paths were there. The problem is that the MM community has been making
lots of noise on mailing lists, conferences and LWN articles about how
we must not fail small allocations because the MM community believes
that nobody expects it. This is confusing everyone... It confused Neil
Brown, who contributed these patches, and it confused me and all the
other reviewers of these patches on the linux-nfs mailing list.

So if indeed (2) is correct, then please can we have a clear statement
_when discussing improvements to memory allocation semantics_ that
GFP_* still can fail, still will fail, and that callers should assume
it will fail and should test their code paths assuming the failure
case.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11 12:45         ` Trond Myklebust
@ 2017-05-11 12:56           ` Michal Hocko
  2017-05-11 13:10             ` Trond Myklebust
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2017-05-11 12:56 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: torvalds, linux-kernel, linux-nfs, n.borisov.lkml

On Thu 11-05-17 12:45:00, Trond Myklebust wrote:
> On Thu, 2017-05-11 at 14:26 +0200, Michal Hocko wrote:
> > On Thu 11-05-17 12:16:37, Trond Myklebust wrote:
> > > On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> > > > On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> > > > > 
> > > > > 
> > > > > On 10.05.2017 19:47, Trond Myklebust wrote:
> > > > 
> > > > [...]
> > > > > > - Cleanup and removal of some memory failure paths now that
> > > > > >   GFP_NOFS is guaranteed to never fail.
> > > > > 
> > > > > What guarantees that? Since if this is the case then this can
> > > > > result in
> > > > > a lot of opportunities for cleanup across the whole kernel
> > > > > tree.
> > > > > After
> > > > > discussing with mhocko (cc'ed) it seems that in practice
> > > > > everything
> > > > > below COSTLY_ORDER which are not GFP_NORETRY will never fail.
> > > > > But
> > > > > this
> > > > > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees
> > > > > that
> > > > > this will stay like that in the future?
> > > > 
> > > > In practice it is hard to change the semantic of small
> > > > allocations
> > > > never
> > > > fail _practically_. But this is absolutely not guaranteed! They
> > > > can
> > > > fail
> > > > e.g. when the allocation context is the oom victim. Removing
> > > > error
> > > > paths
> > > > for allocation failures is just wrong.
> > > 
> > > OK, this makes no fucking sense at all.
> > > 
> > > Either allocations can fail or they can't.
> > > 1) If they can't fail, then we don't need the checks.
> > > 2) If they can fail, then we do need them, and this hand wringing
> > > in
> > > the MM community about GFP_* semantics and how we need to prevent
> > > failure is fucking pointless.
> > 
> > everything which is not __GFP_NOFAIL might fail. We try hard not to
> > fail
> > small allocations requests as much as we can in general but you
> > _have_ to
> > check for failures. There is simply no way to guarantee "never fail"
> > semantic for all allocation requests. This has been like that
> > basically
> > since years. And even this try-to-be-nofailing for small allocations
> > has
> > been PITA for some corner cases.
> 
> I'll take that as a vote for (2), then.
> 
> I know that failures could occur in the past. That's why those code
> paths were there. The problem is that the MM community has been making
> lots of noise on mailing lists, conferences and LWN articles about how
> we must not fail small allocations because the MM community believes
> that nobody expects it. This is confusing everyone...

It was exactly other way around. We would like to _get_rid_of_ this do
not fail behavior because it is causing a major headaches in out of
memory corner cases. Just take GFP_NOFS as an example. It is a weak
reclaim context because we cannot reclaim fs metadata and that might be
a lot of memory so we cannot trigger the OOM killer and have to rely on
a different allocation context or kswapd to make a progress on our
behalf. We would really like to fail those requests instead. I've tried
that in the past but it was deemed to dangerous because _all_ kernel
paths would have to be checked for a sane failure behavior. So we are
keeping status quo instead.

> It confused Neil Brown, who contributed these patches, and it confused
> me and all the other reviewers of these patches on the linux-nfs
> mailing list.
> 
> So if indeed (2) is correct, then please can we have a clear statement
> _when discussing improvements to memory allocation semantics_ that
> GFP_* still can fail, still will fail, and that callers should assume
> it will fail and should test their code paths assuming the failure
> case.

I do not see any explicit documentation which would encourage users to
not check for the allocation failure. Only __GFP_NOFAIL is documented it
_must_ retry for ever. Of course I am open for any documentation
improvements.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11 12:56           ` Michal Hocko
@ 2017-05-11 13:10             ` Trond Myklebust
  2017-05-11 13:27               ` Michal Hocko
  2017-05-16 15:15               ` Jonathan Corbet
  0 siblings, 2 replies; 14+ messages in thread
From: Trond Myklebust @ 2017-05-11 13:10 UTC (permalink / raw)
  To: mhocko; +Cc: torvalds, linux-kernel, linux-nfs, n.borisov.lkml

On Thu, 2017-05-11 at 14:56 +0200, Michal Hocko wrote:
> On Thu 11-05-17 12:45:00, Trond Myklebust wrote:
> > On Thu, 2017-05-11 at 14:26 +0200, Michal Hocko wrote:
> > > On Thu 11-05-17 12:16:37, Trond Myklebust wrote:
> > > > On Thu, 2017-05-11 at 09:59 +0200, Michal Hocko wrote:
> > > > > On Thu 11-05-17 10:53:27, Nikolay Borisov wrote:
> > > > > > 
> > > > > > 
> > > > > > On 10.05.2017 19:47, Trond Myklebust wrote:
> > > > > 
> > > > > [...]
> > > > > > > - Cleanup and removal of some memory failure paths now
> > > > > > > that
> > > > > > >   GFP_NOFS is guaranteed to never fail.
> > > > > > 
> > > > > > What guarantees that? Since if this is the case then this
> > > > > > can
> > > > > > result in
> > > > > > a lot of opportunities for cleanup across the whole kernel
> > > > > > tree.
> > > > > > After
> > > > > > discussing with mhocko (cc'ed) it seems that in practice
> > > > > > everything
> > > > > > below COSTLY_ORDER which are not GFP_NORETRY will never
> > > > > > fail.
> > > > > > But
> > > > > > this
> > > > > > semantic is not the same as GFP_NOFAIL. E.g. nothing
> > > > > > guarantees
> > > > > > that
> > > > > > this will stay like that in the future?
> > > > > 
> > > > > In practice it is hard to change the semantic of small
> > > > > allocations
> > > > > never
> > > > > fail _practically_. But this is absolutely not guaranteed!
> > > > > They
> > > > > can
> > > > > fail
> > > > > e.g. when the allocation context is the oom victim. Removing
> > > > > error
> > > > > paths
> > > > > for allocation failures is just wrong.
> > > > 
> > > > OK, this makes no fucking sense at all.
> > > > 
> > > > Either allocations can fail or they can't.
> > > > 1) If they can't fail, then we don't need the checks.
> > > > 2) If they can fail, then we do need them, and this hand
> > > > wringing
> > > > in
> > > > the MM community about GFP_* semantics and how we need to
> > > > prevent
> > > > failure is fucking pointless.
> > > 
> > > everything which is not __GFP_NOFAIL might fail. We try hard not
> > > to
> > > fail
> > > small allocations requests as much as we can in general but you
> > > _have_ to
> > > check for failures. There is simply no way to guarantee "never
> > > fail"
> > > semantic for all allocation requests. This has been like that
> > > basically
> > > since years. And even this try-to-be-nofailing for small
> > > allocations
> > > has
> > > been PITA for some corner cases.
> > 
> > I'll take that as a vote for (2), then.
> > 
> > I know that failures could occur in the past. That's why those code
> > paths were there. The problem is that the MM community has been
> > making
> > lots of noise on mailing lists, conferences and LWN articles about
> > how
> > we must not fail small allocations because the MM community
> > believes
> > that nobody expects it. This is confusing everyone...
> 
> It was exactly other way around. We would like to _get_rid_of_ this
> do
> not fail behavior because it is causing a major headaches in out of
> memory corner cases. Just take GFP_NOFS as an example. It is a weak
> reclaim context because we cannot reclaim fs metadata and that might
> be
> a lot of memory so we cannot trigger the OOM killer and have to rely
> on
> a different allocation context or kswapd to make a progress on our
> behalf. We would really like to fail those requests instead. I've
> tried
> that in the past but it was deemed to dangerous because _all_ kernel
> paths would have to be checked for a sane failure behavior. So we are
> keeping status quo instead.

If we suspect the existence of a load of potential time bombs in the
kernel due to missing checks, then the status quo is not good enough.
We should be working on tools to identify these code paths.

Quite frankly, I'd love to see a fuzzer-like tool that can randomly
fail allocations. I can easily make one for the NFS code, but if there
is a general problem identifying buggy code, then perhaps it should be
solved at the MM layer itself.

> > It confused Neil Brown, who contributed these patches, and it
> > confused
> > me and all the other reviewers of these patches on the linux-nfs
> > mailing list.
> > 
> > So if indeed (2) is correct, then please can we have a clear
> > statement
> > _when discussing improvements to memory allocation semantics_ that
> > GFP_* still can fail, still will fail, and that callers should
> > assume
> > it will fail and should test their code paths assuming the failure
> > case.
> 
> I do not see any explicit documentation which would encourage users
> to
> not check for the allocation failure. Only __GFP_NOFAIL is documented
> it
> _must_ retry for ever. Of course I am open for any documentation
> improvements.

As I said, the problem has been the discussion, and how it focusses on
"must not fail".

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11 13:10             ` Trond Myklebust
@ 2017-05-11 13:27               ` Michal Hocko
  2017-05-16 15:15               ` Jonathan Corbet
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-05-11 13:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: torvalds, linux-kernel, linux-nfs, n.borisov.lkml

On Thu 11-05-17 13:10:03, Trond Myklebust wrote:
> On Thu, 2017-05-11 at 14:56 +0200, Michal Hocko wrote:
[...]
> > I do not see any explicit documentation which would encourage users
> > to not check for the allocation failure. Only __GFP_NOFAIL is
> > documented it _must_ retry for ever. Of course I am open for any
> > documentation improvements.
> 
> As I said, the problem has been the discussion, and how it focusses on
> "must not fail".

I would strongly suggest cc linux-mm ML when doing this kind of changes.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11  7:53 ` Nikolay Borisov
  2017-05-11  7:59   ` Michal Hocko
@ 2017-05-11 13:41   ` Trond Myklebust
  2017-05-11 13:54     ` Michal Hocko
  1 sibling, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2017-05-11 13:41 UTC (permalink / raw)
  To: torvalds, n.borisov.lkml; +Cc: linux-kernel, linux-nfs, mhocko

On Thu, 2017-05-11 at 10:53 +0300, Nikolay Borisov wrote:
> 
> On 10.05.2017 19:47, Trond Myklebust wrote:
> > Hi Linus,
> > 
> > The following changes since commit
> > 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
> > 
> >   Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
> > 
> > are available in the git repository at:
> > 
> >   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-
> > for-4.12-1
> > 
> > for you to fetch changes up to
> > 76b2a303384e1d6299c3a0249f0f0ce2f8f96017:
> > 
> >   pNFS/flexfiles: Always attempt to call layoutstats when flexfiles
> > is enabled (2017-05-09 16:02:57 -0400)
> > 
> > ----------------------------------------------------------------
> > NFS client updates for Linux 4.12
> > 
> > Highlights include:
> > 
> > Stable bugfixes:
> > - Fix use after free in write error path
> > - Use GFP_NOIO for two allocations in writeback
> > - Fix a hang in OPEN related to server reboot
> > - Check the result of nfs4_pnfs_ds_connect
> > - Fix an rcu lock leak
> > 
> > Features:
> > - Removal of the unmaintained and unused OSD pNFS layout
> > - Cleanup and removal of lots of unnecessary dprintk()s
> > - Cleanup and removal of some memory failure paths now that
> >   GFP_NOFS is guaranteed to never fail.
> 
> What guarantees that? Since if this is the case then this can result
> in
> a lot of opportunities for cleanup across the whole kernel tree.
> After
> discussing with mhocko (cc'ed) it seems that in practice everything
> below COSTLY_ORDER which are not GFP_NORETRY will never fail. But
> this
> semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees that
> this will stay like that in the future?
> 

Actually, going back to the code with coffee: it's the fact we have
mempools, with direct reclaim that guarantee this.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11 13:41   ` Trond Myklebust
@ 2017-05-11 13:54     ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2017-05-11 13:54 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: torvalds, n.borisov.lkml, linux-kernel, linux-nfs

On Thu 11-05-17 13:41:12, Trond Myklebust wrote:
> On Thu, 2017-05-11 at 10:53 +0300, Nikolay Borisov wrote:
> > 
> > On 10.05.2017 19:47, Trond Myklebust wrote:
> > > Hi Linus,
> > > 
> > > The following changes since commit
> > > 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
> > > 
> > >   Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
> > > 
> > > are available in the git repository at:
> > > 
> > >   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-
> > > for-4.12-1
> > > 
> > > for you to fetch changes up to
> > > 76b2a303384e1d6299c3a0249f0f0ce2f8f96017:
> > > 
> > >   pNFS/flexfiles: Always attempt to call layoutstats when flexfiles
> > > is enabled (2017-05-09 16:02:57 -0400)
> > > 
> > > ----------------------------------------------------------------
> > > NFS client updates for Linux 4.12
> > > 
> > > Highlights include:
> > > 
> > > Stable bugfixes:
> > > - Fix use after free in write error path
> > > - Use GFP_NOIO for two allocations in writeback
> > > - Fix a hang in OPEN related to server reboot
> > > - Check the result of nfs4_pnfs_ds_connect
> > > - Fix an rcu lock leak
> > > 
> > > Features:
> > > - Removal of the unmaintained and unused OSD pNFS layout
> > > - Cleanup and removal of lots of unnecessary dprintk()s
> > > - Cleanup and removal of some memory failure paths now that
> > >   GFP_NOFS is guaranteed to never fail.
> > 
> > What guarantees that? Since if this is the case then this can result
> > in
> > a lot of opportunities for cleanup across the whole kernel tree.
> > After
> > discussing with mhocko (cc'ed) it seems that in practice everything
> > below COSTLY_ORDER which are not GFP_NORETRY will never fail. But
> > this
> > semantic is not the same as GFP_NOFAIL. E.g. nothing guarantees that
> > this will stay like that in the future?
> > 
> 
> Actually, going back to the code with coffee: it's the fact we have
> mempools, with direct reclaim that guarantee this.

Mempools are a different story, of course. They do the their own loop on
top of the underlying allocator.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [GIT PULL] Please pull NFS client fixes for 4.12
  2017-05-11 13:10             ` Trond Myklebust
  2017-05-11 13:27               ` Michal Hocko
@ 2017-05-16 15:15               ` Jonathan Corbet
  1 sibling, 0 replies; 14+ messages in thread
From: Jonathan Corbet @ 2017-05-16 15:15 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: mhocko, torvalds, linux-kernel, linux-nfs, n.borisov.lkml

On Thu, 11 May 2017 13:10:03 +0000
Trond Myklebust <trondmy@primarydata.com> wrote:

> If we suspect the existence of a load of potential time bombs in the
> kernel due to missing checks, then the status quo is not good enough.
> We should be working on tools to identify these code paths.
> 
> Quite frankly, I'd love to see a fuzzer-like tool that can randomly
> fail allocations.

The fault injection framework at least used to work quite nicely for this;
I used it to test out the failure paths in the OLPC drivers years ago.
Documentation/fault-injection/fault-injection.txt.

jon

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [GIT PULL] Please pull NFS client fixes for 4.12
@ 2017-06-28 14:19 Trond Myklebust
  0 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2017-06-28 14:19 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, linux-nfs

Hi Linus,

The following changes since commit 3c2993b8c6143d8a5793746a54eba8f86f95240f:

  Linux 4.12-rc4 (2017-06-04 16:47:43 -0700)

are available in the git repository at:

  git://git.linux-nfs.org/projects/trondmy/linux-nfs.git tags/nfs-for-4.12-3

for you to fetch changes up to 2e31b4cb895ae78db31dffb860cd255d86c6561c:

  NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete() (2017-06-27 22:26:23 -0400)

----------------------------------------------------------------
NFS client bugfixes for Linux 4.12

Bugfixes include:

- Stable fix for exclusive create if the server supports the umask attribute
- Trunking detection should handle ERESTARTSYS/EINTR
- Stable fix for a race in the LAYOUTGET function
- Stable fix to revert "nfs_rename() handle -ERESTARTSYS dentry left behind"
- nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()

----------------------------------------------------------------
Benjamin Coddington (2):
      NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umask
      Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind"

Trond Myklebust (3):
      NFS: Trunking detection should handle ERESTARTSYS/EINTR
      NFSv4.1: Fix a race in nfs4_proc_layoutget
      NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()

 fs/nfs/callback_xdr.c |  1 -
 fs/nfs/dir.c          | 51 ++++++++++++++++++++++++---------------------------
 fs/nfs/nfs4proc.c     |  5 +++--
 fs/nfs/nfs4state.c    |  2 ++
 4 files changed, 29 insertions(+), 30 deletions(-)
-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-06-28 14:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-10 16:47 [GIT PULL] Please pull NFS client fixes for 4.12 Trond Myklebust
2017-05-10 20:06 ` Linus Torvalds
2017-05-11  7:53 ` Nikolay Borisov
2017-05-11  7:59   ` Michal Hocko
2017-05-11 12:16     ` Trond Myklebust
2017-05-11 12:26       ` Michal Hocko
2017-05-11 12:45         ` Trond Myklebust
2017-05-11 12:56           ` Michal Hocko
2017-05-11 13:10             ` Trond Myklebust
2017-05-11 13:27               ` Michal Hocko
2017-05-16 15:15               ` Jonathan Corbet
2017-05-11 13:41   ` Trond Myklebust
2017-05-11 13:54     ` Michal Hocko
2017-06-28 14:19 Trond Myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).