From: NeilBrown <neilb@suse.de>
To: linux-mm@kvack.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: xfs@oss.sgi.com
Cc: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: netdev@vger.kernel.org
Subject: [PATCH/RFC 00/19] Support loop-back NFS mounts
Date: Wed, 16 Apr 2014 14:03:35 +1000
Message-ID: <20140416033623.10604.69237.stgit@notabene.brown>

A loop-back NFS mount is one where the NFS client and server run on
the same host.

The use-case for this is a high availability cluster with shared
storage.  The shared filesystem is mounted on any one machine and
NFS-mounted on the others.
If the NFS server fails, some other node will take over that service,
and then it will have a loop-back NFS mount which needs to keep
working.

This patch set addresses the "keep working" bit and specifically
addresses deadlocks and livelocks.
Allowing the fail-over itself to be deadlock free is a separate
challenge for another day.

The short description of how this works is:

deadlocks:
  - Elevate PF_FSTRANS to apply globally instead of just in NFS and XFS.
    PF_FSTRANS disables __GFP_FS in the same way that PF_MEMALLOC_NOIO
    disables __GFP_IO.
  - Set PF_FSTRANS in nfsd when handling requests related to
    memory reclaim, or requests which could block requests related
    to memory reclaim.
  - Use lockdep to find all consequent deadlocks from some other
    thread allocating memory while holding a lock that nfsd might
    want.
  - Fix those other deadlocks by setting PF_FSTRANS or using GFP_NOFS
    as appropriate.

livelocks:
  - identify throttling during reclaim and bypass it when
    PF_LESS_THROTTLE is set
  - only set PF_LESS_THROTTLE for nfsd when handling write requests
    from the local host.

The last 12 patches address various deadlocks due to locking chains.
11 were found by lockdep, 2 by testing.  There is a reasonable chance
that there are more; I just need to exercise more code while
testing.

There is one issue that lockdep reports which I haven't fixed (I've
just hacked the code out for my testing).  That issue relates to
freeze_super().
I may not be interpreting the lockdep reports perfectly, but I think
they are basically saying that if I were to freeze a filesystem that
was exported to the local host, then we could end up deadlocking.
This is to be expected.  The NFS filesystem would need to be frozen
first.  I don't know how to tell lockdep that I know that is a problem
and I don't want to be warned about it.  Suggestions welcome.
Until this is addressed I cannot really ask others to test the code
with lockdep enabled.

There are more subsidiary places that I needed to add PF_FSTRANS than
I would have liked.  The thought keeps crossing my mind that maybe we
can get rid of __GFP_FS and require that memory reclaim never ever
block on a filesystem.  Then most of these patches go away.

Now that writeback doesn't happen from reclaim (but from kswapd), many
of the calls from reclaim to FS are gone.
The ->releasepage call is the only one that I *know* causes me
problems so I'd like to just say that that must never block.  I don't
really understand the consequences of that though.
There are a couple of other places where __GFP_FS is used and I'd need
to carefully analyze those.  But if someone just said "no, that is
impossible", I could be happy and stick with the current approach....

I've cc:ed Peter Zijlstra and Ingo Molnar only on the lockdep-related
patches, Ming Lei only on the PF_MEMALLOC_NOIO-related patches,
and netdev only on the network-related patches.
There are probably other people I should CC.  Apologies if I missed you.
I'll ensure better coverage if the nfs/mm/xfs people are reasonably happy.

Comments, criticisms, etc most welcome.

Thanks,
NeilBrown


---

NeilBrown (19):
      Promote current_{set,restore}_flags_nested from xfs to global.
      lockdep: lockdep_set_current_reclaim_state should save old value
      lockdep: improve scenario messages for RECLAIM_FS errors.
      Make effect of PF_FSTRANS to disable __GFP_FS universal.
      SUNRPC: track whether a request is coming from a loop-back interface.
      nfsd: set PF_FSTRANS for nfsd threads.
      nfsd and VM: use PF_LESS_THROTTLE to avoid throttle in shrink_inactive_list.
      Set PF_FSTRANS while write_cache_pages calls ->writepage
      XFS: ensure xfs_file_*_read cannot deadlock in memory allocation.
      NET: set PF_FSTRANS while holding sk_lock
      FS: set PF_FSTRANS while holding mmap_sem in exec.c
      NET: set PF_FSTRANS while holding rtnl_lock
      MM: set PF_FSTRANS while allocating per-cpu memory to avoid deadlock.
      driver core: set PF_FSTRANS while holding gdp_mutex
      nfsd: set PF_FSTRANS when client_mutex is held.
      VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc.
      VFS: set PF_FSTRANS while namespace_sem is held.
      nfsd: set PF_FSTRANS during nfsd4_do_callback_rpc.
      XFS: set PF_FSTRANS while ilock is held in xfs_free_eofblocks


 drivers/base/core.c             |    3 ++
 drivers/base/power/runtime.c    |    6 ++---
 drivers/block/nbd.c             |    6 ++---
 drivers/md/dm-bufio.c           |    6 ++---
 drivers/md/dm-ioctl.c           |    6 ++---
 drivers/mtd/nand/nandsim.c      |   28 ++++++---------------
 drivers/scsi/iscsi_tcp.c        |    6 ++---
 drivers/usb/core/hub.c          |    6 ++---
 fs/dcache.c                     |    4 ++-
 fs/exec.c                       |    6 +++++
 fs/fs-writeback.c               |    5 ++--
 fs/namespace.c                  |    4 +++
 fs/nfs/file.c                   |    3 +-
 fs/nfsd/nfs4callback.c          |    5 ++++
 fs/nfsd/nfs4state.c             |    3 ++
 fs/nfsd/nfssvc.c                |   24 ++++++++++++++----
 fs/nfsd/vfs.c                   |    6 +++++
 fs/xfs/kmem.h                   |    2 --
 fs/xfs/xfs_aops.c               |    7 -----
 fs/xfs/xfs_bmap_util.c          |    4 +++
 fs/xfs/xfs_file.c               |   12 +++++++++
 fs/xfs/xfs_linux.h              |    7 -----
 include/linux/lockdep.h         |    8 +++---
 include/linux/sched.h           |   32 +++++++++---------------
 include/linux/sunrpc/svc.h      |    2 ++
 include/linux/sunrpc/svc_xprt.h |    1 +
 include/net/sock.h              |    1 +
 kernel/locking/lockdep.c        |   51 ++++++++++++++++++++++++++++-----------
 kernel/softirq.c                |    6 ++---
 mm/migrate.c                    |    9 +++----
 mm/page-writeback.c             |    3 ++
 mm/page_alloc.c                 |   18 ++++++++------
 mm/percpu.c                     |    4 +++
 mm/slab.c                       |    2 ++
 mm/slob.c                       |    2 ++
 mm/slub.c                       |    1 +
 mm/vmscan.c                     |   31 +++++++++++++++---------
 net/core/dev.c                  |    6 ++---
 net/core/rtnetlink.c            |    9 ++++++-
 net/core/sock.c                 |    8 ++++--
 net/sunrpc/sched.c              |    5 ++--
 net/sunrpc/svc.c                |    6 +++++
 net/sunrpc/svcsock.c            |   10 ++++++++
 net/sunrpc/xprtrdma/transport.c |    5 ++--
 net/sunrpc/xprtsock.c           |   17 ++++++++-----
 45 files changed, 247 insertions(+), 149 deletions(-)

-- 
Signature


Thread overview: 151+ messages
2014-04-16  4:03 NeilBrown [this message]
2014-04-16  4:03 ` [PATCH 03/19] lockdep: improve scenario messages for RECLAIM_FS errors NeilBrown
2014-04-16  7:22   ` Peter Zijlstra
2014-04-16  4:03 ` [PATCH 06/19] nfsd: set PF_FSTRANS for nfsd threads NeilBrown
2014-04-16  7:28   ` Peter Zijlstra
2014-04-16  4:03 ` [PATCH 14/19] driver core: set PF_FSTRANS while holding gdp_mutex NeilBrown
2014-04-16  4:03 ` [PATCH 07/19] nfsd and VM: use PF_LESS_THROTTLE to avoid throttle in shrink_inactive_list NeilBrown
2014-04-16  4:03 ` [PATCH 11/19] FS: set PF_FSTRANS while holding mmap_sem in exec.c NeilBrown
2014-04-16  4:03 ` [PATCH 01/19] Promote current_{set,restore}_flags_nested from xfs to global NeilBrown
2014-04-16  4:03 ` [PATCH 04/19] Make effect of PF_FSTRANS to disable __GFP_FS universal NeilBrown
2014-04-16  5:37   ` Dave Chinner
2014-04-16  6:17     ` NeilBrown
2014-04-17  1:03       ` NeilBrown
2014-04-17  4:41         ` Dave Chinner
2014-04-16  4:03 ` [PATCH 09/19] XFS: ensure xfs_file_*_read cannot deadlock in memory allocation NeilBrown
2014-04-16  6:04   ` Dave Chinner
2014-04-16  6:27     ` NeilBrown
2014-04-16  6:31     ` Dave Chinner
2014-04-16  4:03 ` [PATCH 05/19] SUNRPC: track whether a request is coming from a loop-back interface NeilBrown
2014-04-16 14:47   ` Jeff Layton
2014-04-16 23:25     ` NeilBrown
2014-04-16  4:03 ` [PATCH 02/19] lockdep: lockdep_set_current_reclaim_state should save old value NeilBrown
2014-04-16  4:03 ` [PATCH 10/19] NET: set PF_FSTRANS while holding sk_lock NeilBrown
2014-04-16  5:13   ` Eric Dumazet
2014-04-16  5:47     ` NeilBrown
2014-04-16 13:00     ` David Miller
2014-04-17  2:38       ` NeilBrown
2014-04-16  4:03 ` [PATCH 13/19] MM: set PF_FSTRANS while allocating per-cpu memory to avoid deadlock NeilBrown
2014-04-16  5:49   ` Dave Chinner
2014-04-16  6:22     ` NeilBrown
2014-04-16  6:30       ` Dave Chinner
2014-04-16  4:03 ` [PATCH 12/19] NET: set PF_FSTRANS while holding rtnl_lock NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03 ` [PATCH 08/19] Set PF_FSTRANS while write_cache_pages calls ->writepage NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03 ` [PATCH 18/19] nfsd: set PF_FSTRANS during nfsd4_do_callback_rpc NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03 ` [PATCH 17/19] VFS: set PF_FSTRANS while namespace_sem is held NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:46   ` Al Viro
2014-04-16  4:46     ` Al Viro
2014-04-16  5:52     ` NeilBrown
2014-04-16  5:52       ` NeilBrown
2014-04-16 16:37       ` Al Viro
2014-04-16 16:37         ` Al Viro
2014-04-16 16:37         ` Al Viro
2014-04-16  4:03 ` [PATCH 15/19] nfsd: set PF_FSTRANS when client_mutex " NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03 ` [PATCH 16/19] VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  6:25   ` Dave Chinner
2014-04-16  6:25     ` Dave Chinner
2014-04-16  6:25     ` Dave Chinner
2014-04-16  6:49     ` NeilBrown
2014-04-16  6:49       ` NeilBrown
2014-04-16  9:00       ` Dave Chinner
2014-04-16  9:00         ` Dave Chinner
2014-04-16  9:00         ` Dave Chinner
2014-04-17  0:51         ` NeilBrown
2014-04-17  0:51           ` NeilBrown
2014-04-17  5:58           ` Dave Chinner
2014-04-17  5:58             ` Dave Chinner
2014-04-17  5:58             ` Dave Chinner
2014-04-16  4:03 ` [PATCH 19/19] XFS: set PF_FSTRANS while ilock is held in xfs_free_eofblocks NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  4:03   ` NeilBrown
2014-04-16  6:18   ` Dave Chinner
2014-04-16  6:18     ` Dave Chinner
2014-04-16  6:18     ` Dave Chinner
2014-04-16 14:42 ` [PATCH/RFC 00/19] Support loop-back NFS mounts Jeff Layton
2014-04-16 14:42   ` Jeff Layton
2014-04-16 14:42   ` Jeff Layton
2014-04-17  0:20   ` NeilBrown
2014-04-17  0:20     ` NeilBrown
2014-04-17  0:20     ` NeilBrown
2014-04-17  1:27     ` Dave Chinner
2014-04-17  1:27       ` Dave Chinner
2014-04-17  1:27       ` Dave Chinner
2014-04-17  1:50       ` NeilBrown
2014-04-17  1:50         ` NeilBrown
2014-04-17  1:50         ` NeilBrown
2014-04-17  4:23         ` Dave Chinner
2014-04-17  4:23           ` Dave Chinner
2014-04-17  4:23           ` Dave Chinner
