* blocklayout GETDEVICEINFO fix @ 2014-09-26 14:02 Christoph Hellwig 2014-09-26 14:02 ` [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls Christoph Hellwig 0 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2014-09-26 14:02 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs The rpc_pipefs pattern used for the GETDEVICEINFO implementation in the blocklayout driver is fundamentally not thread safe. Workloads like dbench manage to trigger concurrent GETDEVICEINFO calls though, so we need to add a critical section around it. I also wonder if we should avoid even sending multiple GETDEVICEINFO a a higher level in the NFS client, as ѕending multiple request just to cancel out their action before adding them to the cache doesn't seem very helpful. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls 2014-09-26 14:02 blocklayout GETDEVICEINFO fix Christoph Hellwig @ 2014-09-26 14:02 ` Christoph Hellwig 2014-09-26 14:29 ` Trond Myklebust 0 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2014-09-26 14:02 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs The rpc_pipefs code isn't thread safe, leading to occasional use after frees when running xfstests generic/241 (dbench). Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/nfs/blocklayout/rpc_pipefs.c | 14 +++++++++----- fs/nfs/netns.h | 1 + 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/fs/nfs/blocklayout/rpc_pipefs.c b/fs/nfs/blocklayout/rpc_pipefs.c index 8d04bda..da58ff7 100644 --- a/fs/nfs/blocklayout/rpc_pipefs.c +++ b/fs/nfs/blocklayout/rpc_pipefs.c @@ -65,17 +65,18 @@ bl_resolve_deviceid(struct nfs_server *server, struct pnfs_block_volume *b, dprintk("%s CREATING PIPEFS MESSAGE\n", __func__); + mutex_lock(&nn->bl_mutex); bl_pipe_msg.bl_wq = &nn->bl_wq; b->simple.len += 4; /* single volume */ if (b->simple.len > PAGE_SIZE) - return -EIO; + goto out_unlock; memset(msg, 0, sizeof(*msg)); msg->len = sizeof(*bl_msg) + b->simple.len; msg->data = kzalloc(msg->len, gfp_mask); if (!msg->data) - goto out; + goto out_free_data; bl_msg = msg->data; bl_msg->type = BL_DEVICE_MOUNT, @@ -87,7 +88,7 @@ bl_resolve_deviceid(struct nfs_server *server, struct pnfs_block_volume *b, rc = rpc_queue_upcall(nn->bl_device_pipe, msg); if (rc < 0) { remove_wait_queue(&nn->bl_wq, &wq); - goto out; + goto out_free_data; } set_current_state(TASK_UNINTERRUPTIBLE); @@ -98,12 +99,14 @@ bl_resolve_deviceid(struct nfs_server *server, struct pnfs_block_volume *b, if (reply->status != BL_DEVICE_REQUEST_PROC) { printk(KERN_WARNING "%s failed to decode device: %d\n", __func__, reply->status); - goto out; + goto out_free_data; } dev = MKDEV(reply->major, reply->minor); -out: +out_free_data: kfree(msg->data); +out_unlock: + mutex_unlock(&nn->bl_mutex); return dev; } @@ -233,6 +236,7 @@ static int nfs4blocklayout_net_init(struct net *net) struct nfs_net *nn = net_generic(net, nfs_net_id); struct dentry *dentry; + mutex_init(&nn->bl_mutex); init_waitqueue_head(&nn->bl_wq); nn->bl_device_pipe = rpc_mkpipe_data(&bl_upcall_ops, 0); if (IS_ERR(nn->bl_device_pipe)) diff --git a/fs/nfs/netns.h b/fs/nfs/netns.h index ef221fb..f0e06e4 100644 --- a/fs/nfs/netns.h +++ b/fs/nfs/netns.h @@ -19,6 +19,7 @@ struct nfs_net { struct rpc_pipe *bl_device_pipe; struct bl_dev_msg bl_mount_reply; wait_queue_head_t bl_wq; + struct mutex bl_mutex; struct list_head nfs_client_list; struct list_head nfs_volume_list; #if IS_ENABLED(CONFIG_NFS_V4) -- 1.9.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls 2014-09-26 14:02 ` [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls Christoph Hellwig @ 2014-09-26 14:29 ` Trond Myklebust 2014-09-26 15:48 ` Christoph Hellwig 0 siblings, 1 reply; 8+ messages in thread From: Trond Myklebust @ 2014-09-26 14:29 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Linux NFS Mailing List On Fri, Sep 26, 2014 at 10:02 AM, Christoph Hellwig <hch@lst.de> wrote: > The rpc_pipefs code isn't thread safe, leading to occasional use after > frees when running xfstests generic/241 (dbench). > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/nfs/blocklayout/rpc_pipefs.c | 14 +++++++++----- > fs/nfs/netns.h | 1 + > 2 files changed, 10 insertions(+), 5 deletions(-) > > It worries me that we're putting a mutex directly in the writeback path. For small arrays, it might be acceptable, but what if you have a block device with 1000s of disks on the back end? Is there no better way to fix this issue? -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls 2014-09-26 14:29 ` Trond Myklebust @ 2014-09-26 15:48 ` Christoph Hellwig 2014-09-26 16:21 ` Trond Myklebust 0 siblings, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2014-09-26 15:48 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List On Fri, Sep 26, 2014 at 10:29:34AM -0400, Trond Myklebust wrote: > It worries me that we're putting a mutex directly in the writeback > path. For small arrays, it might be acceptable, but what if you have a > block device with 1000s of disks on the back end? > > Is there no better way to fix this issue? Not without getting rid of the rpc_pipefs interface. That is on my very long term TODO list, but it will require new userspace support. Note that I'm actually worried about GETDEVICEINFO from the writeback path in general. There is a lot that happens when we don't have a device in cache, including the need to open a block device for the block layout driver, which is a complex operation full of GFP_KERNEL allocation, or even a more complex scsi device scan for the object layout. It's been on my more near term todo list to look into reproducers for deadlocks in this area which seem very possible, and then look into a fix for it; I can't really think of anything less drastic than refusing block or object layout I/O from memory reclaim if we don't have the device cached yet. The situation for file layouts seems less severe, so I'll need help from people more familar with to think about the situation there. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls 2014-09-26 15:48 ` Christoph Hellwig @ 2014-09-26 16:21 ` Trond Myklebust 2014-09-26 16:41 ` Christoph Hellwig 2014-10-24 14:29 ` Christoph Hellwig 0 siblings, 2 replies; 8+ messages in thread From: Trond Myklebust @ 2014-09-26 16:21 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Linux NFS Mailing List On Fri, Sep 26, 2014 at 11:48 AM, Christoph Hellwig <hch@lst.de> wrote: > On Fri, Sep 26, 2014 at 10:29:34AM -0400, Trond Myklebust wrote: >> It worries me that we're putting a mutex directly in the writeback >> path. For small arrays, it might be acceptable, but what if you have a >> block device with 1000s of disks on the back end? >> >> Is there no better way to fix this issue? > > Not without getting rid of the rpc_pipefs interface. That is on my > very long term TODO list, but it will require new userspace support. Why is that? rpc_pipefs was designed to be message based, so it should work quite well in a multi-threaded environment. We certainly don't use mutexes around the gssd up/downcall, and the only reason for the mutex in idmapd is to deal with the keyring upcall. > Note that I'm actually worried about GETDEVICEINFO from the writeback > path in general. There is a lot that happens when we don't have > a device in cache, including the need to open a block device for > the block layout driver, which is a complex operation full of > GFP_KERNEL allocation, or even a more complex scsi device scan > for the object layout. It's been on my more near term todo list > to look into reproducers for deadlocks in this area which seem > very possible, and then look into a fix for it; I can't really > think of anything less drastic than refusing block or object layout > I/O from memory reclaim if we don't have the device cached yet. > The situation for file layouts seems less severe, so I'll need > help from people more familar with to think about the situation there. Agreed, -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls 2014-09-26 16:21 ` Trond Myklebust @ 2014-09-26 16:41 ` Christoph Hellwig 2014-10-24 14:29 ` Christoph Hellwig 1 sibling, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2014-09-26 16:41 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List On Fri, Sep 26, 2014 at 12:21:06PM -0400, Trond Myklebust wrote: > > Not without getting rid of the rpc_pipefs interface. That is on my > > very long term TODO list, but it will require new userspace support. > > Why is that? rpc_pipefs was designed to be message based, so it should > work quite well in a multi-threaded environment. We certainly don't > use mutexes around the gssd up/downcall, and the only reason for the > mutex in idmapd is to deal with the keyring upcall. Ok, the GSS code dynamically allocates a new message for each upcall and thus doesn't have the issue. I'll take a look if I can do this in the blocklayout driver as well. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls 2014-09-26 16:21 ` Trond Myklebust 2014-09-26 16:41 ` Christoph Hellwig @ 2014-10-24 14:29 ` Christoph Hellwig 2014-11-05 8:30 ` Christoph Hellwig 1 sibling, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2014-10-24 14:29 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List On Fri, Sep 26, 2014 at 12:21:06PM -0400, Trond Myklebust wrote: > > On Fri, Sep 26, 2014 at 10:29:34AM -0400, Trond Myklebust wrote: > >> It worries me that we're putting a mutex directly in the writeback > >> path. For small arrays, it might be acceptable, but what if you have a > >> block device with 1000s of disks on the back end? > >> > >> Is there no better way to fix this issue? > > > > Not without getting rid of the rpc_pipefs interface. That is on my > > very long term TODO list, but it will require new userspace support. > > Why is that? rpc_pipefs was designed to be message based, so it should > work quite well in a multi-threaded environment. We certainly don't > use mutexes around the gssd up/downcall, and the only reason for the > mutex in idmapd is to deal with the keyring upcall. Turns out we can't do anything like that with the existing blkmapd ABI. The other callers that support concurrency have some sort of uniqueue identifier embedded in their messages both on the upcall and the downcall, so that the kernel can find the right structure on the downcall side. Because of that I'd like to request inclusion of this patch with a Cc to stable for 3.17. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls 2014-10-24 14:29 ` Christoph Hellwig @ 2014-11-05 8:30 ` Christoph Hellwig 0 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2014-11-05 8:30 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List On Fri, Oct 24, 2014 at 04:29:39PM +0200, Christoph Hellwig wrote: > Turns out we can't do anything like that with the existing blkmapd > ABI. The other callers that support concurrency have some sort > of uniqueue identifier embedded in their messages both on the upcall > and the downcall, so that the kernel can find the right structure > on the downcall side. > > Because of that I'd like to request inclusion of this patch with a Cc > to stable for 3.17. ping? This is an easily expoitable oops, so I'd really like to see the/a fix in 3.18-rc and 3.17-stable. Thanks, Christoph ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-11-05 8:30 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-09-26 14:02 blocklayout GETDEVICEINFO fix Christoph Hellwig 2014-09-26 14:02 ` [PATCH] pnfs/blocklayout: serialize GETDEVICEINFO calls Christoph Hellwig 2014-09-26 14:29 ` Trond Myklebust 2014-09-26 15:48 ` Christoph Hellwig 2014-09-26 16:21 ` Trond Myklebust 2014-09-26 16:41 ` Christoph Hellwig 2014-10-24 14:29 ` Christoph Hellwig 2014-11-05 8:30 ` Christoph Hellwig
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.