* [PATCH 0/3] replacement for "introduce pnfs private workqueue"
@ 2011-09-20  3:18 Jim Rees
  2011-09-20  3:18 ` [PATCH 1/3] pnfsblock: add missing rpc_put_mount and path_put Jim Rees
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Jim Rees @ 2011-09-20  3:18 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, peter honeyman

The first patch is a bug fix that has already been sent upstream.  The next
two patches should replace the following in your tree:

0359f0e pNFS: introduce pnfs private workqueue
d9b5876 SQUASHME: pnfs: simplify and clean up pnfsiod workqueue
ef9e1a70 SQUASHME: pnfs: do pnfsiod_start before registering layout drivers

This will re-order the patches so that the bug fix is first, followed by the
workqueue patches.

These are also available on the for-benny branch of my git repo.

Benny Halevy (1):
  SQUASHME: pnfs: simplify and clean up pnfsiod workqueue

Peng Tao (2):
  pnfsblock: add missing rpc_put_mount and path_put
  pnfs: introduce pnfs private workqueue

 fs/nfs/blocklayout/blocklayout.c |   21 +++++++++++---
 fs/nfs/objlayout/objio_osd.c     |   10 ++++++-
 fs/nfs/objlayout/objlayout.c     |    4 +-
 fs/nfs/pnfs.c                    |   54 +++++++++++++++++++++++++++++++++++++-
 fs/nfs/pnfs.h                    |    4 +++
 5 files changed, 84 insertions(+), 9 deletions(-)

-- 
1.7.4.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/3] pnfsblock: add missing rpc_put_mount and path_put
  2011-09-20  3:18 [PATCH 0/3] replacement for "introduce pnfs private workqueue" Jim Rees
@ 2011-09-20  3:18 ` Jim Rees
  2011-09-20  3:18 ` [PATCH 2/3] pnfs: introduce pnfs private workqueue Jim Rees
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Jim Rees @ 2011-09-20  3:18 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, peter honeyman

From: Peng Tao <bergwolf@gmail.com>

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
---
 fs/nfs/blocklayout/blocklayout.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index dc23833..dee6cae 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -993,17 +993,20 @@ static int __init nfs4blocklayout_init(void)
 			      mnt,
 			      NFS_PIPE_DIRNAME, 0, &path);
 	if (ret)
-		goto out_remove;
+		goto out_putrpc;
 
 	bl_device_pipe = rpc_mkpipe(path.dentry, "blocklayout", NULL,
 				    &bl_upcall_ops, 0);
+	path_put(&path);
 	if (IS_ERR(bl_device_pipe)) {
 		ret = PTR_ERR(bl_device_pipe);
-		goto out_remove;
+		goto out_putrpc;
 	}
 out:
 	return ret;
 
+out_putrpc:
+	rpc_put_mount();
 out_remove:
 	pnfs_unregister_layoutdriver(&blocklayout_type);
 	return ret;
@@ -1016,6 +1019,7 @@ static void __exit nfs4blocklayout_exit(void)
 
 	pnfs_unregister_layoutdriver(&blocklayout_type);
 	rpc_unlink(bl_device_pipe);
+	rpc_put_mount();
 }
 
 MODULE_ALIAS("nfs-layouttype4-3");
-- 
1.7.4.1



* [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-20  3:18 [PATCH 0/3] replacement for "introduce pnfs private workqueue" Jim Rees
  2011-09-20  3:18 ` [PATCH 1/3] pnfsblock: add missing rpc_put_mount and path_put Jim Rees
@ 2011-09-20  3:18 ` Jim Rees
  2011-09-20 22:41   ` Trond Myklebust
  2011-09-20  3:18 ` [PATCH 3/3] SQUASHME: pnfs: simplify and clean up pnfsiod workqueue Jim Rees
  2011-09-21 11:52 ` [PATCH 0/3] replacement for "introduce pnfs private workqueue" Benny Halevy
  3 siblings, 1 reply; 29+ messages in thread
From: Jim Rees @ 2011-09-20  3:18 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, peter honeyman

From: Peng Tao <bergwolf@gmail.com>

For layoutdriver io_done functions, the default workqueue is not a good place,
as the code is executed in the IO path. So add a pnfs private workqueue to
handle them.

Also change block and object layout code to make use of this private
workqueue.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
[pnfs: do pnfsiod_start before registering layout drivers]
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/blocklayout/blocklayout.c |   13 +++++++--
 fs/nfs/objlayout/objio_osd.c     |   10 ++++++-
 fs/nfs/objlayout/objlayout.c     |    4 +-
 fs/nfs/pnfs.c                    |   52 +++++++++++++++++++++++++++++++++++++-
 fs/nfs/pnfs.h                    |    4 +++
 5 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index dee6cae..0356016 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -228,7 +228,7 @@ bl_end_par_io_read(void *data)
 	struct nfs_read_data *rdata = data;
 
 	INIT_WORK(&rdata->task.u.tk_work, bl_read_cleanup);
-	schedule_work(&rdata->task.u.tk_work);
+	pnfsiod_queue_work(&rdata->task.u.tk_work);
 }
 
 /* We don't want normal .rpc_call_done callback used, so we replace it
@@ -418,7 +418,7 @@ static void bl_end_par_io_write(void *data)
 	wdata->task.tk_status = 0;
 	wdata->verf.committed = NFS_FILE_SYNC;
 	INIT_WORK(&wdata->task.u.tk_work, bl_write_cleanup);
-	schedule_work(&wdata->task.u.tk_work);
+	pnfsiod_queue_work(&wdata->task.u.tk_work);
 }
 
 /* FIXME STUB - mark intersection of layout and page as bad, so is not
@@ -977,10 +977,14 @@ static int __init nfs4blocklayout_init(void)
 
 	dprintk("%s: NFSv4 Block Layout Driver Registering...\n", __func__);
 
-	ret = pnfs_register_layoutdriver(&blocklayout_type);
+	ret = pnfsiod_start();
 	if (ret)
 		goto out;
 
+	ret = pnfs_register_layoutdriver(&blocklayout_type);
+	if (ret)
+		goto out_stop;
+
 	init_waitqueue_head(&bl_wq);
 
 	mnt = rpc_get_mount();
@@ -1009,6 +1013,8 @@ out_putrpc:
 	rpc_put_mount();
 out_remove:
 	pnfs_unregister_layoutdriver(&blocklayout_type);
+out_stop:
+	pnfsiod_stop();
 	return ret;
 }
 
@@ -1018,6 +1024,7 @@ static void __exit nfs4blocklayout_exit(void)
 	       __func__);
 
 	pnfs_unregister_layoutdriver(&blocklayout_type);
+	pnfsiod_stop();
 	rpc_unlink(bl_device_pipe);
 	rpc_put_mount();
 }
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index d0cda12..7e8f0cc 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -1041,7 +1041,14 @@ MODULE_LICENSE("GPL");
 static int __init
 objlayout_init(void)
 {
-	int ret = pnfs_register_layoutdriver(&objlayout_type);
+	int ret;
+
+	ret = pnfsiod_start();
+	if (!ret) {
+		ret = pnfs_register_layoutdriver(&objlayout_type);
+		if (ret)
+			pnfsiod_stop();
+	}
 
 	if (ret)
 		printk(KERN_INFO
@@ -1057,6 +1064,7 @@ static void __exit
 objlayout_exit(void)
 {
 	pnfs_unregister_layoutdriver(&objlayout_type);
+	pnfsiod_stop();
 	printk(KERN_INFO "%s: Unregistered OSD pNFS Layout Driver\n",
 	       __func__);
 }
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index 1d06f8e..f7c6c21 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -305,7 +305,7 @@ objlayout_read_done(struct objlayout_io_state *state, ssize_t status, bool sync)
 		pnfs_ld_read_done(rdata);
 	else {
 		INIT_WORK(&rdata->task.u.tk_work, _rpc_read_complete);
-		schedule_work(&rdata->task.u.tk_work);
+		pnfsiod_queue_work(&rdata->task.u.tk_work);
 	}
 }
 
@@ -396,7 +396,7 @@ objlayout_write_done(struct objlayout_io_state *state, ssize_t status,
 		pnfs_ld_write_done(wdata);
 	else {
 		INIT_WORK(&wdata->task.u.tk_work, _rpc_write_complete);
-		schedule_work(&wdata->task.u.tk_work);
+		pnfsiod_queue_work(&wdata->task.u.tk_work);
 	}
 }
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e550e88..5ac7a78 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -38,7 +38,7 @@
 /* Locking:
  *
  * pnfs_spinlock:
- *      protects pnfs_modules_tbl.
+ *      protects pnfs_modules_tbl, pnfsiod_workqueue and pnfsiod_users.
  */
 static DEFINE_SPINLOCK(pnfs_spinlock);
 
@@ -47,6 +47,9 @@ static DEFINE_SPINLOCK(pnfs_spinlock);
  */
 static LIST_HEAD(pnfs_modules_tbl);
 
+static struct workqueue_struct *pnfsiod_workqueue;
+static int pnfsiod_users = 0;
+
 /* Return the registered pnfs layout driver module matching given id */
 static struct pnfs_layoutdriver_type *
 find_pnfs_driver_locked(u32 id)
@@ -1478,3 +1481,50 @@ out:
 	dprintk("<-- %s status %d\n", __func__, status);
 	return status;
 }
+
+/*
+ * start up the pnfsiod workqueue
+ */
+int pnfsiod_start(void)
+{
+	struct workqueue_struct *wq;
+	dprintk("RPC:       creating workqueue pnfsiod\n");
+	wq = alloc_workqueue("pnfsiod", WQ_MEM_RECLAIM, 0);
+	if (wq == NULL)
+		return -ENOMEM;
+	spin_lock(&pnfs_spinlock);
+	pnfsiod_users++;
+	if (pnfsiod_workqueue == NULL) {
+		pnfsiod_workqueue = wq;
+	} else {
+		destroy_workqueue(wq);
+	}
+	spin_unlock(&pnfs_spinlock);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnfsiod_start);
+
+/*
+ * Destroy the pnfsiod workqueue
+ */
+void pnfsiod_stop(void)
+{
+	struct workqueue_struct *wq = NULL;
+
+	spin_lock(&pnfs_spinlock);
+	pnfsiod_users--;
+	if (pnfsiod_users == 0) {
+		wq = pnfsiod_workqueue;
+		pnfsiod_workqueue = NULL;
+	}
+	spin_unlock(&pnfs_spinlock);
+	if (wq)
+		destroy_workqueue(wq);
+}
+EXPORT_SYMBOL_GPL(pnfsiod_stop);
+
+void pnfsiod_queue_work(struct work_struct* work)
+{
+	queue_work(pnfsiod_workqueue, work);
+}
+EXPORT_SYMBOL_GPL(pnfsiod_queue_work);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 01cbfd5..bc1eed5 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -165,6 +165,10 @@ extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
 extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp);
 
 /* pnfs.c */
+int pnfsiod_start(void);
+void pnfsiod_stop(void);
+void pnfsiod_queue_work(struct work_struct* work);
+
 void get_layout_hdr(struct pnfs_layout_hdr *lo);
 void put_lseg(struct pnfs_layout_segment *lseg);
 
-- 
1.7.4.1



* [PATCH 3/3] SQUASHME: pnfs: simplify and clean up pnfsiod workqueue
  2011-09-20  3:18 [PATCH 0/3] replacement for "introduce pnfs private workqueue" Jim Rees
  2011-09-20  3:18 ` [PATCH 1/3] pnfsblock: add missing rpc_put_mount and path_put Jim Rees
  2011-09-20  3:18 ` [PATCH 2/3] pnfs: introduce pnfs private workqueue Jim Rees
@ 2011-09-20  3:18 ` Jim Rees
  2011-09-21 11:52 ` [PATCH 0/3] replacement for "introduce pnfs private workqueue" Benny Halevy
  3 siblings, 0 replies; 29+ messages in thread
From: Jim Rees @ 2011-09-20  3:18 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, peter honeyman

From: Benny Halevy <bhalevy@tonian.com>

[whitespace]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
---
 fs/nfs/pnfs.c |   34 ++++++++++++++++++----------------
 1 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 5ac7a78..e38b91a 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -38,7 +38,7 @@
 /* Locking:
  *
  * pnfs_spinlock:
- *      protects pnfs_modules_tbl, pnfsiod_workqueue and pnfsiod_users.
+ *      protects pnfs_modules_tbl, pnfsiod_workqueue.
  */
 static DEFINE_SPINLOCK(pnfs_spinlock);
 
@@ -48,7 +48,7 @@ static DEFINE_SPINLOCK(pnfs_spinlock);
 static LIST_HEAD(pnfs_modules_tbl);
 
 static struct workqueue_struct *pnfsiod_workqueue;
-static int pnfsiod_users = 0;
+static atomic_t pnfsiod_users;
 
 /* Return the registered pnfs layout driver module matching given id */
 static struct pnfs_layoutdriver_type *
@@ -1488,18 +1488,22 @@ out:
 int pnfsiod_start(void)
 {
 	struct workqueue_struct *wq;
-	dprintk("RPC:       creating workqueue pnfsiod\n");
-	wq = alloc_workqueue("pnfsiod", WQ_MEM_RECLAIM, 0);
-	if (wq == NULL)
-		return -ENOMEM;
-	spin_lock(&pnfs_spinlock);
-	pnfsiod_users++;
+
+	dprintk("NFS:       creating workqueue pnfsiod\n");
+	atomic_inc(&pnfsiod_users);
 	if (pnfsiod_workqueue == NULL) {
-		pnfsiod_workqueue = wq;
-	} else {
-		destroy_workqueue(wq);
+		wq = alloc_workqueue("pnfsiod", WQ_MEM_RECLAIM, 0);
+		if (wq == NULL) {
+			pnfsiod_stop();
+			return -ENOMEM;
+		}
+		spin_lock(&pnfs_spinlock);
+		if (pnfsiod_workqueue == NULL)
+			pnfsiod_workqueue = wq;
+		else
+			destroy_workqueue(wq);
+		spin_unlock(&pnfs_spinlock);
 	}
-	spin_unlock(&pnfs_spinlock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(pnfsiod_start);
@@ -1511,13 +1515,11 @@ void pnfsiod_stop(void)
 {
 	struct workqueue_struct *wq = NULL;
 
-	spin_lock(&pnfs_spinlock);
-	pnfsiod_users--;
-	if (pnfsiod_users == 0) {
+	if (atomic_dec_and_lock(&pnfsiod_users, &pnfs_spinlock)) {
 		wq = pnfsiod_workqueue;
 		pnfsiod_workqueue = NULL;
+		spin_unlock(&pnfs_spinlock);
 	}
-	spin_unlock(&pnfs_spinlock);
 	if (wq)
 		destroy_workqueue(wq);
 }
-- 
1.7.4.1



* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-20  3:18 ` [PATCH 2/3] pnfs: introduce pnfs private workqueue Jim Rees
@ 2011-09-20 22:41   ` Trond Myklebust
  2011-09-21  0:29     ` Jim Rees
  0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2011-09-20 22:41 UTC (permalink / raw)
  To: Jim Rees; +Cc: Benny Halevy, linux-nfs, peter honeyman

On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote: 
> From: Peng Tao <bergwolf@gmail.com>
> 
> For layoutdriver io done functions, default workqueue is not a good place as
> the code is executed in IO path. So add a pnfs private workqueue to handle
> them.
> 
> Also change block and object layout code to make use of this private
> workqueue.
> 

Wait, what????

Why is the I/O path (i.e. the nfsiod queue) inappropriate for
layoutdriver io_done functions?

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com



* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-20 22:41   ` Trond Myklebust
@ 2011-09-21  0:29     ` Jim Rees
  2011-09-21  2:44       ` tao.peng
  2011-09-21  4:22       ` Myklebust, Trond
  0 siblings, 2 replies; 29+ messages in thread
From: Jim Rees @ 2011-09-21  0:29 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Benny Halevy, linux-nfs, peter honeyman

Trond Myklebust wrote:

  On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote: 
  > From: Peng Tao <bergwolf@gmail.com>
  > 
  > For layoutdriver io done functions, default workqueue is not a good place as
  > the code is executed in IO path. So add a pnfs private workqueue to handle
  > them.
  > 
  > Also change block and object layout code to make use of this private
  > workqueue.
  > 
  
  Wait, what????
  
  Why is the I/O path (i.e. the nfsiod queue) inappropriate for
  layoutdriver io_done functions?

I thought you were the one who asked for this, here:
http://www.spinics.net/lists/linux-nfs/msg22771.html

But looking back on it now, the IO path has changed and I can't tell if the
argument still holds.


* RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21  0:29     ` Jim Rees
@ 2011-09-21  2:44       ` tao.peng
  2011-09-21  4:20         ` Myklebust, Trond
  2011-09-21  4:22       ` Myklebust, Trond
  1 sibling, 1 reply; 29+ messages in thread
From: tao.peng @ 2011-09-21  2:44 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: bhalevy, linux-nfs, honey, rees


> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> On Behalf Of Jim Rees
> Sent: Wednesday, September 21, 2011 8:29 AM
> To: Trond Myklebust
> Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> 
> Trond Myklebust wrote:
> 
>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
>   > From: Peng Tao <bergwolf@gmail.com>
>   >
>   > For layoutdriver io done functions, default workqueue is not a good place as
>   > the code is executed in IO path. So add a pnfs private workqueue to handle
>   > them.
>   >
>   > Also change block and object layout code to make use of this private
>   > workqueue.
>   >
> 
>   Wait, what????
> 
>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
>   layoutdriver io_done functions?
The current code uses the system_wq to handle io_done functions. If any function running on system_wq allocates memory and triggers dirty writeback down the nfs path, the io_done function can never be called, because it sits behind that function on the system_wq. And you said that rpciod/nfsiod is not the ideal place because of memory allocation. So I suggested a pnfs private workqueue, and Benny agreed.

> 
> I thought you were the one who asked for this, here:
> http://www.spinics.net/lists/linux-nfs/msg22771.html
> 
> But looking back on it now, the IO path has changed and I can't tell if the
> argument still holds.
I believe it still holds. The recollapse path still involves memory allocation.

Cheers,
Tao


* RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21  2:44       ` tao.peng
@ 2011-09-21  4:20         ` Myklebust, Trond
  2011-09-21  5:16           ` tao.peng
  0 siblings, 1 reply; 29+ messages in thread
From: Myklebust, Trond @ 2011-09-21  4:20 UTC (permalink / raw)
  To: tao.peng; +Cc: bhalevy, linux-nfs, honey, rees

> -----Original Message-----
> From: tao.peng@emc.com [mailto:tao.peng@emc.com]
> Sent: Tuesday, September 20, 2011 10:44 PM
> To: Myklebust, Trond
> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org;
honey@citi.umich.edu;
> rees@umich.edu
> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> 
> 
> > -----Original Message-----
> > From: linux-nfs-owner@vger.kernel.org
> > [mailto:linux-nfs-owner@vger.kernel.org]
> > On Behalf Of Jim Rees
> > Sent: Wednesday, September 21, 2011 8:29 AM
> > To: Trond Myklebust
> > Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
> > Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >
> > Trond Myklebust wrote:
> >
> >   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
> >   > From: Peng Tao <bergwolf@gmail.com>
> >   >
> >   > For layoutdriver io done functions, default workqueue is not a
good
> place as
> >   > the code is executed in IO path. So add a pnfs private workqueue
to
> handle
> >   > them.
> >   >
> >   > Also change block and object layout code to make use of this
private
> >   > workqueue.
> >   >
> >
> >   Wait, what????
> >
> >   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
> >   layoutdriver io_done functions?
> The current code uses the system_wq to handle io_done functions. If
any
> function allocates memory in system_wq and causes dirty writeback in
nfs
> path, the io_done function can never be called because it is after the
original
> function in the system_wq. And you said that the rpciod/nfsiod is not
the
> ideal place because of memory allocation. So I suggested pnfs private
> workqueue and Benny agreed on it.

You appear to be optimizing for a corner case. Why?

IOW: why do we need a whole new workqueue for something which is
supposed to be extremely rare? Why can't we just use standard keventd or
allocate a perfectly normal thread (e.g. the state recovery thread)?

Trond


* RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21  0:29     ` Jim Rees
  2011-09-21  2:44       ` tao.peng
@ 2011-09-21  4:22       ` Myklebust, Trond
  1 sibling, 0 replies; 29+ messages in thread
From: Myklebust, Trond @ 2011-09-21  4:22 UTC (permalink / raw)
  To: Jim Rees; +Cc: Benny Halevy, linux-nfs, peter honeyman

> -----Original Message-----
> From: Jim Rees [mailto:rees@umich.edu]
> Sent: Tuesday, September 20, 2011 8:29 PM
> To: Myklebust, Trond
> Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> 
> Trond Myklebust wrote:
> 
>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
>   > From: Peng Tao <bergwolf@gmail.com>
>   >
>   > For layoutdriver io done functions, default workqueue is not a
good place
> as
>   > the code is executed in IO path. So add a pnfs private workqueue
to
> handle
>   > them.
>   >
>   > Also change block and object layout code to make use of this
private
>   > workqueue.
>   >
> 
>   Wait, what????
> 
>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
>   layoutdriver io_done functions?
> 
> I thought you were the one who asked for this, here:
> http://www.spinics.net/lists/linux-nfs/msg22771.html
> 
> But looking back on it now, the IO path has changed and I can't tell
if the
> argument still holds.

You are reading too much into what I said. The fact that we shouldn't
use the 'hot' nfs/rpciod workqueues doesn't imply that we need a
completely separate pnfs workqueue. We already have plenty of generic
workqueues and even context threads: why aren't they sufficient?

Trond


* RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21  4:20         ` Myklebust, Trond
@ 2011-09-21  5:16           ` tao.peng
  2011-09-21  7:04             ` Benny Halevy
  0 siblings, 1 reply; 29+ messages in thread
From: tao.peng @ 2011-09-21  5:16 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: bhalevy, linux-nfs, honey, rees

> -----Original Message-----
> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]
> Sent: Wednesday, September 21, 2011 12:20 PM
> To: Peng, Tao
> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org; honey@citi.umich.edu;
> rees@umich.edu
> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> 
> > -----Original Message-----
> > From: tao.peng@emc.com [mailto:tao.peng@emc.com]
> > Sent: Tuesday, September 20, 2011 10:44 PM
> > To: Myklebust, Trond
> > Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org;
> honey@citi.umich.edu;
> > rees@umich.edu
> > Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >
> >
> > > -----Original Message-----
> > > From: linux-nfs-owner@vger.kernel.org
> > > [mailto:linux-nfs-owner@vger.kernel.org]
> > > On Behalf Of Jim Rees
> > > Sent: Wednesday, September 21, 2011 8:29 AM
> > > To: Trond Myklebust
> > > Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
> > > Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> > >
> > > Trond Myklebust wrote:
> > >
> > >   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
> > >   > From: Peng Tao <bergwolf@gmail.com>
> > >   >
> > >   > For layoutdriver io done functions, default workqueue is not a
> good
> > place as
> > >   > the code is executed in IO path. So add a pnfs private workqueue
> to
> > handle
> > >   > them.
> > >   >
> > >   > Also change block and object layout code to make use of this
> private
> > >   > workqueue.
> > >   >
> > >
> > >   Wait, what????
> > >
> > >   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
> > >   layoutdriver io_done functions?
> > The current code uses the system_wq to handle io_done functions. If
> any
> > function allocates memory in system_wq and causes dirty writeback in
> nfs
> > path, the io_done function can never be called because it is after the
> original
> > function in the system_wq. And you said that the rpciod/nfsiod is not
> the
> > ideal place because of memory allocation. So I suggested pnfs private
> > workqueue and Benny agreed on it.
> 
> You appear to be optimizing for a corner case. Why?
The current code uses system_wq for io_done, and that can cause deadlock. So this is more than just an optimization.

> 
> IOW: why do we need a whole new workqueue for something which is
> supposed to be extremely rare? Why can't we just use standard keventd or
> allocate a perfectly normal thread (e.g. the state recovery thread)?
Unlike nfsiod and rpciod, keventd (a.k.a. system_wq) is not created with WQ_MEM_RECLAIM, indicating that it should not be used in the IO path, which can be invoked during memory reclaim. And I agree that a normal kthread could solve that part.

However, the blocklayout write end_io function still needs a context where it can allocate memory to convert extent state (for every IO that touches new extents). If you look into the patch, we are not just using the pnfs private workqueue to handle pnfs_ld_read/write_done, but also calling mark_extents_written() inside the workqueue.
If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e., by just creating a normal kthread for the error case), we would still have to create a workqueue inside the blocklayout driver to handle write end_io. And if blocklayout has a private workqueue, it no longer needs the normal kthread for read/write_done error handling, which leaves the kthread specific only to the object layout (if it doesn't need a workqueue like blocklayout does).

So I think a generic pnfs workqueue is better because it simplifies things and solves both problems.

Best,
Tao

> 
> Trond



* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21  5:16           ` tao.peng
@ 2011-09-21  7:04             ` Benny Halevy
  2011-09-21 10:23               ` tao.peng
  0 siblings, 1 reply; 29+ messages in thread
From: Benny Halevy @ 2011-09-21  7:04 UTC (permalink / raw)
  To: tao.peng; +Cc: Trond.Myklebust, linux-nfs, honey, rees

On 2011-09-21 08:16, tao.peng@emc.com wrote:
>> -----Original Message-----
>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]
>> Sent: Wednesday, September 21, 2011 12:20 PM
>> To: Peng, Tao
>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org; honey@citi.umich.edu;
>> rees@umich.edu
>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>
>>> -----Original Message-----
>>> From: tao.peng@emc.com [mailto:tao.peng@emc.com]
>>> Sent: Tuesday, September 20, 2011 10:44 PM
>>> To: Myklebust, Trond
>>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org;
>> honey@citi.umich.edu;
>>> rees@umich.edu
>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>
>>>
>>>> -----Original Message-----
>>>> From: linux-nfs-owner@vger.kernel.org
>>>> [mailto:linux-nfs-owner@vger.kernel.org]
>>>> On Behalf Of Jim Rees
>>>> Sent: Wednesday, September 21, 2011 8:29 AM
>>>> To: Trond Myklebust
>>>> Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>
>>>> Trond Myklebust wrote:
>>>>
>>>>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
>>>>   > From: Peng Tao <bergwolf@gmail.com>
>>>>   >
>>>>   > For layoutdriver io done functions, default workqueue is not a
>> good
>>> place as
>>>>   > the code is executed in IO path. So add a pnfs private workqueue
>> to
>>> handle
>>>>   > them.
>>>>   >
>>>>   > Also change block and object layout code to make use of this
>> private
>>>>   > workqueue.
>>>>   >
>>>>
>>>>   Wait, what????
>>>>
>>>>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
>>>>   layoutdriver io_done functions?
>>> The current code uses the system_wq to handle io_done functions. If
>> any
>>> function allocates memory in system_wq and causes dirty writeback in
>> nfs
>>> path, the io_done function can never be called because it is after the
>> original
>>> function in the system_wq. And you said that the rpciod/nfsiod is not
>> the
>>> ideal place because of memory allocation. So I suggested pnfs private
>>> workqueue and Benny agreed on it.
>>
>> You appear to be optimizing for a corner case. Why?
> Current code uses system_wq for io_done and it can cause deadlock. So this is more than just an optimization.
> 
>>
>> IOW: why do we need a whole new workqueue for something which is
>> supposed to be extremely rare? Why can't we just use standard keventd or
>> allocate a perfectly normal thread (e.g. the state recovery thread)?
> Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can be invoked during memory reclaim. And I agree that a normal kthread can solve it.
> 
> However, the blocklayout write end_io function still needs a context where it can allocate memory to convert extent state (for every IO that touches new extents). If you look into the patch, we are not just using the pnfs private workqueue to handle pnfs_ld_read/write_done, but also calling mark_extents_written() inside the workqueue.

Can this memory be preallocated before the I/O for the common case?

Benny

> If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e., by just creating a normal kthread in error case), we would still have to create a workqueue inside blocklayout driver to handle write end_io. And if blocklayout has a private workqueue, it no longer needs the normal kthread for read/write_done error handling, which leaves the kthread only specific to object layout (if it doesn't need a workqueue like blocklayout does).
> 
> So I think a generic pnfs workqueue is better because it simplifies things and solve both problems.
> 
> Best,
> Tao
> 
>>
>> Trond
> 


* RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21  7:04             ` Benny Halevy
@ 2011-09-21 10:23               ` tao.peng
  2011-09-21 10:38                 ` Boaz Harrosh
  2011-09-21 10:56                 ` Benny Halevy
  0 siblings, 2 replies; 29+ messages in thread
From: tao.peng @ 2011-09-21 10:23 UTC (permalink / raw)
  To: bhalevy; +Cc: Trond.Myklebust, linux-nfs, honey, rees

Hi, Benny,

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> On Behalf Of Benny Halevy
> Sent: Wednesday, September 21, 2011 3:05 PM
> To: Peng, Tao
> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org;
> honey@citi.umich.edu; rees@umich.edu
> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> 
> On 2011-09-21 08:16, tao.peng@emc.com wrote:
> >> -----Original Message-----
> >> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]
> >> Sent: Wednesday, September 21, 2011 12:20 PM
> >> To: Peng, Tao
> >> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org; honey@citi.umich.edu;
> >> rees@umich.edu
> >> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>
> >>> -----Original Message-----
> >>> From: tao.peng@emc.com [mailto:tao.peng@emc.com]
> >>> Sent: Tuesday, September 20, 2011 10:44 PM
> >>> To: Myklebust, Trond
> >>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org;
> >> honey@citi.umich.edu;
> >>> rees@umich.edu
> >>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: linux-nfs-owner@vger.kernel.org
> >>>> [mailto:linux-nfs-owner@vger.kernel.org]
> >>>> On Behalf Of Jim Rees
> >>>> Sent: Wednesday, September 21, 2011 8:29 AM
> >>>> To: Trond Myklebust
> >>>> Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
> >>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>>
> >>>> Trond Myklebust wrote:
> >>>>
> >>>>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
> >>>>   > From: Peng Tao <bergwolf@gmail.com>
> >>>>   >
> >>>>   > For layoutdriver io done functions, default workqueue is not a
> >> good
> >>> place as
> >>>>   > the code is executed in IO path. So add a pnfs private workqueue
> >> to
> >>> handle
> >>>>   > them.
> >>>>   >
> >>>>   > Also change block and object layout code to make use of this
> >> private
> >>>>   > workqueue.
> >>>>   >
> >>>>
> >>>>   Wait, what????
> >>>>
> >>>>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
> >>>>   layoutdriver io_done functions?
> >>> The current code uses the system_wq to handle io_done functions. If
> >> any
> >>> function allocates memory in system_wq and causes dirty writeback in
> >> nfs
> >>> path, the io_done function can never be called because it is after the
> >> original
> >>> function in the system_wq. And you said that the rpciod/nfsiod is not
> >> the
> >>> ideal place because of memory allocation. So I suggested pnfs private
> >>> workqueue and Benny agreed on it.
> >>
> >> You appear to be optimizing for a corner case. Why?
> > Current code uses system_wq for io_done and it can cause deadlock. So this is
> more than just an optimization.
> >
> >>
> >> IOW: why do we need a whole new workqueue for something which is
> >> supposed to be extremely rare? Why can't we just use standard keventd or
> >> allocate a perfectly normal thread (e.g. the state recovery thread)?
> > Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with
> WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can be
> invoked during memory reclaim. And I agree that a normal kthread can solve it.
> >
> > However, the blocklayout write end_io function still needs a context where it can
> allocate memory to convert extent state (for every IO that touches new extents). If
> you look into the patch, we are not just using the pnfs private workqueue to handle
> pnfs_ld_read/write_done, but also calling mark_extents_written() inside the
> workqueue.
> 
> Can this memory be preallocated before the I/O for the common case?
It is possible to preallocate memory for the common case, but that doesn't remove the need for a workqueue. Without one, we would have to put a bunch of extent manipulation code into bio->end_io, which runs in the bottom half, where execution must always be kept to a minimum.

Unless we do the following:
1. preallocate memory for extent state conversion
2. use nfsiod/rpciod to handle bl_write_cleanup
3. for the pnfs error case, create a kthread to recollapse and resend to the MDS

Not sure it is worth the complexity though...
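
For reference, the private workqueue itself is tiny. A minimal sketch of the setup (the names pnfsiod_workqueue/pnfsiod_start/pnfsiod_stop are illustrative, not necessarily what the patch uses):

```c
#include <linux/workqueue.h>

static struct workqueue_struct *pnfsiod_workqueue;

static int pnfsiod_start(void)
{
	/*
	 * WQ_MEM_RECLAIM guarantees a rescuer thread, so work queued
	 * here can make forward progress even while the system is
	 * reclaiming memory -- exactly what system_wq does not give us.
	 */
	pnfsiod_workqueue = alloc_workqueue("pnfsiod",
					    WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
	return pnfsiod_workqueue ? 0 : -ENOMEM;
}

static void pnfsiod_stop(void)
{
	destroy_workqueue(pnfsiod_workqueue);
	pnfsiod_workqueue = NULL;
}

/*
 * io_done path: queue_work(pnfsiod_workqueue, &data->work)
 * instead of schedule_work(&data->work), which lands on system_wq.
 */
```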

Cheers,
Tao

> 
> Benny
> 
> > If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e., by
> just creating a normal kthread in error case), we would still have to create a
> workqueue inside blocklayout driver to handle write end_io. And if blocklayout has a
> private workqueue, it no longer needs the normal kthread for read/write_done error
> handling, which leaves the kthread only specific to object layout (if it doesn't need a
> workqueue like blocklayout does).
> >
> > So I think a generic pnfs workqueue is better because it simplifies things and solve
> both problems.
> >
> > Best,
> > Tao
> >
> >>
> >> Trond
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 10:23               ` tao.peng
@ 2011-09-21 10:38                 ` Boaz Harrosh
  2011-09-21 11:04                   ` tao.peng
  2011-09-21 10:56                 ` Benny Halevy
  1 sibling, 1 reply; 29+ messages in thread
From: Boaz Harrosh @ 2011-09-21 10:38 UTC (permalink / raw)
  To: tao.peng; +Cc: bhalevy, Trond.Myklebust, linux-nfs, honey, rees

On 09/21/2011 01:23 PM, tao.peng@emc.com wrote:
> 
> Unless we do following:
> 1. preallocate memory for extent state convertion
> 2. use nfsiod/rpciod to handle bl_write_cleanup
> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
> 
> not sure if it worth the complexity though...
> 
> Cheers,
> Tao
> 

Please forgive my ignorance but what is the big difference between
a kthread and a workqueue? I thought a workqueue is just a kthread
and some memory structures (list).

I agree with "keep things simple"

But what do I know ;-)

Thanks
Boaz

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 10:23               ` tao.peng
  2011-09-21 10:38                 ` Boaz Harrosh
@ 2011-09-21 10:56                 ` Benny Halevy
  2011-09-21 11:10                   ` tao.peng
  1 sibling, 1 reply; 29+ messages in thread
From: Benny Halevy @ 2011-09-21 10:56 UTC (permalink / raw)
  To: tao.peng; +Cc: Trond.Myklebust, linux-nfs, honey, rees

On 2011-09-21 13:23, tao.peng@emc.com wrote:
> Hi, Benny,
> 
>> -----Original Message-----
>> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
>> On Behalf Of Benny Halevy
>> Sent: Wednesday, September 21, 2011 3:05 PM
>> To: Peng, Tao
>> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org;
>> honey@citi.umich.edu; rees@umich.edu
>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>
>> On 2011-09-21 08:16, tao.peng@emc.com wrote:
>>>> -----Original Message-----
>>>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]
>>>> Sent: Wednesday, September 21, 2011 12:20 PM
>>>> To: Peng, Tao
>>>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org; honey@citi.umich.edu;
>>>> rees@umich.edu
>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>
>>>>> -----Original Message-----
>>>>> From: tao.peng@emc.com [mailto:tao.peng@emc.com]
>>>>> Sent: Tuesday, September 20, 2011 10:44 PM
>>>>> To: Myklebust, Trond
>>>>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org;
>>>> honey@citi.umich.edu;
>>>>> rees@umich.edu
>>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: linux-nfs-owner@vger.kernel.org
>>>>>> [mailto:linux-nfs-owner@vger.kernel.org]
>>>>>> On Behalf Of Jim Rees
>>>>>> Sent: Wednesday, September 21, 2011 8:29 AM
>>>>>> To: Trond Myklebust
>>>>>> Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
>>>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>
>>>>>> Trond Myklebust wrote:
>>>>>>
>>>>>>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
>>>>>>   > From: Peng Tao <bergwolf@gmail.com>
>>>>>>   >
>>>>>>   > For layoutdriver io done functions, default workqueue is not a
>>>> good
>>>>> place as
>>>>>>   > the code is executed in IO path. So add a pnfs private workqueue
>>>> to
>>>>> handle
>>>>>>   > them.
>>>>>>   >
>>>>>>   > Also change block and object layout code to make use of this
>>>> private
>>>>>>   > workqueue.
>>>>>>   >
>>>>>>
>>>>>>   Wait, what????
>>>>>>
>>>>>>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
>>>>>>   layoutdriver io_done functions?
>>>>> The current code uses the system_wq to handle io_done functions. If
>>>> any
>>>>> function allocates memory in system_wq and causes dirty writeback in
>>>> nfs
>>>>> path, the io_done function can never be called because it is after the
>>>> original
>>>>> function in the system_wq. And you said that the rpciod/nfsiod is not
>>>> the
>>>>> ideal place because of memory allocation. So I suggested pnfs private
>>>>> workqueue and Benny agreed on it.
>>>>
>>>> You appear to be optimizing for a corner case. Why?
>>> Current code uses system_wq for io_done and it can cause deadlock. So this is
>> more than just an optimization.
>>>
>>>>
>>>> IOW: why do we need a whole new workqueue for something which is
>>>> supposed to be extremely rare? Why can't we just use standard keventd or
>>>> allocate a perfectly normal thread (e.g. the state recovery thread)?
>>> Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with
>> WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can be
>> invoked during memory reclaim. And I agree that a normal kthread can solve it.
>>>
>>> However, the blocklayout write end_io function still needs a context where it can
>> allocate memory to convert extent state (for every IO that touches new extents). If
>> you look into the patch, we are not just using the pnfs private workqueue to handle
>> pnfs_ld_read/write_done, but also calling mark_extents_written() inside the
>> workqueue.
>>
>> Can this memory be preallocated before the I/O for the common case?
> It is possible to preallocate memory for the common case. But it doesn't remove the need for workqueue. Without workqueue, we will need to put a bunch of extent manipulation code into bio->end_io, which is executed in bottom half and we always need minimize its execution.
> 
> Unless we do following:
> 1. preallocate memory for extent state convertion
> 2. use nfsiod/rpciod to handle bl_write_cleanup
> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
> 
> not sure if it worth the complexity though...

We do want to avoid memory allocation on the completion path, to ensure
completion under memory pressure and graceful handling of low-memory conditions,
so I think this approach is worthwhile.

Having a lightweight handler on the bottom half done path is ok too.

Doing the heavy lifting for the (presumably rare) error case in a workqueue/
kthread makes sense; it just mustn't block.  Could you use the nfs state manager
for that?

Benny

> 
> Cheers,
> Tao
> 
>>
>> Benny
>>
>>> If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e., by
>> just creating a normal kthread in error case), we would still have to create a
>> workqueue inside blocklayout driver to handle write end_io. And if blocklayout has a
>> private workqueue, it no longer needs the normal kthread for read/write_done error
>> handling, which leaves the kthread only specific to object layout (if it doesn't need a
>> workqueue like blocklayout does).
>>>
>>> So I think a generic pnfs workqueue is better because it simplifies things and solve
>> both problems.
>>>
>>> Best,
>>> Tao
>>>
>>>>
>>>> Trond
>>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 10:38                 ` Boaz Harrosh
@ 2011-09-21 11:04                   ` tao.peng
  0 siblings, 0 replies; 29+ messages in thread
From: tao.peng @ 2011-09-21 11:04 UTC (permalink / raw)
  To: bharrosh; +Cc: bhalevy, Trond.Myklebust, linux-nfs, honey, rees

Hi, Boaz,
> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> On Behalf Of Boaz Harrosh
> Sent: Wednesday, September 21, 2011 6:38 PM
> To: Peng, Tao
> Cc: bhalevy@tonian.com; Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org;
> honey@citi.umich.edu; rees@umich.edu
> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> 
> On 09/21/2011 01:23 PM, tao.peng@emc.com wrote:
> >
> > Unless we do following:
> > 1. preallocate memory for extent state convertion
> > 2. use nfsiod/rpciod to handle bl_write_cleanup
> > 3. for pnfs error case, create a kthread to recollapse and resend to MDS
> >
> > not sure if it worth the complexity though...
> >
> > Cheers,
> > Tao
> >
> 
> Please forgive my ignorance but what is the big difference between
> a kthread and a workqueue? I thought a workqueue is just a kthread
> and some memory structures (list).
Sorry for not being clear...
By kthread, I mean a thread that runs only once, does what we need, and then exits, if I read the code correctly, just like the state recovery thread that Trond mentioned above. And as Trond explained, since the error case should be rare, it is not necessary to keep a thread up and waiting for it all the time.
And by workqueue, I think you know exactly what it is :)

Cheers,
Tao
> 
> I agree with "keep things simple"
> 
> But what do I know ;-)
> 
> Thanks
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 10:56                 ` Benny Halevy
@ 2011-09-21 11:10                   ` tao.peng
  2011-09-21 11:27                     ` Benny Halevy
  0 siblings, 1 reply; 29+ messages in thread
From: tao.peng @ 2011-09-21 11:10 UTC (permalink / raw)
  To: bhalevy; +Cc: Trond.Myklebust, linux-nfs, honey, rees

> -----Original Message-----
> From: Benny Halevy [mailto:bhalevy@tonian.com]
> Sent: Wednesday, September 21, 2011 6:57 PM
> To: Peng, Tao
> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org;
> honey@citi.umich.edu; rees@umich.edu
> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> 
> On 2011-09-21 13:23, tao.peng@emc.com wrote:
> > Hi, Benny,
> >
> >> -----Original Message-----
> >> From: linux-nfs-owner@vger.kernel.org
> [mailto:linux-nfs-owner@vger.kernel.org]
> >> On Behalf Of Benny Halevy
> >> Sent: Wednesday, September 21, 2011 3:05 PM
> >> To: Peng, Tao
> >> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org;
> >> honey@citi.umich.edu; rees@umich.edu
> >> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>
> >> On 2011-09-21 08:16, tao.peng@emc.com wrote:
> >>>> -----Original Message-----
> >>>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]
> >>>> Sent: Wednesday, September 21, 2011 12:20 PM
> >>>> To: Peng, Tao
> >>>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org; honey@citi.umich.edu;
> >>>> rees@umich.edu
> >>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: tao.peng@emc.com [mailto:tao.peng@emc.com]
> >>>>> Sent: Tuesday, September 20, 2011 10:44 PM
> >>>>> To: Myklebust, Trond
> >>>>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org;
> >>>> honey@citi.umich.edu;
> >>>>> rees@umich.edu
> >>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: linux-nfs-owner@vger.kernel.org
> >>>>>> [mailto:linux-nfs-owner@vger.kernel.org]
> >>>>>> On Behalf Of Jim Rees
> >>>>>> Sent: Wednesday, September 21, 2011 8:29 AM
> >>>>>> To: Trond Myklebust
> >>>>>> Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
> >>>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
> >>>>>>
> >>>>>> Trond Myklebust wrote:
> >>>>>>
> >>>>>>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
> >>>>>>   > From: Peng Tao <bergwolf@gmail.com>
> >>>>>>   >
> >>>>>>   > For layoutdriver io done functions, default workqueue is not a
> >>>> good
> >>>>> place as
> >>>>>>   > the code is executed in IO path. So add a pnfs private workqueue
> >>>> to
> >>>>> handle
> >>>>>>   > them.
> >>>>>>   >
> >>>>>>   > Also change block and object layout code to make use of this
> >>>> private
> >>>>>>   > workqueue.
> >>>>>>   >
> >>>>>>
> >>>>>>   Wait, what????
> >>>>>>
> >>>>>>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
> >>>>>>   layoutdriver io_done functions?
> >>>>> The current code uses the system_wq to handle io_done functions. If
> >>>> any
> >>>>> function allocates memory in system_wq and causes dirty writeback in
> >>>> nfs
> >>>>> path, the io_done function can never be called because it is after the
> >>>> original
> >>>>> function in the system_wq. And you said that the rpciod/nfsiod is not
> >>>> the
> >>>>> ideal place because of memory allocation. So I suggested pnfs private
> >>>>> workqueue and Benny agreed on it.
> >>>>
> >>>> You appear to be optimizing for a corner case. Why?
> >>> Current code uses system_wq for io_done and it can cause deadlock. So this is
> >> more than just an optimization.
> >>>
> >>>>
> >>>> IOW: why do we need a whole new workqueue for something which is
> >>>> supposed to be extremely rare? Why can't we just use standard keventd or
> >>>> allocate a perfectly normal thread (e.g. the state recovery thread)?
> >>> Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with
> >> WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can
> be
> >> invoked during memory reclaim. And I agree that a normal kthread can solve it.
> >>>
> >>> However, the blocklayout write end_io function still needs a context where it
> can
> >> allocate memory to convert extent state (for every IO that touches new extents).
> If
> >> you look into the patch, we are not just using the pnfs private workqueue to
> handle
> >> pnfs_ld_read/write_done, but also calling mark_extents_written() inside the
> >> workqueue.
> >>
> >> Can this memory be preallocated before the I/O for the common case?
> > It is possible to preallocate memory for the common case. But it doesn't remove
> the need for workqueue. Without workqueue, we will need to put a bunch of extent
> manipulation code into bio->end_io, which is executed in bottom half and we
> always need minimize its execution.
> >
> > Unless we do following:
> > 1. preallocate memory for extent state convertion
> > 2. use nfsiod/rpciod to handle bl_write_cleanup
> > 3. for pnfs error case, create a kthread to recollapse and resend to MDS
> >
> > not sure if it worth the complexity though...
> 
> We do want to avoid memory allocation on the completion path to ensure
> completion under memory pressure and graceful handling of low memory
> conditions
> so I think this approach is worth while.
> 
> Having a lightweight handler on the bottom half done path is ok too.
> 
> Doing the heavy lifting for the (assumingly rare) error case in a workqueue/
> kthread makes a sense, just it mustn't block.  Could you use the nfs state manager
> for that?
I don't quite understand. How would you use the nfs state manager to do other tasks?

Thanks,
Tao

> 
> Benny
> 
> >
> > Cheers,
> > Tao
> >
> >>
> >> Benny
> >>
> >>> If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e.,
> by
> >> just creating a normal kthread in error case), we would still have to create a
> >> workqueue inside blocklayout driver to handle write end_io. And if blocklayout
> has a
> >> private workqueue, it no longer needs the normal kthread for read/write_done
> error
> >> handling, which leaves the kthread only specific to object layout (if it doesn't
> need a
> >> workqueue like blocklayout does).
> >>>
> >>> So I think a generic pnfs workqueue is better because it simplifies things and
> solve
> >> both problems.
> >>>
> >>> Best,
> >>> Tao
> >>>
> >>>>
> >>>> Trond
> >>>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 11:10                   ` tao.peng
@ 2011-09-21 11:27                     ` Benny Halevy
  2011-09-21 11:42                       ` Boaz Harrosh
  0 siblings, 1 reply; 29+ messages in thread
From: Benny Halevy @ 2011-09-21 11:27 UTC (permalink / raw)
  To: tao.peng; +Cc: Trond.Myklebust, linux-nfs, honey, rees

On 2011-09-21 14:10, tao.peng@emc.com wrote:
>> -----Original Message-----
>> From: Benny Halevy [mailto:bhalevy@tonian.com]
>> Sent: Wednesday, September 21, 2011 6:57 PM
>> To: Peng, Tao
>> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org;
>> honey@citi.umich.edu; rees@umich.edu
>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>
>> On 2011-09-21 13:23, tao.peng@emc.com wrote:
>>> Hi, Benny,
>>>
>>>> -----Original Message-----
>>>> From: linux-nfs-owner@vger.kernel.org
>> [mailto:linux-nfs-owner@vger.kernel.org]
>>>> On Behalf Of Benny Halevy
>>>> Sent: Wednesday, September 21, 2011 3:05 PM
>>>> To: Peng, Tao
>>>> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org;
>>>> honey@citi.umich.edu; rees@umich.edu
>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>
>>>> On 2011-09-21 08:16, tao.peng@emc.com wrote:
>>>>>> -----Original Message-----
>>>>>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]
>>>>>> Sent: Wednesday, September 21, 2011 12:20 PM
>>>>>> To: Peng, Tao
>>>>>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org; honey@citi.umich.edu;
>>>>>> rees@umich.edu
>>>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: tao.peng@emc.com [mailto:tao.peng@emc.com]
>>>>>>> Sent: Tuesday, September 20, 2011 10:44 PM
>>>>>>> To: Myklebust, Trond
>>>>>>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org;
>>>>>> honey@citi.umich.edu;
>>>>>>> rees@umich.edu
>>>>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: linux-nfs-owner@vger.kernel.org
>>>>>>>> [mailto:linux-nfs-owner@vger.kernel.org]
>>>>>>>> On Behalf Of Jim Rees
>>>>>>>> Sent: Wednesday, September 21, 2011 8:29 AM
>>>>>>>> To: Trond Myklebust
>>>>>>>> Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
>>>>>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>>>
>>>>>>>> Trond Myklebust wrote:
>>>>>>>>
>>>>>>>>   On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
>>>>>>>>   > From: Peng Tao <bergwolf@gmail.com>
>>>>>>>>   >
>>>>>>>>   > For layoutdriver io done functions, default workqueue is not a
>>>>>> good
>>>>>>> place as
>>>>>>>>   > the code is executed in IO path. So add a pnfs private workqueue
>>>>>> to
>>>>>>> handle
>>>>>>>>   > them.
>>>>>>>>   >
>>>>>>>>   > Also change block and object layout code to make use of this
>>>>>> private
>>>>>>>>   > workqueue.
>>>>>>>>   >
>>>>>>>>
>>>>>>>>   Wait, what????
>>>>>>>>
>>>>>>>>   Why is the I/O path (i.e. the nfsiod queue) inappropriate for
>>>>>>>>   layoutdriver io_done functions?
>>>>>>> The current code uses the system_wq to handle io_done functions. If
>>>>>> any
>>>>>>> function allocates memory in system_wq and causes dirty writeback in
>>>>>> nfs
>>>>>>> path, the io_done function can never be called because it is after the
>>>>>> original
>>>>>>> function in the system_wq. And you said that the rpciod/nfsiod is not
>>>>>> the
>>>>>>> ideal place because of memory allocation. So I suggested pnfs private
>>>>>>> workqueue and Benny agreed on it.
>>>>>>
>>>>>> You appear to be optimizing for a corner case. Why?
>>>>> Current code uses system_wq for io_done and it can cause deadlock. So this is
>>>> more than just an optimization.
>>>>>
>>>>>>
>>>>>> IOW: why do we need a whole new workqueue for something which is
>>>>>> supposed to be extremely rare? Why can't we just use standard keventd or
>>>>>> allocate a perfectly normal thread (e.g. the state recovery thread)?
>>>>> Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with
>>>> WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can
>> be
>>>> invoked during memory reclaim. And I agree that a normal kthread can solve it.
>>>>>
>>>>> However, the blocklayout write end_io function still needs a context where it
>> can
>>>> allocate memory to convert extent state (for every IO that touches new extents).
>> If
>>>> you look into the patch, we are not just using the pnfs private workqueue to
>> handle
>>>> pnfs_ld_read/write_done, but also calling mark_extents_written() inside the
>>>> workqueue.
>>>>
>>>> Can this memory be preallocated before the I/O for the common case?
>>> It is possible to preallocate memory for the common case. But it doesn't remove
>> the need for workqueue. Without workqueue, we will need to put a bunch of extent
>> manipulation code into bio->end_io, which is executed in bottom half and we
>> always need minimize its execution.
>>>
>>> Unless we do following:
>>> 1. preallocate memory for extent state convertion
>>> 2. use nfsiod/rpciod to handle bl_write_cleanup
>>> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
>>>
>>> not sure if it worth the complexity though...
>>
>> We do want to avoid memory allocation on the completion path to ensure
>> completion under memory pressure and graceful handling of low memory
>> conditions
>> so I think this approach is worth while.
>>
>> Having a lightweight handler on the bottom half done path is ok too.
>>
>> Doing the heavy lifting for the (assumingly rare) error case in a workqueue/
>> kthread makes a sense, just it mustn't block.  Could you use the nfs state manager
>> for that?
> I don't quite understand. How do you use nfs state manager to do other tasks?

You need to keep a list of things to do hanging off the nfs_client structure,
set a bit in cl_state telling the state manager it has work to do,
and wake it up.  It then needs to go over the list of, say, nfs_inodes
and call into the layout driver to handle the errors.
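
In rough sketch form (cl_state, cl_lock and nfs4_schedule_state_manager() are real; NFS4CLNT_PNFS_ERRORS, pls_error_list, cl_pnfs_error_list and pnfs_handle_deferred_errors() are made-up names for illustration):

```c
/* On error in the io_done path (cheap, no allocation): */
static void pnfs_defer_error(struct nfs_client *clp,
			     struct pnfs_layout_segment *lseg)
{
	spin_lock(&clp->cl_lock);
	list_add_tail(&lseg->pls_error_list, &clp->cl_pnfs_error_list);
	spin_unlock(&clp->cl_lock);
	set_bit(NFS4CLNT_PNFS_ERRORS, &clp->cl_state);
	nfs4_schedule_state_manager(clp);	/* wake the manager thread */
}

/* ...and in the nfs4_state_manager() loop: */
if (test_and_clear_bit(NFS4CLNT_PNFS_ERRORS, &clp->cl_state))
	pnfs_handle_deferred_errors(clp);	/* walk list, call into LD */
```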

Benny

> 
> Thanks,
> Tao
> 
>>
>> Benny
>>
>>>
>>> Cheers,
>>> Tao
>>>
>>>>
>>>> Benny
>>>>
>>>>> If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e.,
>> by
>>>> just creating a normal kthread in error case), we would still have to create a
>>>> workqueue inside blocklayout driver to handle write end_io. And if blocklayout
>> has a
>>>> private workqueue, it no longer needs the normal kthread for read/write_done
>> error
>>>> handling, which leaves the kthread only specific to object layout (if it doesn't
>> need a
>>>> workqueue like blocklayout does).
>>>>>
>>>>> So I think a generic pnfs workqueue is better because it simplifies things and
>> solve
>>>> both problems.
>>>>>
>>>>> Best,
>>>>> Tao
>>>>>
>>>>>>
>>>>>> Trond
>>>>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 11:27                     ` Benny Halevy
@ 2011-09-21 11:42                       ` Boaz Harrosh
  2011-09-21 11:50                         ` Benny Halevy
  0 siblings, 1 reply; 29+ messages in thread
From: Boaz Harrosh @ 2011-09-21 11:42 UTC (permalink / raw)
  To: Benny Halevy; +Cc: tao.peng, Trond.Myklebust, linux-nfs, honey, rees

On 09/21/2011 02:27 PM, Benny Halevy wrote:
>> Unless we do following:
>> 1. preallocate memory for extent state convertion
>> 2. use nfsiod/rpciod to handle bl_write_cleanup
>> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
>> I don't quite understand. How do you use nfs state manager to do other tasks?
> 
> You need to keep a list of things to do hanging off of the nfs client structure
> and set a bit in cl_state telling the state manager it has work to do
> and wake it up.  It then needs to go over the list of, say nfs_inodes
> and call into the layout driver to handle the errors.
> 
> Benny

Good god, is it not already too complicated?

The LD is out of the picture. You all seemed to agree that
the LD has reported an io_done on the nfsiod/rpciod, and in the error case
the generic layer needs to do its coalescing on some other thread. So
your description above is not correct; the LD is out of the picture.

It all looks too complicated for me. A pnfs workqueue for both 2 and 3
above is very good, especially since the workqueue also shares the global
thread pool, no? I like that there is a preallocated thread for
the error case; think about it.

Thanks
Boaz

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 11:42                       ` Boaz Harrosh
@ 2011-09-21 11:50                         ` Benny Halevy
  2011-09-21 13:56                           ` Boaz Harrosh
  0 siblings, 1 reply; 29+ messages in thread
From: Benny Halevy @ 2011-09-21 11:50 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: tao.peng, Trond.Myklebust, linux-nfs, honey, rees

On 2011-09-21 14:42, Boaz Harrosh wrote:
> On 09/21/2011 02:27 PM, Benny Halevy wrote:
>>> Unless we do following:
>>> 1. preallocate memory for extent state convertion
>>> 2. use nfsiod/rpciod to handle bl_write_cleanup
>>> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
>>> I don't quite understand. How do you use nfs state manager to do other tasks?
>>
>> You need to keep a list of things to do hanging off of the nfs client structure
>> and set a bit in cl_state telling the state manager it has work to do
>> and wake it up.  It then needs to go over the list of, say nfs_inodes
>> and call into the layout driver to handle the errors.
>>
>> Benny
> 
> Good god, Is it not already too complicated?
> 
> The LD is out of the picture. You all seemed to agree that
> the LD has reported an io_done on the nfsiod/rpciod, and in the error case
> Generic layer needs to do it's coalescing on some other thread. So
> your description above is not correct, the LD is out of the picture.
> 

True, if the ld cleanup on io_done is sufficient.

> It all looks too complicated for me. A pnfs workqueue for both 2 and 3
> above is very good. Specially since the workqueue also shares global
> pool threads, No? I like it that there is a preallocated thread for
> the error-case, think about it.

I'm fine with using a workqueue for the error case too.
But I'd rather have the common-case done path do only lightweight,
wait-free processing.
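
A common-case shape for that (all names here are illustrative, not the actual patch): preallocate the cleanup state at submission time, so the bottom half never allocates and only queues work:

```c
/* bl_write_ctx, bl_alloc_write_ctx and bl_write_cleanup are made-up
 * names sketching the pattern, not code from the patch.
 */
struct bl_write_ctx {
	struct work_struct work;
	/* preallocated extent-state bookkeeping goes here */
};

static struct bl_write_ctx *bl_alloc_write_ctx(void)
{
	struct bl_write_ctx *ctx = kmalloc(sizeof(*ctx), GFP_NOFS);

	if (ctx)
		INIT_WORK(&ctx->work, bl_write_cleanup);
	return ctx;	/* submit path fails the write if this is NULL */
}

/* bottom half: lightweight and wait-free, just defer the heavy work */
static void bl_end_io_write(struct bio *bio, int err)
{
	struct bl_write_ctx *ctx = bio->bi_private;

	queue_work(nfsiod_workqueue, &ctx->work);
}
```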

Benny

> 
> Thanks
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/3] replacement for "introduce pnfs private workqueue"
  2011-09-20  3:18 [PATCH 0/3] replacement for "introduce pnfs private workqueue" Jim Rees
                   ` (2 preceding siblings ...)
  2011-09-20  3:18 ` [PATCH 3/3] SQUASHME: pnfs: simplify and clean up pnfsiod workqueue Jim Rees
@ 2011-09-21 11:52 ` Benny Halevy
  2011-09-21 12:32   ` Jim Rees
  3 siblings, 1 reply; 29+ messages in thread
From: Benny Halevy @ 2011-09-21 11:52 UTC (permalink / raw)
  To: Jim Rees; +Cc: linux-nfs, peter honeyman

I've submitted this version to the public git tree
(pnfs-all-3.1-rc4-2011-09-20)
and rebased it onto v3.1-rc6 plus Tronds bugfixes for -rc7.
(pnfs-all-3.1-rc6-2011-09-20)

Benny

On 2011-09-20 06:18, Jim Rees wrote:
> The first patch is a bug fix that has already been sent upstream.  The next
> two patches should replace the following in your tree:
> 
> 0359f0e pNFS: introduce pnfs private workqueue
> d9b5876 SQUASHME: pnfs: simplify and clean up pnfsiod workqueue
> ef9e1a70 SQUASHME: pnfs: do pnfsiod_start before registering layout drivers
> 
> This will re-order the patches so that the bug fix is first, followed by the
> workqueue patches.
> 
> These are also available on the for-benny branch of my git repo.
> 
> Benny Halevy (1):
>   SQUASHME: pnfs: simplify and clean up pnfsiod workqueue
> 
> Peng Tao (2):
>   pnfsblock: add missing rpc_put_mount and path_put
>   pnfs: introduce pnfs private workqueue
> 
>  fs/nfs/blocklayout/blocklayout.c |   21 +++++++++++---
>  fs/nfs/objlayout/objio_osd.c     |   10 ++++++-
>  fs/nfs/objlayout/objlayout.c     |    4 +-
>  fs/nfs/pnfs.c                    |   54 +++++++++++++++++++++++++++++++++++++-
>  fs/nfs/pnfs.h                    |    4 +++
>  5 files changed, 84 insertions(+), 9 deletions(-)
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/3] replacement for "introduce pnfs private workqueue"
  2011-09-21 11:52 ` [PATCH 0/3] replacement for "introduce pnfs private workqueue" Benny Halevy
@ 2011-09-21 12:32   ` Jim Rees
  0 siblings, 0 replies; 29+ messages in thread
From: Jim Rees @ 2011-09-21 12:32 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, peter honeyman

Benny Halevy wrote:

  I've submitted this version to the public git tree
  (pnfs-all-3.1-rc4-2011-09-20)
  and rebased it onto v3.1-rc6 plus Trond's bugfixes for -rc7.
  (pnfs-all-3.1-rc6-2011-09-20)

Thanks!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 11:50                         ` Benny Halevy
@ 2011-09-21 13:56                           ` Boaz Harrosh
  2011-09-21 15:45                             ` Peng Tao
  0 siblings, 1 reply; 29+ messages in thread
From: Boaz Harrosh @ 2011-09-21 13:56 UTC (permalink / raw)
  To: Benny Halevy; +Cc: tao.peng, Trond.Myklebust, linux-nfs, honey, rees

On 09/21/2011 02:50 PM, Benny Halevy wrote:
> On 2011-09-21 14:42, Boaz Harrosh wrote:
>> On 09/21/2011 02:27 PM, Benny Halevy wrote:
>>>> Unless we do following:
>>>> 1. preallocate memory for extent state convertion
>>>> 2. use nfsiod/rpciod to handle bl_write_cleanup
>>>> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
>>>> I don't quite understand. How do you use nfs state manager to do other tasks?
>>>
>>> You need to keep a list of things to do hanging off of the nfs client structure
>>> and set a bit in cl_state telling the state manager it has work to do
>>> and wake it up.  It then needs to go over the list of, say nfs_inodes
>>> and call into the layout driver to handle the errors.
>>>
>>> Benny
>>
>> Good god, Is it not already too complicated?
>>
>> The LD is out of the picture. You all seemed to agree that
>> the LD has reported an io_done on the nfsiod/rpciod, and in the error case
>> Generic layer needs to do it's coalescing on some other thread. So
>> your description above is not correct, the LD is out of the picture.
>>
> 
> True, if the ld cleanup on io_done is sufficient.
> 
>> It all looks too complicated for me. A pnfs workqueue for both 2 and 3
>> above is very good. Specially since the workqueue also shares global
>> pool threads, No? I like it that there is a preallocated thread for
>> the error-case, think about it.
> 
> I'm fine too with using a workqueue for the error case.
> But I'd rather have the common case done path do only lightweight,
> wait free processing.
> 
> Benny
> 

If by "common case done path do only lightweight" you mean
"preallocate memory for extent state conversion", then I absolutely
agree. But as far as the workqueue/kthread goes, nfsiod/rpciod-wq and
pnfs-wq are exactly the same for the "common case", unless I'm
totally missing the point. What are you saying?

These are the options so far:

[Tao's option, which he'd rather not take]
1. preallocate memory for extent state conversion
2. use nfsiod/rpciod to handle bl_write_cleanup
3. for pnfs error case, create a kthread to recollapse and resend to MDS

[My option, which I think Tao agrees with]
1. preallocate memory for extent state conversion
2. use pnfs-wq to handle bl_write_cleanup
3. pnfs error case, just like Tao's patches as part of io_done
   on pnfs-wq

Any other options?
Boaz

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 13:56                           ` Boaz Harrosh
@ 2011-09-21 15:45                             ` Peng Tao
  2011-09-21 16:03                               ` Trond Myklebust
  0 siblings, 1 reply; 29+ messages in thread
From: Peng Tao @ 2011-09-21 15:45 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Benny Halevy, tao.peng, Trond.Myklebust, linux-nfs, honey, rees

On Wed, Sep 21, 2011 at 9:56 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
> On 09/21/2011 02:50 PM, Benny Halevy wrote:
>> On 2011-09-21 14:42, Boaz Harrosh wrote:
>>> On 09/21/2011 02:27 PM, Benny Halevy wrote:
>>>>> Unless we do following:
>>>>> 1. preallocate memory for extent state convertion
>>>>> 2. use nfsiod/rpciod to handle bl_write_cleanup
>>>>> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
>>>>> I don't quite understand. How do you use nfs state manager to do other tasks?
>>>>
>>>> You need to keep a list of things to do hanging off of the nfs client structure
>>>> and set a bit in cl_state telling the state manager it has work to do
>>>> and wake it up.  It then needs to go over the list of, say nfs_inodes
>>>> and call into the layout driver to handle the errors.
>>>>
>>>> Benny
>>>
>>> Good god, Is it not already too complicated?
>>>
>>> The LD is out of the picture. You all seemed to agree that
>>> the LD has reported an io_done on the nfsiod/rpciod, and in the error case
>>> Generic layer needs to do it's coalescing on some other thread. So
>>> your description above is not correct, the LD is out of the picture.
>>>
>>
>> True, if the ld cleanup on io_done is sufficient.
>>
>>> It all looks too complicated for me. A pnfs workqueue for both 2 and 3
>>> above is very good. Specially since the workqueue also shares global
>>> pool threads, No? I like it that there is a preallocated thread for
>>> the error-case, think about it.
>>
>> I'm fine too with using a workqueue for the error case.
>> But I'd rather have the common case done path do only lightweight,
>> wait free processing.
>>
>> Benny
>>
>
> If by "common case done path do only lightweight" you mean
> "preallocate memory for extent state conversion". Then I absolutely
> agree. But as far as workqueue/kthread then nfsiod/rpciod-wq or
> pnfs-wq is exactly the same for the "common case". Unless I'm
> totally missing the point. What are you saying?
>
> These are the options so far:
>
> [Toe's option which he rather not]
> 1. preallocate memory for extent state conversion
> 2. use nfsiod/rpciod to handle bl_write_cleanup
> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
>
> [My option which I think Toe agrees with]
> 1. preallocate memory for extent state conversion
> 2. use pnfs-wq to handle bl_write_cleanup
> 3. pnfs error case, just like Toe's patches as part of io_done
>   on pnfs-wq
Yeah, I would vote for this one because of its simplicity. ;-)

Thanks,
Tao
>
> Any other options?
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Thanks,
-Bergwolf

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 15:45                             ` Peng Tao
@ 2011-09-21 16:03                               ` Trond Myklebust
  2011-09-22  3:30                                 ` tao.peng
  2011-09-22  7:17                                 ` Boaz Harrosh
  0 siblings, 2 replies; 29+ messages in thread
From: Trond Myklebust @ 2011-09-21 16:03 UTC (permalink / raw)
  To: Peng Tao; +Cc: Boaz Harrosh, Benny Halevy, tao.peng, linux-nfs, honey, rees

On Wed, 2011-09-21 at 23:45 +0800, Peng Tao wrote: 
> On Wed, Sep 21, 2011 at 9:56 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
> > On 09/21/2011 02:50 PM, Benny Halevy wrote:
> >> On 2011-09-21 14:42, Boaz Harrosh wrote:
> >>> On 09/21/2011 02:27 PM, Benny Halevy wrote:
> >>>>> Unless we do following:
> >>>>> 1. preallocate memory for extent state convertion
> >>>>> 2. use nfsiod/rpciod to handle bl_write_cleanup
> >>>>> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
> >>>>> I don't quite understand. How do you use nfs state manager to do other tasks?
> >>>>
> >>>> You need to keep a list of things to do hanging off of the nfs client structure
> >>>> and set a bit in cl_state telling the state manager it has work to do
> >>>> and wake it up.  It then needs to go over the list of, say nfs_inodes
> >>>> and call into the layout driver to handle the errors.
> >>>>
> >>>> Benny
> >>>
> >>> Good god, Is it not already too complicated?
> >>>
> >>> The LD is out of the picture. You all seemed to agree that
> >>> the LD has reported an io_done on the nfsiod/rpciod, and in the error case
> >>> Generic layer needs to do it's coalescing on some other thread. So
> >>> your description above is not correct, the LD is out of the picture.
> >>>
> >>
> >> True, if the ld cleanup on io_done is sufficient.
> >>
> >>> It all looks too complicated for me. A pnfs workqueue for both 2 and 3
> >>> above is very good. Specially since the workqueue also shares global
> >>> pool threads, No? I like it that there is a preallocated thread for
> >>> the error-case, think about it.
> >>
> >> I'm fine too with using a workqueue for the error case.
> >> But I'd rather have the common case done path do only lightweight,
> >> wait free processing.
> >>
> >> Benny
> >>
> >
> > If by "common case done path do only lightweight" you mean
> > "preallocate memory for extent state conversion". Then I absolutely
> > agree. But as far as workqueue/kthread then nfsiod/rpciod-wq or
> > pnfs-wq is exactly the same for the "common case". Unless I'm
> > totally missing the point. What are you saying?
> >
> > These are the options so far:
> >
> > [Toe's option which he rather not]
> > 1. preallocate memory for extent state conversion
> > 2. use nfsiod/rpciod to handle bl_write_cleanup
> > 3. for pnfs error case, create a kthread to recollapse and resend to MDS
> >
> > [My option which I think Toe agrees with]
> > 1. preallocate memory for extent state conversion
> > 2. use pnfs-wq to handle bl_write_cleanup
> > 3. pnfs error case, just like Toe's patches as part of io_done
> >   on pnfs-wq
> Yeah, I would vote for this one because of its simplicity. ;-)

Sigh... The problem is that it completely fails to address the problem.

What's the difference between having pNFS completions run on nfsiod or
their own work queue? You'd be running i/o and allocations on the same
queue in both cases.

Cheers
Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 16:03                               ` Trond Myklebust
@ 2011-09-22  3:30                                 ` tao.peng
  2011-09-22  7:17                                 ` Boaz Harrosh
  1 sibling, 0 replies; 29+ messages in thread
From: tao.peng @ 2011-09-22  3:30 UTC (permalink / raw)
  To: Trond.Myklebust; +Cc: bharrosh, bhalevy, linux-nfs, honey, rees

Hi, Trond,

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org]
> On Behalf Of Trond Myklebust
> Sent: Thursday, September 22, 2011 12:04 AM
> To: Peng Tao
> Cc: Boaz Harrosh; Benny Halevy; Peng, Tao; linux-nfs@vger.kernel.org;
> honey@citi.umich.edu; rees@umich.edu
> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>
> [...]
>
> Sigh... The problem is that it completely fails to address the problem.
>
> What's the difference between having pNFS completions run on nfsiod or
> their own work queue? You'd be running i/o and allocations on the same
> queue in both cases.
OK, I got your point. I was under the impression that you don't want pnfs
io end to use nfsiod because nfsiod is already a highly contended
workqueue... And now I see that it is because you don't want to block
(because of memory allocation) on the workqueue.
But sorry for my ignorance. I thought workqueues were invented just for
handling the case that functions cannot block in interrupt context, because
workqueues run in process context. Could you please shed some light on why
nfs's workqueue (nfsiod/pnfsiod) cannot block?

Looking into some users of nfsiod, it seems it already may block due to
memory allocation in some cases, like in the code path:
nfs4_open_confirm_release -> nfs4_opendata_to_nfs4_state -> nfs_fhget ->
iget5_locked -> alloc_inode -> kmem_cache_alloc

Also looking at other filesystems (e.g., ext4's ext4_end_io_work and
reiserfs's flush_async_commits), they are also wrapping up i/o in
workqueues and may sleep with memory allocation or wait_on_buffer() there...

Best Regards,
Tao

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
  2011-09-21 16:03                               ` Trond Myklebust
  2011-09-22  3:30                                 ` tao.peng
@ 2011-09-22  7:17                                 ` Boaz Harrosh
  1 sibling, 0 replies; 29+ messages in thread
From: Boaz Harrosh @ 2011-09-22  7:17 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Peng Tao, Benny Halevy, tao.peng, linux-nfs, honey, rees

On 09/21/2011 07:03 PM, Trond Myklebust wrote:
>>>
>>> These are the options so far:
>>>
>>> [Toe's option which he rather not]
>>> 1. preallocate memory for extent state conversion
>>> 2. use nfsiod/rpciod to handle bl_write_cleanup
>>> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
>>>
>>> [My option which I think Toe agrees with]
>>> 1. preallocate memory for extent state conversion
>>> 2. use pnfs-wq to handle bl_write_cleanup
>>> 3. pnfs error case, just like Toe's patches as part of io_done
>>>   on pnfs-wq
>> Yeah, I would vote for this one because of its simplicity. ;-)
> 
> Sigh... The problem is that it completely fails to address the problem.
> 
> What's the difference between having pNFS completions run on nfsiod or
> their own work queue? You'd be running i/o and allocations on the same
> queue in both cases.
> 

I don't understand. Do you mean "io_done" and "coalescing" (which
does allocations) would share a queue, so IO could not complete to free
memory, and allocations could not proceed?

But I thought we don't do that with Tao's latest patches, because
we no longer do coalescing on the io_done path. We only re-dirty the
memory and let the normal nfsiod/rpciod do the coalescing.

So it is what you want: pnfs-wq is only for io_done, and the regular
coalescing/allocation is done in nfsiod/rpciod. I thought that
was the whole idea, no? (It was your idea, actually.)

Or I might have completely missed the point. Please explain.

> Cheers
> Trond
> 

Thanks
Boaz

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pNFS: introduce pnfs private workqueue
  2011-09-11 14:51   ` Benny Halevy
@ 2011-09-11 15:15     ` Benny Halevy
  0 siblings, 0 replies; 29+ messages in thread
From: Benny Halevy @ 2011-09-11 15:15 UTC (permalink / raw)
  To: Jim Rees, Peng Tao; +Cc: linux-nfs, peter honeyman

On 2011-09-11 07:51, Benny Halevy wrote:
> On 2011-09-10 10:41, Jim Rees wrote:
>> From: Peng Tao <bergwolf@gmail.com>
> 
> Hi, I have a few comments inline below.
> Otherwise, the direction and the patch looks good.
> 
>>
>> For layoutdriver io done functions, default workqueue is not a good place as
>> the code is executed in IO path. So add a pnfs private workqueue to handle
>> them.
>>
>> Also change block and object layout code to make use of this private
>> workqueue.
>>
>> Signed-off-by: Peng Tao <peng_tao@emc.com>
>> Signed-off-by: Jim Rees <rees@umich.edu>
>> ---
>>  fs/nfs/blocklayout/blocklayout.c |   17 ++++++++---
>>  fs/nfs/objlayout/objio_osd.c     |    8 ++++++
>>  fs/nfs/objlayout/objlayout.c     |    4 +-
>>  fs/nfs/pnfs.c                    |   52 +++++++++++++++++++++++++++++++++++++-
>>  fs/nfs/pnfs.h                    |    4 +++
>>  5 files changed, 77 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>> index dc23833..51f70f0 100644
>> --- a/fs/nfs/blocklayout/blocklayout.c
>> +++ b/fs/nfs/blocklayout/blocklayout.c
>> @@ -228,7 +228,7 @@ bl_end_par_io_read(void *data)
>>  	struct nfs_read_data *rdata = data;
>>  
>>  	INIT_WORK(&rdata->task.u.tk_work, bl_read_cleanup);
>> -	schedule_work(&rdata->task.u.tk_work);
>> +	pnfsiod_queue_work(&rdata->task.u.tk_work);
>>  }
>>  
>>  /* We don't want normal .rpc_call_done callback used, so we replace it
>> @@ -418,7 +418,7 @@ static void bl_end_par_io_write(void *data)
>>  	wdata->task.tk_status = 0;
>>  	wdata->verf.committed = NFS_FILE_SYNC;
>>  	INIT_WORK(&wdata->task.u.tk_work, bl_write_cleanup);
>> -	schedule_work(&wdata->task.u.tk_work);
>> +	pnfsiod_queue_work(&wdata->task.u.tk_work);
>>  }
>>  
>>  /* FIXME STUB - mark intersection of layout and page as bad, so is not
>> @@ -981,29 +981,35 @@ static int __init nfs4blocklayout_init(void)
>>  	if (ret)
>>  		goto out;
>>  
>> +	ret = pnfsiod_start();
>> +	if (ret)
>> +		goto out_remove;
>> +
>>  	init_waitqueue_head(&bl_wq);
>>  
>>  	mnt = rpc_get_mount();
>>  	if (IS_ERR(mnt)) {
>>  		ret = PTR_ERR(mnt);
>> -		goto out_remove;
>> +		goto out_stop;
>>  	}
>>  
>>  	ret = vfs_path_lookup(mnt->mnt_root,
>>  			      mnt,
>>  			      NFS_PIPE_DIRNAME, 0, &path);
>>  	if (ret)
>> -		goto out_remove;
>> +		goto out_stop;
>>  
>>  	bl_device_pipe = rpc_mkpipe(path.dentry, "blocklayout", NULL,
>>  				    &bl_upcall_ops, 0);
>>  	if (IS_ERR(bl_device_pipe)) {
>>  		ret = PTR_ERR(bl_device_pipe);
>> -		goto out_remove;
>> +		goto out_stop;
>>  	}
>>  out:
>>  	return ret;
>>  
>> +out_stop:
>> +	pnfsiod_stop();
>>  out_remove:
>>  	pnfs_unregister_layoutdriver(&blocklayout_type);
>>  	return ret;
>> @@ -1015,6 +1021,7 @@ static void __exit nfs4blocklayout_exit(void)
>>  	       __func__);
>>  
>>  	pnfs_unregister_layoutdriver(&blocklayout_type);
>> +	pnfsiod_stop();
>>  	rpc_unlink(bl_device_pipe);
>>  }
>>  
>> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
>> index d0cda12..f28013f 100644
>> --- a/fs/nfs/objlayout/objio_osd.c
>> +++ b/fs/nfs/objlayout/objio_osd.c
>> @@ -1042,7 +1042,14 @@ static int __init
>>  objlayout_init(void)
>>  {
>>  	int ret = pnfs_register_layoutdriver(&objlayout_type);
>> +	if (ret)
>> +		goto out;
>>  
>> +	ret = pnfsiod_start();
>> +	if (ret)
>> +		pnfs_unregister_layoutdriver(&objlayout_type);
>> +
>> +out:

Also, it makes more sense to init the workqueue before registering the
layout driver since it's a prerequisite.

>>  	if (ret)
>>  		printk(KERN_INFO
>>  			"%s: Registering OSD pNFS Layout Driver failed: error=%d\n",
>> @@ -1057,6 +1064,7 @@ static void __exit
>>  objlayout_exit(void)
>>  {
>>  	pnfs_unregister_layoutdriver(&objlayout_type);
>> +	pnfsiod_stop();
>>  	printk(KERN_INFO "%s: Unregistered OSD pNFS Layout Driver\n",
>>  	       __func__);
>>  }
>> diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
>> index 1d06f8e..f7c6c21 100644
>> --- a/fs/nfs/objlayout/objlayout.c
>> +++ b/fs/nfs/objlayout/objlayout.c
>> @@ -305,7 +305,7 @@ objlayout_read_done(struct objlayout_io_state *state, ssize_t status, bool sync)
>>  		pnfs_ld_read_done(rdata);
>>  	else {
>>  		INIT_WORK(&rdata->task.u.tk_work, _rpc_read_complete);
>> -		schedule_work(&rdata->task.u.tk_work);
>> +		pnfsiod_queue_work(&rdata->task.u.tk_work);
>>  	}
>>  }
>>  
>> @@ -396,7 +396,7 @@ objlayout_write_done(struct objlayout_io_state *state, ssize_t status,
>>  		pnfs_ld_write_done(wdata);
>>  	else {
>>  		INIT_WORK(&wdata->task.u.tk_work, _rpc_write_complete);
>> -		schedule_work(&wdata->task.u.tk_work);
>> +		pnfsiod_queue_work(&wdata->task.u.tk_work);
>>  	}
>>  }
>>  
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index e550e88..5ac7a78 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -38,7 +38,7 @@
>>  /* Locking:
>>   *
>>   * pnfs_spinlock:
>> - *      protects pnfs_modules_tbl.
>> + *      protects pnfs_modules_tbl, pnfsiod_workqueue and pnfsiod_users.
>>   */
>>  static DEFINE_SPINLOCK(pnfs_spinlock);
>>  
>> @@ -47,6 +47,9 @@ static DEFINE_SPINLOCK(pnfs_spinlock);
>>   */
>>  static LIST_HEAD(pnfs_modules_tbl);
>>  
>> +static struct workqueue_struct *pnfsiod_workqueue;
>> +static int pnfsiod_users = 0;
> 
> There's no need to initialize static variables to zero.
> 
>> +
>>  /* Return the registered pnfs layout driver module matching given id */
>>  static struct pnfs_layoutdriver_type *
>>  find_pnfs_driver_locked(u32 id)
>> @@ -1478,3 +1481,50 @@ out:
>>  	dprintk("<-- %s status %d\n", __func__, status);
>>  	return status;
>>  }
>> +
>> +/*
>> + * start up the pnfsiod workqueue
>> + */
>> +int pnfsiod_start(void)
>> +{
>> +	struct workqueue_struct *wq;
>> +	dprintk("RPC:       creating workqueue pnfsiod\n");
> 
> hmm, s/RPC/NFS/ this is not the RPC module any more :)
> looks like a cut'n'paste from rpciod_start...
> 
>> +	wq = alloc_workqueue("pnfsiod", WQ_MEM_RECLAIM, 0);
>> +	if (wq == NULL)
>> +		return -ENOMEM;
>> +	spin_lock(&pnfs_spinlock);
>> +	pnfsiod_users++;
>> +	if (pnfsiod_workqueue == NULL) {
>> +		pnfsiod_workqueue = wq;
>> +	} else {
>> +		destroy_workqueue(wq);
>> +	}
> 
> The curly braces in this statement are an overkill
> (and deviation from CodingStyle)
> 
>> +	spin_unlock(&pnfs_spinlock);
>> +	return 0;
> 
> although this way of accounting is ultimately simple,
> it's wasteful, and we're not really expecting any concurrent
> calls to this function. I'd consider coding it as follows:
> 
> 	atomic_inc(&pnfsiod_users);
> 	if (pnfsiod_workqueue == NULL) {
> 		wq = alloc_workqueue("pnfsiod", WQ_MEM_RECLAIM, 0);
> 		if (wq == NULL)

sorry, we need to call pnfsiod_stop here.
I'll send a SQUASHME patch with my proposed changes
to this one to make it clearer...

Benny

> 			return -ENOMEM;
> 		spin_lock(&pnfs_spinlock);
> 		if (pnfsiod_workqueue == NULL)
> 			pnfsiod_workqueue = wq;
> 		else
> 			destroy_workqueue(wq);
> 		spin_unlock(&pnfs_spinlock);
> 	}
> 
> 	return 0;
> 
>> +}
>> +EXPORT_SYMBOL_GPL(pnfsiod_start);
>> +
>> +/*
>> + * Destroy the pnfsiod workqueue
>> + */
>> +void pnfsiod_stop(void)
>> +{
>> +	struct workqueue_struct *wq = NULL;
>> +
>> +	spin_lock(&pnfs_spinlock);
>> +	pnfsiod_users--;
>> +	if (pnfsiod_users == 0) {
>> +		wq = pnfsiod_workqueue;
>> +		pnfsiod_workqueue = NULL;
>> +	}
>> +	spin_unlock(&pnfs_spinlock);
>> +	if (wq)
>> +		destroy_workqueue(wq);
> 
> and continuing my proposal from above:
> 
> 	if (atomic_dec_and_lock(&pnfsiod_users)) {
> 		wq = pnfsiod_workqueue;
> 		pnfsiod_workqueue = NULL;
> 		spin_unlock(&pnfs_spinlock);
> 	}
> 	if (wq)
> 		destroy_workqueue(wq);
> 
> Benny
> 
>> +}
>> +EXPORT_SYMBOL_GPL(pnfsiod_stop);
>> +
>> +void pnfsiod_queue_work(struct work_struct* work)
>> +{
>> +	queue_work(pnfsiod_workqueue, work);
>> +}
>> +EXPORT_SYMBOL_GPL(pnfsiod_queue_work);
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 01cbfd5..bc1eed5 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -165,6 +165,10 @@ extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
>>  extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp);
>>  
>>  /* pnfs.c */
>> +int pnfsiod_start(void);
>> +void pnfsiod_stop(void);
>> +void pnfsiod_queue_work(struct work_struct* work);
>> +
>>  void get_layout_hdr(struct pnfs_layout_hdr *lo);
>>  void put_lseg(struct pnfs_layout_segment *lseg);
>>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] pNFS: introduce pnfs private workqueue
  2011-09-10 17:41 ` [PATCH 2/3] pNFS: introduce pnfs private workqueue Jim Rees
@ 2011-09-11 14:51   ` Benny Halevy
  2011-09-11 15:15     ` Benny Halevy
  0 siblings, 1 reply; 29+ messages in thread
From: Benny Halevy @ 2011-09-11 14:51 UTC (permalink / raw)
  To: Jim Rees, Peng Tao; +Cc: linux-nfs, peter honeyman

On 2011-09-10 10:41, Jim Rees wrote:
> From: Peng Tao <bergwolf@gmail.com>

Hi, I have a few comments inline below.
Otherwise, the direction and the patch looks good.

> 
> For layoutdriver io done functions, default workqueue is not a good place as
> the code is executed in IO path. So add a pnfs private workqueue to handle
> them.
> 
> Also change block and object layout code to make use of this private
> workqueue.
> 
> Signed-off-by: Peng Tao <peng_tao@emc.com>
> Signed-off-by: Jim Rees <rees@umich.edu>
> ---
>  fs/nfs/blocklayout/blocklayout.c |   17 ++++++++---
>  fs/nfs/objlayout/objio_osd.c     |    8 ++++++
>  fs/nfs/objlayout/objlayout.c     |    4 +-
>  fs/nfs/pnfs.c                    |   52 +++++++++++++++++++++++++++++++++++++-
>  fs/nfs/pnfs.h                    |    4 +++
>  5 files changed, 77 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index dc23833..51f70f0 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -228,7 +228,7 @@ bl_end_par_io_read(void *data)
>  	struct nfs_read_data *rdata = data;
>  
>  	INIT_WORK(&rdata->task.u.tk_work, bl_read_cleanup);
> -	schedule_work(&rdata->task.u.tk_work);
> +	pnfsiod_queue_work(&rdata->task.u.tk_work);
>  }
>  
>  /* We don't want normal .rpc_call_done callback used, so we replace it
> @@ -418,7 +418,7 @@ static void bl_end_par_io_write(void *data)
>  	wdata->task.tk_status = 0;
>  	wdata->verf.committed = NFS_FILE_SYNC;
>  	INIT_WORK(&wdata->task.u.tk_work, bl_write_cleanup);
> -	schedule_work(&wdata->task.u.tk_work);
> +	pnfsiod_queue_work(&wdata->task.u.tk_work);
>  }
>  
>  /* FIXME STUB - mark intersection of layout and page as bad, so is not
> @@ -981,29 +981,35 @@ static int __init nfs4blocklayout_init(void)
>  	if (ret)
>  		goto out;
>  
> +	ret = pnfsiod_start();
> +	if (ret)
> +		goto out_remove;
> +
>  	init_waitqueue_head(&bl_wq);
>  
>  	mnt = rpc_get_mount();
>  	if (IS_ERR(mnt)) {
>  		ret = PTR_ERR(mnt);
> -		goto out_remove;
> +		goto out_stop;
>  	}
>  
>  	ret = vfs_path_lookup(mnt->mnt_root,
>  			      mnt,
>  			      NFS_PIPE_DIRNAME, 0, &path);
>  	if (ret)
> -		goto out_remove;
> +		goto out_stop;
>  
>  	bl_device_pipe = rpc_mkpipe(path.dentry, "blocklayout", NULL,
>  				    &bl_upcall_ops, 0);
>  	if (IS_ERR(bl_device_pipe)) {
>  		ret = PTR_ERR(bl_device_pipe);
> -		goto out_remove;
> +		goto out_stop;
>  	}
>  out:
>  	return ret;
>  
> +out_stop:
> +	pnfsiod_stop();
>  out_remove:
>  	pnfs_unregister_layoutdriver(&blocklayout_type);
>  	return ret;
> @@ -1015,6 +1021,7 @@ static void __exit nfs4blocklayout_exit(void)
>  	       __func__);
>  
>  	pnfs_unregister_layoutdriver(&blocklayout_type);
> +	pnfsiod_stop();
>  	rpc_unlink(bl_device_pipe);
>  }
>  
> diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
> index d0cda12..f28013f 100644
> --- a/fs/nfs/objlayout/objio_osd.c
> +++ b/fs/nfs/objlayout/objio_osd.c
> @@ -1042,7 +1042,14 @@ static int __init
>  objlayout_init(void)
>  {
>  	int ret = pnfs_register_layoutdriver(&objlayout_type);
> +	if (ret)
> +		goto out;
>  
> +	ret = pnfsiod_start();
> +	if (ret)
> +		pnfs_unregister_layoutdriver(&objlayout_type);
> +
> +out:
>  	if (ret)
>  		printk(KERN_INFO
>  			"%s: Registering OSD pNFS Layout Driver failed: error=%d\n",
> @@ -1057,6 +1064,7 @@ static void __exit
>  objlayout_exit(void)
>  {
>  	pnfs_unregister_layoutdriver(&objlayout_type);
> +	pnfsiod_stop();
>  	printk(KERN_INFO "%s: Unregistered OSD pNFS Layout Driver\n",
>  	       __func__);
>  }
> diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
> index 1d06f8e..f7c6c21 100644
> --- a/fs/nfs/objlayout/objlayout.c
> +++ b/fs/nfs/objlayout/objlayout.c
> @@ -305,7 +305,7 @@ objlayout_read_done(struct objlayout_io_state *state, ssize_t status, bool sync)
>  		pnfs_ld_read_done(rdata);
>  	else {
>  		INIT_WORK(&rdata->task.u.tk_work, _rpc_read_complete);
> -		schedule_work(&rdata->task.u.tk_work);
> +		pnfsiod_queue_work(&rdata->task.u.tk_work);
>  	}
>  }
>  
> @@ -396,7 +396,7 @@ objlayout_write_done(struct objlayout_io_state *state, ssize_t status,
>  		pnfs_ld_write_done(wdata);
>  	else {
>  		INIT_WORK(&wdata->task.u.tk_work, _rpc_write_complete);
> -		schedule_work(&wdata->task.u.tk_work);
> +		pnfsiod_queue_work(&wdata->task.u.tk_work);
>  	}
>  }
>  
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index e550e88..5ac7a78 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -38,7 +38,7 @@
>  /* Locking:
>   *
>   * pnfs_spinlock:
> - *      protects pnfs_modules_tbl.
> + *      protects pnfs_modules_tbl, pnfsiod_workqueue and pnfsiod_users.
>   */
>  static DEFINE_SPINLOCK(pnfs_spinlock);
>  
> @@ -47,6 +47,9 @@ static DEFINE_SPINLOCK(pnfs_spinlock);
>   */
>  static LIST_HEAD(pnfs_modules_tbl);
>  
> +static struct workqueue_struct *pnfsiod_workqueue;
> +static int pnfsiod_users = 0;

There's no need to initialize static variables to zero.

> +
>  /* Return the registered pnfs layout driver module matching given id */
>  static struct pnfs_layoutdriver_type *
>  find_pnfs_driver_locked(u32 id)
> @@ -1478,3 +1481,50 @@ out:
>  	dprintk("<-- %s status %d\n", __func__, status);
>  	return status;
>  }
> +
> +/*
> + * start up the pnfsiod workqueue
> + */
> +int pnfsiod_start(void)
> +{
> +	struct workqueue_struct *wq;
> +	dprintk("RPC:       creating workqueue pnfsiod\n");

hmm, s/RPC/NFS/ this is not the RPC module any more :)
looks like a cut'n'paste from rpciod_start...

> +	wq = alloc_workqueue("pnfsiod", WQ_MEM_RECLAIM, 0);
> +	if (wq == NULL)
> +		return -ENOMEM;
> +	spin_lock(&pnfs_spinlock);
> +	pnfsiod_users++;
> +	if (pnfsiod_workqueue == NULL) {
> +		pnfsiod_workqueue = wq;
> +	} else {
> +		destroy_workqueue(wq);
> +	}

The curly braces in this statement are overkill
(and a deviation from CodingStyle)

> +	spin_unlock(&pnfs_spinlock);
> +	return 0;

Although this way of accounting is ultimately simple,
it's wasteful, and we're not really expecting any concurrent
calls to this function. I'd consider coding it as follows:

	atomic_inc(&pnfsiod_users);
	if (pnfsiod_workqueue == NULL) {
		wq = alloc_workqueue("pnfsiod", WQ_MEM_RECLAIM, 0);
		if (wq == NULL)
			return -ENOMEM;
		spin_lock(&pnfs_spinlock);
		if (pnfsiod_workqueue == NULL)
			pnfsiod_workqueue = wq;
		else
			destroy_workqueue(wq);
		spin_unlock(&pnfs_spinlock);
	}

	return 0;

> +}
> +EXPORT_SYMBOL_GPL(pnfsiod_start);
> +
> +/*
> + * Destroy the pnfsiod workqueue
> + */
> +void pnfsiod_stop(void)
> +{
> +	struct workqueue_struct *wq = NULL;
> +
> +	spin_lock(&pnfs_spinlock);
> +	pnfsiod_users--;
> +	if (pnfsiod_users == 0) {
> +		wq = pnfsiod_workqueue;
> +		pnfsiod_workqueue = NULL;
> +	}
> +	spin_unlock(&pnfs_spinlock);
> +	if (wq)
> +		destroy_workqueue(wq);

and continuing my proposal from above:

	if (atomic_dec_and_lock(&pnfsiod_users, &pnfs_spinlock)) {
		wq = pnfsiod_workqueue;
		pnfsiod_workqueue = NULL;
		spin_unlock(&pnfs_spinlock);
	}
	if (wq)
		destroy_workqueue(wq);

Benny

> +}
> +EXPORT_SYMBOL_GPL(pnfsiod_stop);
> +
> +void pnfsiod_queue_work(struct work_struct* work)
> +{
> +	queue_work(pnfsiod_workqueue, work);
> +}
> +EXPORT_SYMBOL_GPL(pnfsiod_queue_work);
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 01cbfd5..bc1eed5 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -165,6 +165,10 @@ extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
>  extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp);
>  
>  /* pnfs.c */
> +int pnfsiod_start(void);
> +void pnfsiod_stop(void);
> +void pnfsiod_queue_work(struct work_struct* work);
> +
>  void get_layout_hdr(struct pnfs_layout_hdr *lo);
>  void put_lseg(struct pnfs_layout_segment *lseg);
>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 2/3] pNFS: introduce pnfs private workqueue
  2011-09-10 17:41 [PATCH 0/3] pnfs private workqueue, and two cleanups Jim Rees
@ 2011-09-10 17:41 ` Jim Rees
  2011-09-11 14:51   ` Benny Halevy
  0 siblings, 1 reply; 29+ messages in thread
From: Jim Rees @ 2011-09-10 17:41 UTC (permalink / raw)
  To: Benny Halevy; +Cc: linux-nfs, peter honeyman

From: Peng Tao <bergwolf@gmail.com>

For layoutdriver io done functions, the default workqueue is not a good
place to run them, as the code is executed in the IO path. So add a pnfs
private workqueue to handle them.

Also change block and object layout code to make use of this private
workqueue.
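The start/stop pairing this patch introduces can be sketched as a small
user-space analogue (hypothetical names; a pthread mutex, malloc() and
free() stand in for pnfs_spinlock, alloc_workqueue() and
destroy_workqueue()): the first caller creates the shared resource, later
callers only bump the user count, and the last caller to leave tears it
down.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space sketch of the pnfsiod_start/pnfsiod_stop
 * refcounting scheme; not the kernel implementation itself. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int users;          /* stands in for pnfsiod_users */
static void *shared;       /* stands in for pnfsiod_workqueue */

static int resource_start(void)
{
	void *r = malloc(1);               /* alloc_workqueue() stand-in */
	if (r == NULL)
		return -1;                 /* -ENOMEM in the kernel code */
	pthread_mutex_lock(&lock);
	users++;
	if (shared == NULL)
		shared = r;                /* first user installs it */
	else
		free(r);                   /* lost the race; drop the spare */
	pthread_mutex_unlock(&lock);
	return 0;
}

static void resource_stop(void)
{
	void *r = NULL;
	pthread_mutex_lock(&lock);
	if (--users == 0) {
		r = shared;                /* last user takes ownership */
		shared = NULL;
	}
	pthread_mutex_unlock(&lock);
	free(r);                           /* destroy_workqueue() stand-in */
}
```

Note that, as in the patch, the allocation is done unconditionally before
taking the lock and the spare is thrown away if another user got there
first, so no allocation happens while the spinlock is held.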

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
---
 fs/nfs/blocklayout/blocklayout.c |   17 ++++++++---
 fs/nfs/objlayout/objio_osd.c     |    8 ++++++
 fs/nfs/objlayout/objlayout.c     |    4 +-
 fs/nfs/pnfs.c                    |   52 +++++++++++++++++++++++++++++++++++++-
 fs/nfs/pnfs.h                    |    4 +++
 5 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index dc23833..51f70f0 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -228,7 +228,7 @@ bl_end_par_io_read(void *data)
 	struct nfs_read_data *rdata = data;
 
 	INIT_WORK(&rdata->task.u.tk_work, bl_read_cleanup);
-	schedule_work(&rdata->task.u.tk_work);
+	pnfsiod_queue_work(&rdata->task.u.tk_work);
 }
 
 /* We don't want normal .rpc_call_done callback used, so we replace it
@@ -418,7 +418,7 @@ static void bl_end_par_io_write(void *data)
 	wdata->task.tk_status = 0;
 	wdata->verf.committed = NFS_FILE_SYNC;
 	INIT_WORK(&wdata->task.u.tk_work, bl_write_cleanup);
-	schedule_work(&wdata->task.u.tk_work);
+	pnfsiod_queue_work(&wdata->task.u.tk_work);
 }
 
 /* FIXME STUB - mark intersection of layout and page as bad, so is not
@@ -981,29 +981,35 @@ static int __init nfs4blocklayout_init(void)
 	if (ret)
 		goto out;
 
+	ret = pnfsiod_start();
+	if (ret)
+		goto out_remove;
+
 	init_waitqueue_head(&bl_wq);
 
 	mnt = rpc_get_mount();
 	if (IS_ERR(mnt)) {
 		ret = PTR_ERR(mnt);
-		goto out_remove;
+		goto out_stop;
 	}
 
 	ret = vfs_path_lookup(mnt->mnt_root,
 			      mnt,
 			      NFS_PIPE_DIRNAME, 0, &path);
 	if (ret)
-		goto out_remove;
+		goto out_stop;
 
 	bl_device_pipe = rpc_mkpipe(path.dentry, "blocklayout", NULL,
 				    &bl_upcall_ops, 0);
 	if (IS_ERR(bl_device_pipe)) {
 		ret = PTR_ERR(bl_device_pipe);
-		goto out_remove;
+		goto out_stop;
 	}
 out:
 	return ret;
 
+out_stop:
+	pnfsiod_stop();
 out_remove:
 	pnfs_unregister_layoutdriver(&blocklayout_type);
 	return ret;
@@ -1015,6 +1021,7 @@ static void __exit nfs4blocklayout_exit(void)
 	       __func__);
 
 	pnfs_unregister_layoutdriver(&blocklayout_type);
+	pnfsiod_stop();
 	rpc_unlink(bl_device_pipe);
 }
 
diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index d0cda12..f28013f 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -1042,7 +1042,14 @@ static int __init
 objlayout_init(void)
 {
 	int ret = pnfs_register_layoutdriver(&objlayout_type);
+	if (ret)
+		goto out;
 
+	ret = pnfsiod_start();
+	if (ret)
+		pnfs_unregister_layoutdriver(&objlayout_type);
+
+out:
 	if (ret)
 		printk(KERN_INFO
 			"%s: Registering OSD pNFS Layout Driver failed: error=%d\n",
@@ -1057,6 +1064,7 @@ static void __exit
 objlayout_exit(void)
 {
 	pnfs_unregister_layoutdriver(&objlayout_type);
+	pnfsiod_stop();
 	printk(KERN_INFO "%s: Unregistered OSD pNFS Layout Driver\n",
 	       __func__);
 }
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index 1d06f8e..f7c6c21 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -305,7 +305,7 @@ objlayout_read_done(struct objlayout_io_state *state, ssize_t status, bool sync)
 		pnfs_ld_read_done(rdata);
 	else {
 		INIT_WORK(&rdata->task.u.tk_work, _rpc_read_complete);
-		schedule_work(&rdata->task.u.tk_work);
+		pnfsiod_queue_work(&rdata->task.u.tk_work);
 	}
 }
 
@@ -396,7 +396,7 @@ objlayout_write_done(struct objlayout_io_state *state, ssize_t status,
 		pnfs_ld_write_done(wdata);
 	else {
 		INIT_WORK(&wdata->task.u.tk_work, _rpc_write_complete);
-		schedule_work(&wdata->task.u.tk_work);
+		pnfsiod_queue_work(&wdata->task.u.tk_work);
 	}
 }
 
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index e550e88..5ac7a78 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -38,7 +38,7 @@
 /* Locking:
  *
  * pnfs_spinlock:
- *      protects pnfs_modules_tbl.
+ *      protects pnfs_modules_tbl, pnfsiod_workqueue and pnfsiod_users.
  */
 static DEFINE_SPINLOCK(pnfs_spinlock);
 
@@ -47,6 +47,9 @@ static DEFINE_SPINLOCK(pnfs_spinlock);
  */
 static LIST_HEAD(pnfs_modules_tbl);
 
+static struct workqueue_struct *pnfsiod_workqueue;
+static int pnfsiod_users = 0;
+
 /* Return the registered pnfs layout driver module matching given id */
 static struct pnfs_layoutdriver_type *
 find_pnfs_driver_locked(u32 id)
@@ -1478,3 +1481,50 @@ out:
 	dprintk("<-- %s status %d\n", __func__, status);
 	return status;
 }
+
+/*
+ * start up the pnfsiod workqueue
+ */
+int pnfsiod_start(void)
+{
+	struct workqueue_struct *wq;
+	dprintk("RPC:       creating workqueue pnfsiod\n");
+	wq = alloc_workqueue("pnfsiod", WQ_MEM_RECLAIM, 0);
+	if (wq == NULL)
+		return -ENOMEM;
+	spin_lock(&pnfs_spinlock);
+	pnfsiod_users++;
+	if (pnfsiod_workqueue == NULL) {
+		pnfsiod_workqueue = wq;
+	} else {
+		destroy_workqueue(wq);
+	}
+	spin_unlock(&pnfs_spinlock);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnfsiod_start);
+
+/*
+ * Destroy the pnfsiod workqueue
+ */
+void pnfsiod_stop(void)
+{
+	struct workqueue_struct *wq = NULL;
+
+	spin_lock(&pnfs_spinlock);
+	pnfsiod_users--;
+	if (pnfsiod_users == 0) {
+		wq = pnfsiod_workqueue;
+		pnfsiod_workqueue = NULL;
+	}
+	spin_unlock(&pnfs_spinlock);
+	if (wq)
+		destroy_workqueue(wq);
+}
+EXPORT_SYMBOL_GPL(pnfsiod_stop);
+
+void pnfsiod_queue_work(struct work_struct* work)
+{
+	queue_work(pnfsiod_workqueue, work);
+}
+EXPORT_SYMBOL_GPL(pnfsiod_queue_work);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 01cbfd5..bc1eed5 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -165,6 +165,10 @@ extern int nfs4_proc_layoutget(struct nfs4_layoutget *lgp);
 extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp);
 
 /* pnfs.c */
+int pnfsiod_start(void);
+void pnfsiod_stop(void);
+void pnfsiod_queue_work(struct work_struct* work);
+
 void get_layout_hdr(struct pnfs_layout_hdr *lo);
 void put_lseg(struct pnfs_layout_segment *lseg);
 
-- 
1.7.4.1



end of thread, other threads:[~2011-09-22  7:17 UTC | newest]

Thread overview: 29+ messages
2011-09-20  3:18 [PATCH 0/3] replacement for "introduce pnfs private workqueue" Jim Rees
2011-09-20  3:18 ` [PATCH 1/3] pnfsblock: add missing rpc_put_mount and path_put Jim Rees
2011-09-20  3:18 ` [PATCH 2/3] pnfs: introduce pnfs private workqueue Jim Rees
2011-09-20 22:41   ` Trond Myklebust
2011-09-21  0:29     ` Jim Rees
2011-09-21  2:44       ` tao.peng
2011-09-21  4:20         ` Myklebust, Trond
2011-09-21  5:16           ` tao.peng
2011-09-21  7:04             ` Benny Halevy
2011-09-21 10:23               ` tao.peng
2011-09-21 10:38                 ` Boaz Harrosh
2011-09-21 11:04                   ` tao.peng
2011-09-21 10:56                 ` Benny Halevy
2011-09-21 11:10                   ` tao.peng
2011-09-21 11:27                     ` Benny Halevy
2011-09-21 11:42                       ` Boaz Harrosh
2011-09-21 11:50                         ` Benny Halevy
2011-09-21 13:56                           ` Boaz Harrosh
2011-09-21 15:45                             ` Peng Tao
2011-09-21 16:03                               ` Trond Myklebust
2011-09-22  3:30                                 ` tao.peng
2011-09-22  7:17                                 ` Boaz Harrosh
2011-09-21  4:22       ` Myklebust, Trond
2011-09-20  3:18 ` [PATCH 3/3] SQUASHME: pnfs: simplify and clean up pnfsiod workqueue Jim Rees
2011-09-21 11:52 ` [PATCH 0/3] replacement for "introduce pnfs private workqueue" Benny Halevy
2011-09-21 12:32   ` Jim Rees
  -- strict thread matches above, loose matches on Subject: below --
2011-09-10 17:41 [PATCH 0/3] pnfs private workqueue, and two cleanups Jim Rees
2011-09-10 17:41 ` [PATCH 2/3] pNFS: introduce pnfs private workqueue Jim Rees
2011-09-11 14:51   ` Benny Halevy
2011-09-11 15:15     ` Benny Halevy
