All of lore.kernel.org
 help / color / mirror / Atom feed
* [lustre-devel] [PATCH v2 00/29] More lustre patches
@ 2019-05-20 12:50 James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes James Simmons
                   ` (28 more replies)
  0 siblings, 29 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

Some of these I posted before but didn't get a formal Reviewed-by.
Some needed some revision.
Some did get a reviewed-by, but depend on patches in the above
categories.

Others are brand new.

James Simmons (1):
  lustre: llite: don't use class_setup_tunables()

NeilBrown (27):
  lustre: llite: fix error in vvp_pgcache seqfile.
  lustre: llite: replace lli_trunc_sem
  lustre: lov: use GFP_NOFS to allocate lo_entries.
  lustre: embed typ_kobj in obd_type
  lustre: obd: collect all resource releasing for obj_type.
  lustre: obd_type: use typ_kobj.name as typ_name
  lustre: obd_type: discard obd_types linked list.
  lustre: obd_type: discard obd_type_lock
  lustre: obdclass: don't copy ops structures in to new type.
  lustre: obdclass: fix module load locking.
  lustre: convert rsi_sem to a spinlock.
  lustre: ldlm: discard varname in ldlm_pool.
  lustre: lprocfs: use log2.h macros instead of shift loop.
  lustre: handles: discard h_owner in favour of h_ops
  lustre: handle: move refcount into the lustre_handle.
  lustre: discard OBD_FREE_RCU
  lustre: portals_handle: rename ops to owner
  lustre: portals_handle: remove locking from class_handle2object()
  lustre: portals_handle: use hlist for hash lists.
  lustre: portals_handle: discard h_lock.
  lustre: remove unused fields from struct obd_device
  lustre: obd_sysfs: error-check value stored in jobid_var
  lustre: obdclass: discard process_quota_config
  lustre: obdclass: remove unnecessary code from lustre_init_lsi()
  lustre: ldlm: discard l_lock from struct ldlm_lock.
  lustre: ldlm: don't access l_resource when not locked.
  lustre: ldlm: drop SLAB_TYPESAFE_BY_RCU from ldlm_lock slab.

Patrick Farrell (1):
  lustre: llite: ll_fault fixes

 fs/lustre/include/lustre_dlm.h           |  11 --
 fs/lustre/include/lustre_export.h        |   1 -
 fs/lustre/include/lustre_handles.h       |  22 +--
 fs/lustre/include/lustre_import.h        |   2 -
 fs/lustre/include/lustre_net.h           |   4 +-
 fs/lustre/include/obd.h                  |  45 +++---
 fs/lustre/include/obd_class.h            |  10 +-
 fs/lustre/include/obd_support.h          |  10 --
 fs/lustre/ldlm/l_lock.c                  |  20 ++-
 fs/lustre/ldlm/ldlm_internal.h           |   2 -
 fs/lustre/ldlm/ldlm_lib.c                |   2 +-
 fs/lustre/ldlm/ldlm_lock.c               | 101 ++++++-------
 fs/lustre/ldlm/ldlm_lockd.c              |  24 ++-
 fs/lustre/ldlm/ldlm_pool.c               |  18 +--
 fs/lustre/ldlm/ldlm_resource.c           |   9 +-
 fs/lustre/llite/llite_internal.h         |   3 +-
 fs/lustre/llite/llite_lib.c              |   9 +-
 fs/lustre/llite/llite_mmap.c             |   6 +-
 fs/lustre/llite/lproc_llite.c            |  45 ++++--
 fs/lustre/llite/vvp_dev.c                |   1 +
 fs/lustre/llite/vvp_io.c                 |  28 +++-
 fs/lustre/lmv/lmv_obd.c                  |   4 +-
 fs/lustre/lov/lov_obd.c                  |   2 +-
 fs/lustre/lov/lov_object.c               |   6 +-
 fs/lustre/mdc/mdc_request.c              |   6 +-
 fs/lustre/mgc/mgc_request.c              |  13 +-
 fs/lustre/obdclass/class_obd.c           |  27 +---
 fs/lustre/obdclass/genops.c              | 247 +++++++++++++------------------
 fs/lustre/obdclass/lprocfs_status.c      |  20 +--
 fs/lustre/obdclass/lu_object.c           |   2 +-
 fs/lustre/obdclass/lustre_handles.c      |  62 +++-----
 fs/lustre/obdclass/obd_config.c          |  30 ----
 fs/lustre/obdclass/obd_mount.c           |   6 +-
 fs/lustre/obdclass/obd_sysfs.c           |  21 ++-
 fs/lustre/obdecho/echo_client.c          |   4 +-
 fs/lustre/osc/osc_request.c              |   2 +-
 fs/lustre/ptlrpc/client.c                |   2 +-
 fs/lustre/ptlrpc/service.c               |   4 +-
 include/uapi/linux/lustre/lustre_ioctl.h |   1 -
 39 files changed, 352 insertions(+), 480 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-22  3:54   ` NeilBrown
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 02/29] lustre: llite: fix error in vvp_pgcache seqfile James Simmons
                   ` (27 subsequent siblings)
  28 siblings, 1 reply; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: Patrick Farrell <pfarrell@whamcloud.com>

Various error conditions in the fault path can cause us to
not return a page in vm_fault.  Check if it's present
before accessing it.

Additionally, it's not valid to return VM_FAULT_NOPAGE for
page faults.  The correct return when accessing a page that
does not exist is VM_FAULT_SIGBUS.  Correcting this avoids
looping infinitely in the testcase.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-11403
Reviewed-on: https://review.whamcloud.com/34247
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_mmap.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
index 1865db1..c8e57ad 100644
--- a/fs/lustre/llite/llite_mmap.c
+++ b/fs/lustre/llite/llite_mmap.c
@@ -238,9 +238,6 @@ static inline vm_fault_t to_fault_error(int result)
 	case 0:
 		result = VM_FAULT_LOCKED;
 		break;
-	case -EFAULT:
-		result = VM_FAULT_NOPAGE;
-		break;
 	case -ENOMEM:
 		result = VM_FAULT_OOM;
 		break;
@@ -366,7 +363,8 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
 
 restart:
 	result = __ll_fault(vmf->vma, vmf);
-	if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
+	if (vmf->page &&
+	    !(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
 		struct page *vmpage = vmf->page;
 
 		/* check if this page has been truncated */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 02/29] lustre: llite: fix error in vvp_pgcache seqfile.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 03/29] lustre: llite: replace lli_trunc_sem James Simmons
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

The ->next function should increment the 'pos', but it doesn't.
This oversight causes the WARN_ON to fire.
Core seq_file code tries to handle this error, but it is
best not to make it.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/llite/vvp_dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/lustre/llite/vvp_dev.c b/fs/lustre/llite/vvp_dev.c
index c10ca6e..96262a1 100644
--- a/fs/lustre/llite/vvp_dev.c
+++ b/fs/lustre/llite/vvp_dev.c
@@ -565,6 +565,7 @@ static void *vvp_pgcache_next(struct seq_file *f, void *v, loff_t *pos)
 
 	WARN_ON(*pos != priv->vsp_prev_pos);
 
+	(*pos)++;
 	priv->vsp_prev_pos = *pos;
 	return vvp_pgcache_next_page(priv);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 03/29] lustre: llite: replace lli_trunc_sem
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 02/29] lustre: llite: fix error in vvp_pgcache seqfile James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 04/29] lustre: lov: use GFP_NOFS to allocate lo_entries James Simmons
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

lli_trunc_sem can lead to a readlock.

vvp_io_read_start can take mmap_sem while holding lli_trunc_sem,
and vvp_io_fault_start will take lli_trunc_sem while holding mmap_sem.

These aren't necessarily the same mmap_sem, but can be if you mmap a
lustre file, then read into that mapped memory from the file.

These are both 'down_read' calls on lli_trunc_sem so they don't
necessarily conflict, but if vvp_io_setattr_start() is called to
truncate the file between these, the later will wait for the former
and a deadlock can eventuate.

Solve this by replacing with a hand-coded semaphore, using atomic
counters and wait_var_event().
In the vvp_io_fault_start() case where mmap_sem is held, don't wait
for a pending writer, only for an active writer.  This means we won't
wait if vvp_io_read_start has started, and so no deadlock happens.

I'd like there to be a better way to fix this, but I haven't found it
yet.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/llite/llite_internal.h |  3 ++-
 fs/lustre/llite/llite_lib.c      |  3 ++-
 fs/lustre/llite/vvp_io.c         | 28 +++++++++++++++++++++-------
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 9da59b1..7566b1b 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -190,7 +190,8 @@ struct ll_inode_info {
 			 *    struct list_head wait_list;
 			 * }
 			 */
-			struct rw_semaphore		lli_trunc_sem;
+			atomic_t			lli_trunc_readers;
+			atomic_t			lli_trunc_waiters;
 			struct range_lock_tree		lli_write_tree;
 
 			struct rw_semaphore		lli_glimpse_sem;
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 4e98eb4..ab7c84a 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -894,7 +894,8 @@ void ll_lli_init(struct ll_inode_info *lli)
 	} else {
 		mutex_init(&lli->lli_size_mutex);
 		lli->lli_symlink_name = NULL;
-		init_rwsem(&lli->lli_trunc_sem);
+		atomic_set(&lli->lli_trunc_readers, 0);
+		atomic_set(&lli->lli_trunc_waiters, 0);
 		range_lock_tree_init(&lli->lli_write_tree);
 		init_rwsem(&lli->lli_glimpse_sem);
 		lli->lli_glimpse_time = 0;
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index 225a858..a9db530 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -667,7 +667,10 @@ static int vvp_io_setattr_start(const struct lu_env *env,
 	struct ll_inode_info *lli = ll_i2info(inode);
 
 	if (cl_io_is_trunc(io)) {
-		down_write(&lli->lli_trunc_sem);
+		atomic_inc(&lli->lli_trunc_waiters);
+		wait_var_event(&lli->lli_trunc_readers,
+			       atomic_cmpxchg(&lli->lli_trunc_readers, 0, -1) == 0);
+		atomic_dec(&lli->lli_trunc_waiters);
 		inode_lock(inode);
 		inode_dio_wait(inode);
 	} else {
@@ -693,7 +696,8 @@ static void vvp_io_setattr_end(const struct lu_env *env,
 		 */
 		vvp_do_vmtruncate(inode, io->u.ci_setattr.sa_attr.lvb_size);
 		inode_unlock(inode);
-		up_write(&lli->lli_trunc_sem);
+		atomic_set(&lli->lli_trunc_readers, 0);
+		wake_up_var(&lli->lli_trunc_readers);
 	} else {
 		inode_unlock(inode);
 	}
@@ -732,7 +736,9 @@ static int vvp_io_read_start(const struct lu_env *env,
 
 	CDEBUG(D_VFSTRACE, "read: -> [%lli, %lli)\n", pos, pos + cnt);
 
-	down_read(&lli->lli_trunc_sem);
+	wait_var_event(&lli->lli_trunc_readers,
+		       atomic_read(&lli->lli_trunc_waiters) == 0 &&
+		       atomic_inc_unless_negative(&lli->lli_trunc_readers));
 
 	if (!can_populate_pages(env, io, inode))
 		return 0;
@@ -965,7 +971,9 @@ static int vvp_io_write_start(const struct lu_env *env,
 	size_t cnt = io->u.ci_wr.wr.crw_count;
 	ssize_t result = 0;
 
-	down_read(&lli->lli_trunc_sem);
+	wait_var_event(&lli->lli_trunc_readers,
+		       atomic_read(&lli->lli_trunc_waiters) == 0 &&
+		       atomic_inc_unless_negative(&lli->lli_trunc_readers));
 
 	if (!can_populate_pages(env, io, inode))
 		return 0;
@@ -1059,7 +1067,9 @@ static void vvp_io_rw_end(const struct lu_env *env,
 	struct inode *inode = vvp_object_inode(ios->cis_obj);
 	struct ll_inode_info *lli = ll_i2info(inode);
 
-	up_read(&lli->lli_trunc_sem);
+	if (atomic_dec_return(&lli->lli_trunc_readers) == 0 &&
+	    atomic_read(&lli->lli_trunc_waiters))
+		wake_up_var(&lli->lli_trunc_readers);
 }
 
 static int vvp_io_kernel_fault(struct vvp_fault_io *cfio)
@@ -1124,7 +1134,8 @@ static int vvp_io_fault_start(const struct lu_env *env,
 	loff_t size;
 	pgoff_t last_index;
 
-	down_read(&lli->lli_trunc_sem);
+	wait_var_event(&lli->lli_trunc_readers,
+		       atomic_inc_unless_negative(&lli->lli_trunc_readers));
 
 	/* offset of the last byte on the page */
 	offset = cl_offset(obj, fio->ft_index + 1) - 1;
@@ -1281,7 +1292,10 @@ static void vvp_io_fault_end(const struct lu_env *env,
 
 	CLOBINVRNT(env, ios->cis_io->ci_obj,
 		   vvp_object_invariant(ios->cis_io->ci_obj));
-	up_read(&lli->lli_trunc_sem);
+
+	if (atomic_dec_return(&lli->lli_trunc_readers) == 0 &&
+	    atomic_read(&lli->lli_trunc_waiters))
+		wake_up_var(&lli->lli_trunc_readers);
 }
 
 static int vvp_io_fsync_start(const struct lu_env *env,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 04/29] lustre: lov: use GFP_NOFS to allocate lo_entries.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (2 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 03/29] lustre: llite: replace lli_trunc_sem James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 05/29] lustre: llite: don't use class_setup_tunables() James Simmons
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

lo_type_guard is taken during memory reclaim:

               -> #0 (&lov->lo_type_guard){++++}:
[  576.552501]        down_read+0x27/0x7c
[  576.553072]        lov_object_delete+0xd5/0x1bb
[  576.553836]        lu_object_free+0x5f/0x111
[  576.554524]        lu_object_put+0x313/0x337
[  576.555267]        cl_inode_fini+0x158/0x1ac
[  576.555898]        ll_delete_inode+0xcc/0xd5
[  576.556572]        evict+0xb4/0x166
[  576.557162]        dispose_list+0x30/0x34
[  576.557834]        prune_icache_sb+0x55/0x73
[  576.558465]        super_cache_scan+0x122/0x16d
[  576.559138]        shrink_slab.part.22.constprop.37+0x27b/0x414
[  576.560059]        shrink_node+0x8d/0x1b3

and lov_init_composite is called while lo_type_guard is held:

[  576.540707]        lov_init_composite+0xff/0xc5f
[  576.541496]        lov_conf_set+0x4cf/0x658
[  576.542179]        cl_conf_set+0x49/0x58
[  576.542790]        cl_file_inode_init+0x21a/0x2b8
[  576.543564]        ll_update_inode+0x4e/0xea5
[  576.544277]        ll_iget+0x112/0x1bf
[  576.544871]        ll_prep_inode+0x242/0x43d
[  576.545541]        ll_lookup_it_finish+0xcf/0x51d
[  576.546308]        ll_lookup_it+0x52e/0x5fc
[  576.546976]        ll_atomic_open+0x1ce/0x74f
[  576.547666]        lookup_open+0x298/0x526
[  576.548325]        path_openat+0x29a/0x7d3
[  576.548917]        do_filp_open+0x57/0xc1
[  576.549539]        do_sys_open+0x137/0x1e7

so it is not safe to use GFP_KERNEL in lov_init_composite().

Use GFP_NOFS instead.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/lov/lov_object.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c
index 2058275..261f7b7 100644
--- a/fs/lustre/lov/lov_object.c
+++ b/fs/lustre/lov/lov_object.c
@@ -216,8 +216,8 @@ static int lov_init_raid0(const struct lu_env *env, struct lov_device *dev,
 	r0->lo_nr = lse->lsme_stripe_count;
 	LASSERT(r0->lo_nr <= lov_targets_nr(dev));
 
-	r0->lo_sub = kvzalloc(r0->lo_nr * sizeof(r0->lo_sub[0]),
-			      GFP_KERNEL);
+	r0->lo_sub = kzalloc(r0->lo_nr * sizeof(r0->lo_sub[0]),
+			      GFP_NOFS);
 	if (!r0->lo_sub)
 		return -ENOMEM;
 
@@ -302,7 +302,7 @@ static int lov_init_composite(const struct lu_env *env, struct lov_device *dev,
 	comp->lo_entry_count = entry_count;
 
 	comp->lo_entries = kcalloc(entry_count, sizeof(*comp->lo_entries),
-				   GFP_KERNEL);
+				   GFP_NOFS);
 	if (!comp->lo_entries)
 		return -ENOMEM;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 05/29] lustre: llite: don't use class_setup_tunables()
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (3 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 04/29] lustre: lov: use GFP_NOFS to allocate lo_entries James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-22  4:22   ` NeilBrown
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 06/29] lustre: embed typ_kobj in obd_type James Simmons
                   ` (23 subsequent siblings)
  28 siblings, 1 reply; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

llite is very different from the other types of lustre devices.
Since this is the case llite should register independently. Doing
this allows us to cleanup the debugfs registering in the release
function of struct kobj_type.

Signed-off-by: James Simmons <uja.ornl@yahoo.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-8066
Reviewed-on: https://review.whamcloud.com/34292
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/lproc_llite.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index 5de8462..91b0a50 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -42,18 +42,40 @@
 static struct kobject *llite_kobj;
 static struct dentry *llite_root;
 
+static void llite_kobj_release(struct kobject *kobj)
+{
+	if (!IS_ERR_OR_NULL(llite_root)) {
+		debugfs_remove(llite_root);
+		llite_root = NULL;
+	}
+
+	kfree(kobj);
+}
+
+static struct kobj_type llite_kobj_ktype = {
+	.release	= llite_kobj_release,
+	.sysfs_ops	= &lustre_sysfs_ops,
+};
+
 int llite_tunables_register(void)
 {
-	int rc = 0;
+	int rc;
+
+	llite_kobj = kzalloc(sizeof(*llite_kobj), GFP_KERNEL);
+	if (!llite_kobj)
+		return -ENOMEM;
 
-	llite_kobj = class_setup_tunables("llite");
-	if (IS_ERR(llite_kobj))
-		return PTR_ERR(llite_kobj);
+	llite_kobj->kset = lustre_kset;
+	rc = kobject_init_and_add(llite_kobj, &llite_kobj_ktype,
+				  &lustre_kset->kobj, "%s", "llite");
+	if (rc)
+		goto free_kobj;
 
 	llite_root = debugfs_create_dir("llite", debugfs_lustre_root);
 	if (IS_ERR_OR_NULL(llite_root)) {
 		rc = llite_root ? PTR_ERR(llite_root) : -ENOMEM;
 		llite_root = NULL;
+free_kobj:
 		kobject_put(llite_kobj);
 		llite_kobj = NULL;
 	}
@@ -65,9 +87,6 @@ void llite_tunables_unregister(void)
 {
 	kobject_put(llite_kobj);
 	llite_kobj = NULL;
-
-	debugfs_remove(llite_root);
-	llite_root = NULL;
 }
 
 /* debugfs llite mount point registration */
@@ -1216,17 +1235,17 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file,
 	NULL,
 };
 
-static void llite_kobj_release(struct kobject *kobj)
+static void sbi_kobj_release(struct kobject *kobj)
 {
 	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
 					      ll_kset.kobj);
 	complete(&sbi->ll_kobj_unregister);
 }
 
-static struct kobj_type llite_ktype = {
+static struct kobj_type sbi_ktype = {
 	.default_attrs	= llite_attrs,
 	.sysfs_ops	= &lustre_sysfs_ops,
-	.release	= llite_kobj_release,
+	.release	= sbi_kobj_release,
 };
 
 static const struct llite_file_opcode {
@@ -1384,7 +1403,7 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name)
 out_ll_kset:
 	/* Yes we also register sysfs mount kset here as well */
 	sbi->ll_kset.kobj.parent = llite_kobj;
-	sbi->ll_kset.kobj.ktype = &llite_ktype;
+	sbi->ll_kset.kobj.ktype = &sbi_ktype;
 	init_completion(&sbi->ll_kobj_unregister);
 	err = kobject_set_name(&sbi->ll_kset.kobj, "%s", name);
 	if (err)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 06/29] lustre: embed typ_kobj in obd_type
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (4 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 05/29] lustre: llite: don't use class_setup_tunables() James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-22  5:20   ` NeilBrown
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type James Simmons
                   ` (22 subsequent siblings)
  28 siblings, 1 reply; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

As there is a 1-1 mapping between obd_types and their ->typ_kobj, it
is simple and more normal to embed the kobj in the obd_type, rather
than allocate it separately.

This requires calling "kobject_init_and_add()" earlier, so we
open-code relevant part of class_setup_tunables() in
class_register_type(). Now class_setup_tunables() is needed only
for server side code.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/obd.h             |  2 +-
 fs/lustre/include/obd_class.h       |  1 -
 fs/lustre/obdclass/genops.c         | 56 +++++++++++--------------------------
 fs/lustre/obdclass/lprocfs_status.c |  2 +-
 4 files changed, 19 insertions(+), 42 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index f626695..f20c246 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -107,7 +107,7 @@ struct obd_type {
 	int			 typ_refcnt;
 	struct lu_device_type	*typ_lu;
 	spinlock_t		 obd_type_lock;
-	struct kobject		*typ_kobj;
+	struct kobject		 typ_kobj;
 };
 
 struct brw_page {
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index e4cde19..1729865 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -60,7 +60,6 @@
 
 /* genops.c */
 struct obd_export *class_conn2export(struct lustre_handle *conn);
-struct kobject *class_setup_tunables(const char *name);
 int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 			const char *name, struct lu_device_type *ldt);
 int class_unregister_type(const char *name);
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index bc1f979..ccd8a42 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -136,7 +136,9 @@ void class_put_type(struct obd_type *type)
 
 static void class_sysfs_release(struct kobject *kobj)
 {
-	kfree(kobj);
+	struct obd_type *type = container_of(kobj, struct obd_type, typ_kobj);
+
+	kfree(type);
 }
 
 static struct kobj_type class_ktype = {
@@ -144,26 +146,6 @@ static void class_sysfs_release(struct kobject *kobj)
 	.release	= class_sysfs_release,
 };
 
-struct kobject *class_setup_tunables(const char *name)
-{
-	struct kobject *kobj;
-	int rc;
-
-	kobj = kzalloc(sizeof(*kobj), GFP_KERNEL);
-	if (!kobj)
-		return ERR_PTR(-ENOMEM);
-
-	kobj->kset = lustre_kset;
-	kobject_init(kobj, &class_ktype);
-	rc = kobject_add(kobj, &lustre_kset->kobj, "%s", name);
-	if (rc) {
-		kobject_put(kobj);
-		return ERR_PTR(rc);
-	}
-	return kobj;
-}
-EXPORT_SYMBOL(class_setup_tunables);
-
 #define CLASS_MAX_NAME 1024
 
 int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
@@ -181,10 +163,17 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 		return -EEXIST;
 	}
 
-	rc = -ENOMEM;
 	type = kzalloc(sizeof(*type), GFP_NOFS);
 	if (!type)
-		return rc;
+		return -ENOMEM;
+
+	type->typ_kobj.kset = lustre_kset;
+	rc = kobject_init_and_add(&type->typ_kobj, &class_ktype,
+				  &lustre_kset->kobj, "%s", name);
+	if (rc)
+		goto failed;
+
+	type->typ_debugfs_entry = debugfs_create_dir(name, debugfs_lustre_root);
 
 	type->typ_dt_ops = kzalloc(sizeof(*type->typ_dt_ops), GFP_NOFS);
 	type->typ_md_ops = kzalloc(sizeof(*type->typ_md_ops), GFP_NOFS);
@@ -202,22 +191,11 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 	strcpy(type->typ_name, name);
 	spin_lock_init(&type->obd_type_lock);
 
-	type->typ_debugfs_entry = debugfs_create_dir(type->typ_name,
-						     debugfs_lustre_root);
-
-	type->typ_kobj = class_setup_tunables(type->typ_name);
-	if (IS_ERR(type->typ_kobj)) {
-		rc = PTR_ERR(type->typ_kobj);
-		goto failed;
-	}
-
 	if (ldt) {
 		type->typ_lu = ldt;
 		rc = lu_device_type_init(ldt);
-		if (rc != 0) {
-			kobject_put(type->typ_kobj);
+		if (rc != 0)
 			goto failed;
-		}
 	}
 
 	spin_lock(&obd_types_lock);
@@ -230,7 +208,8 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 	kfree(type->typ_name);
 	kfree(type->typ_md_ops);
 	kfree(type->typ_dt_ops);
-	kfree(type);
+	kobject_put(&type->typ_kobj);
+
 	return rc;
 }
 EXPORT_SYMBOL(class_register_type);
@@ -253,8 +232,6 @@ int class_unregister_type(const char *name)
 		return -EBUSY;
 	}
 
-	kobject_put(type->typ_kobj);
-
 	debugfs_remove_recursive(type->typ_debugfs_entry);
 
 	if (type->typ_lu)
@@ -266,7 +243,8 @@ int class_unregister_type(const char *name)
 	kfree(type->typ_name);
 	kfree(type->typ_dt_ops);
 	kfree(type->typ_md_ops);
-	kfree(type);
+	kobject_put(&type->typ_kobj);
+
 	return 0;
 } /* class_unregister_type */
 EXPORT_SYMBOL(class_unregister_type);
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index 11fddc8..71bf409 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -1036,7 +1036,7 @@ int lprocfs_obd_setup(struct obd_device *obd, bool uuid_only)
 	obd->obd_ktype.sysfs_ops = &lustre_sysfs_ops;
 	obd->obd_ktype.release = obd_sysfs_release;
 
-	obd->obd_kset.kobj.parent = obd->obd_type->typ_kobj;
+	obd->obd_kset.kobj.parent = &obd->obd_type->typ_kobj;
 	obd->obd_kset.kobj.ktype = &obd->obd_ktype;
 	init_completion(&obd->obd_kobj_unregister);
 	rc = kset_register(&obd->obd_kset);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (5 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 06/29] lustre: embed typ_kobj in obd_type James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-22  6:49   ` NeilBrown
  2019-05-22 18:51   ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 08/29] lustre: obd_type: use typ_kobj.name as typ_name James Simmons
                   ` (21 subsequent siblings)
  28 siblings, 2 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

Now that obj_type is managed as a kobject, move all
the freeing and deregistering into class_sysfs_release().
Only leave type->typ_lu handling since unloading obdecho
can trigger an assert.

lu_context_key_degister()) ASSERTION( atomic_read(&key->lct_used) >= 1 ) failed:
lu_context_key_degister()) LBUG
kernel: Pid: 10642, comm: rmmod
Call Trace:
[<ffffffffc096e7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[<ffffffffc096e87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[<ffffffffc0f9761c>] lu_context_key_degister+0x14c/0x160 [obdclass]
[<ffffffffc0f977d2>] lu_context_key_degister_many+0x72/0xb0 [obdclass]
[<ffffffffc13e7130>] echo_type_fini+0x20/0x30 [obdecho]
[<ffffffffc0f9618b>] lu_device_type_fini+0x1b/0x20 [obdclass]
[<ffffffffc0f67fce>] class_sysfs_release+0x3e/0x6b0 [obdclass]
[<ffffffffb85782a1>] kobject_release+0x81/0x1b0
[<ffffffffb8578138>] kobject_put+0x28/0x60
[<ffffffffc0f6940c>] class_unregister_type+0x23c/0x550 [obdclass]
[<ffffffffc13f9636>] obdecho_exit+0x10/0x9da [obdecho]

Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/obdclass/genops.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index ccd8a42..2a5ec93 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -138,6 +138,15 @@ static void class_sysfs_release(struct kobject *kobj)
 {
 	struct obd_type *type = container_of(kobj, struct obd_type, typ_kobj);
 
+	debugfs_remove_recursive(type->typ_debugfs_entry);
+
+	spin_lock(&obd_types_lock);
+	list_del(&type->typ_chain);
+	spin_unlock(&obd_types_lock);
+
+	kfree(type->typ_name);
+	kfree(type->typ_md_ops);
+	kfree(type->typ_dt_ops);
 	kfree(type);
 }
 
@@ -167,6 +176,7 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 	if (!type)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&type->typ_chain);
 	type->typ_kobj.kset = lustre_kset;
 	rc = kobject_init_and_add(&type->typ_kobj, &class_ktype,
 				  &lustre_kset->kobj, "%s", name);
@@ -205,9 +215,6 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 	return 0;
 
 failed:
-	kfree(type->typ_name);
-	kfree(type->typ_md_ops);
-	kfree(type->typ_dt_ops);
 	kobject_put(&type->typ_kobj);
 
 	return rc;
@@ -232,17 +239,9 @@ int class_unregister_type(const char *name)
 		return -EBUSY;
 	}
 
-	debugfs_remove_recursive(type->typ_debugfs_entry);
-
 	if (type->typ_lu)
 		lu_device_type_fini(type->typ_lu);
 
-	spin_lock(&obd_types_lock);
-	list_del(&type->typ_chain);
-	spin_unlock(&obd_types_lock);
-	kfree(type->typ_name);
-	kfree(type->typ_dt_ops);
-	kfree(type->typ_md_ops);
 	kobject_put(&type->typ_kobj);
 
 	return 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 08/29] lustre: obd_type: use typ_kobj.name as typ_name
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (6 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 09/29] lustre: obd_type: discard obd_types linked list James Simmons
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

As the kobject has a name (after kobject_add has been called),
we don't need to also store it in typ_name.
So use typ_kobj.name instead of typ_name.

This requires changing some "char *" to "const char *" as
typ.kobj.name is const.

Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/lustre_net.h |  4 ++--
 fs/lustre/include/obd.h        |  2 +-
 fs/lustre/include/obd_class.h  |  2 +-
 fs/lustre/ldlm/ldlm_lib.c      |  2 +-
 fs/lustre/obdclass/genops.c    | 18 +++++++-----------
 fs/lustre/ptlrpc/client.c      |  2 +-
 6 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h
index f6d1be1..1cf0f12 100644
--- a/fs/lustre/include/lustre_net.h
+++ b/fs/lustre/include/lustre_net.h
@@ -308,7 +308,7 @@ struct ptlrpc_client {
 	/** What portal do we expect replies on */
 	u32			cli_reply_portal;
 	/** Name of the client */
-	char			*cli_name;
+	const char		*cli_name;
 };
 
 /** state flags of requests */
@@ -1817,7 +1817,7 @@ static inline int ptlrpc_client_bulk_active(struct ptlrpc_request *req)
 int ptlrpc_inc_ref(void);
 void ptlrpc_dec_ref(void);
 
-void ptlrpc_init_client(int req_portal, int rep_portal, char *name,
+void ptlrpc_init_client(int req_portal, int rep_portal, const char *name,
 			struct ptlrpc_client *);
 struct ptlrpc_connection *ptlrpc_uuid_to_connection(struct obd_uuid *uuid,
 						    lnet_nid_t nid4refnet);
diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index f20c246..6bf052a 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -103,12 +103,12 @@ struct obd_type {
 	struct obd_ops		*typ_dt_ops;
 	struct md_ops		*typ_md_ops;
 	struct dentry		*typ_debugfs_entry;
-	char			*typ_name;
 	int			 typ_refcnt;
 	struct lu_device_type	*typ_lu;
 	spinlock_t		 obd_type_lock;
 	struct kobject		 typ_kobj;
 };
+#define typ_name typ_kobj.name
 
 struct brw_page {
 	u64			off;
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index 1729865..742cb9a4 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -74,7 +74,7 @@ struct obd_device *class_newdev(const char *type_name, const char *name,
 struct obd_device *class_name2obd(const char *name);
 int class_uuid2dev(struct obd_uuid *uuid);
 struct obd_device *class_find_client_obd(struct obd_uuid *tgt_uuid,
-					 const char *typ_name,
+					 const char *type_name,
 					 struct obd_uuid *grp_uuid);
 struct obd_device *class_devices_in_group(struct obd_uuid *grp_uuid,
 					  int *next);
diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c
index e0d2851..9c61b33 100644
--- a/fs/lustre/ldlm/ldlm_lib.c
+++ b/fs/lustre/ldlm/ldlm_lib.c
@@ -247,7 +247,7 @@ int client_obd_setup(struct obd_device *obddev, struct lustre_cfg *lcfg)
 	struct obd_import *imp;
 	struct obd_uuid server_uuid;
 	int rq_portal, rp_portal, connect_op;
-	char *name = obddev->obd_type->typ_name;
+	const char *name = obddev->obd_type->typ_name;
 	enum ldlm_ns_type ns_type = LDLM_NS_TYPE_UNKNOWN;
 	struct ptlrpc_connection fake_conn = {
 		.c_self = 0,
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 2a5ec93..7be779f 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -144,7 +144,6 @@ static void class_sysfs_release(struct kobject *kobj)
 	list_del(&type->typ_chain);
 	spin_unlock(&obd_types_lock);
 
-	kfree(type->typ_name);
 	kfree(type->typ_md_ops);
 	kfree(type->typ_dt_ops);
 	kfree(type);
@@ -187,18 +186,15 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 
 	type->typ_dt_ops = kzalloc(sizeof(*type->typ_dt_ops), GFP_NOFS);
 	type->typ_md_ops = kzalloc(sizeof(*type->typ_md_ops), GFP_NOFS);
-	type->typ_name = kzalloc(strlen(name) + 1, GFP_NOFS);
 
 	if (!type->typ_dt_ops ||
-	    !type->typ_md_ops ||
-	    !type->typ_name)
+	    !type->typ_md_ops)
 		goto failed;
 
 	*type->typ_dt_ops = *dt_ops;
 	/* md_ops is optional */
 	if (md_ops)
 		*type->typ_md_ops = *md_ops;
-	strcpy(type->typ_name, name);
 	spin_lock_init(&type->obd_type_lock);
 
 	if (ldt) {
@@ -530,7 +526,7 @@ struct obd_device *class_num2obd(int num)
  * otherwise any client connected to the tgt is returned.
  */
 struct obd_device *class_find_client_obd(struct obd_uuid *tgt_uuid,
-					 const char *typ_name,
+					 const char *type_name,
 					 struct obd_uuid *grp_uuid)
 {
 	int i;
@@ -541,8 +537,8 @@ struct obd_device *class_find_client_obd(struct obd_uuid *tgt_uuid,
 
 		if (!obd)
 			continue;
-		if ((strncmp(obd->obd_type->typ_name, typ_name,
-			     strlen(typ_name)) == 0)) {
+		if ((strncmp(obd->obd_type->typ_name, type_name,
+			     strlen(type_name)) == 0)) {
 			if (obd_uuid_equals(tgt_uuid,
 					    &obd->u.cli.cl_target_uuid) &&
 			    ((grp_uuid) ? obd_uuid_equals(grp_uuid,
@@ -1354,7 +1350,7 @@ u32 obd_get_max_rpcs_in_flight(struct client_obd *cli)
 int obd_set_max_rpcs_in_flight(struct client_obd *cli, u32 max)
 {
 	struct obd_request_slot_waiter *orsw;
-	const char *typ_name;
+	const char *type_name;
 	u32 old;
 	int diff;
 	int rc;
@@ -1363,8 +1359,8 @@ int obd_set_max_rpcs_in_flight(struct client_obd *cli, u32 max)
 	if (max > OBD_MAX_RIF_MAX || max < 1)
 		return -ERANGE;
 
-	typ_name = cli->cl_import->imp_obd->obd_type->typ_name;
-	if (!strcmp(typ_name, LUSTRE_MDC_NAME)) {
+	type_name = cli->cl_import->imp_obd->obd_type->typ_name;
+	if (!strcmp(type_name, LUSTRE_MDC_NAME)) {
 		/*
 		 * adjust max_mod_rpcs_in_flight to ensure it is always
 		 * strictly lower that max_rpcs_in_flight
diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index eb5d22a..da1ccd8 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -67,7 +67,7 @@
 /**
  * Initialize passed in client structure @cl.
  */
-void ptlrpc_init_client(int req_portal, int rep_portal, char *name,
+void ptlrpc_init_client(int req_portal, int rep_portal, const char *name,
 			struct ptlrpc_client *cl)
 {
 	cl->cli_request_portal = req_portal;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 09/29] lustre: obd_type: discard obd_types linked list.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (7 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 08/29] lustre: obd_type: use typ_kobj.name as typ_name James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 10/29] lustre: obd_type: discard obd_type_lock James Simmons
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

As all obd_types are kobjects in the lustre_kset kset,
they are linked together in that kset and don't
need any extra linkage.

There are non-obd_type objects in lustre_kset, added by
class_setup_tunables(). These have a different ->ktype, so we are
careful to only return objects with the correct ->ktype.

As kset_find_obj() returns a counted reference, we need
to put that reference when done.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/obd.h     |  1 -
 fs/lustre/obdclass/genops.c | 42 ++++++++++++++++++++----------------------
 2 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 6bf052a..4c58b91 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -99,7 +99,6 @@ struct obd_info {
 };
 
 struct obd_type {
-	struct list_head	 typ_chain;
 	struct obd_ops		*typ_dt_ops;
 	struct md_ops		*typ_md_ops;
 	struct dentry		*typ_debugfs_entry;
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 7be779f..3f46298 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -42,8 +42,6 @@
 #include <lprocfs_status.h>
 #include <lustre_kernelcomm.h>
 
-static DEFINE_SPINLOCK(obd_types_lock);
-static LIST_HEAD(obd_types);
 DEFINE_RWLOCK(obd_dev_lock);
 static struct obd_device *obd_devs[MAX_OBD_DEVICES];
 
@@ -52,6 +50,7 @@
 EXPORT_SYMBOL(obdo_cachep);
 static struct kmem_cache *import_cachep;
 
+static struct kobj_type class_ktype;
 static struct workqueue_struct *zombie_wq;
 static void obd_zombie_export_add(struct obd_export *exp);
 static void obd_zombie_import_add(struct obd_import *imp);
@@ -88,23 +87,20 @@ static void obd_device_free(struct obd_device *obd)
 
 static struct obd_type *class_search_type(const char *name)
 {
-	struct obd_type *type;
+	struct kobject *kobj = kset_find_obj(lustre_kset, name);
 
-	spin_lock(&obd_types_lock);
-	list_for_each_entry(type, &obd_types, typ_chain) {
-		if (strcmp(type->typ_name, name) == 0) {
-			spin_unlock(&obd_types_lock);
-			return type;
-		}
-	}
-	spin_unlock(&obd_types_lock);
+	if (kobj && kobj->ktype == &class_ktype)
+		return container_of(kobj, struct obd_type, typ_kobj);
+
+	kobject_put(kobj);
 	return NULL;
 }
 
 static struct obd_type *class_get_type(const char *name)
 {
-	struct obd_type *type = class_search_type(name);
+	struct obd_type *type;
 
+	type = class_search_type(name);
 	if (!type) {
 		const char *modname = name;
 
@@ -121,6 +117,11 @@ static struct obd_type *class_get_type(const char *name)
 		type->typ_refcnt++;
 		try_module_get(type->typ_dt_ops->owner);
 		spin_unlock(&type->obd_type_lock);
+		/* class_search_type() returned a counted reference,
+		 * but we don't need that count any more as
+		 * we have one through typ_refcnt.
+		 */
+		kobject_put(&type->typ_kobj);
 	}
 	return type;
 }
@@ -140,10 +141,6 @@ static void class_sysfs_release(struct kobject *kobj)
 
 	debugfs_remove_recursive(type->typ_debugfs_entry);
 
-	spin_lock(&obd_types_lock);
-	list_del(&type->typ_chain);
-	spin_unlock(&obd_types_lock);
-
 	kfree(type->typ_md_ops);
 	kfree(type->typ_dt_ops);
 	kfree(type);
@@ -166,8 +163,10 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 	/* sanity check */
 	LASSERT(strnlen(name, CLASS_MAX_NAME) < CLASS_MAX_NAME);
 
-	if (class_search_type(name)) {
+	type = class_search_type(name);
+	if (type) {
 		CDEBUG(D_IOCTL, "Type %s already registered\n", name);
+		kobject_put(&type->typ_kobj);
 		return -EEXIST;
 	}
 
@@ -175,7 +174,6 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 	if (!type)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&type->typ_chain);
 	type->typ_kobj.kset = lustre_kset;
 	rc = kobject_init_and_add(&type->typ_kobj, &class_ktype,
 				  &lustre_kset->kobj, "%s", name);
@@ -204,10 +202,6 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 			goto failed;
 	}
 
-	spin_lock(&obd_types_lock);
-	list_add(&type->typ_chain, &obd_types);
-	spin_unlock(&obd_types_lock);
-
 	return 0;
 
 failed:
@@ -232,12 +226,16 @@ int class_unregister_type(const char *name)
 		/* Remove ops, but leave the name for debugging */
 		kfree(type->typ_dt_ops);
 		kfree(type->typ_md_ops);
+		kobject_put(&type->typ_kobj);
 		return -EBUSY;
 	}
 
 	if (type->typ_lu)
 		lu_device_type_fini(type->typ_lu);
 
+	/* Put the ref returned by class_search_type() */
+	kobject_put(&type->typ_kobj);
+	/* Put the final ref */
 	kobject_put(&type->typ_kobj);
 
 	return 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 10/29] lustre: obd_type: discard obd_type_lock
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (8 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 09/29] lustre: obd_type: discard obd_types linked list James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-22  6:53   ` NeilBrown
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 11/29] lustre: obdclass: don't copy ops structures in to new type James Simmons
                   ` (18 subsequent siblings)
  28 siblings, 1 reply; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

This lock is only used to protect typ_refcnt, so change
that to an atomic_t and discard the lock.

The lock also covers calls to try_module_get and module_put,
but this serves no purpose as it does not prevent the module
from being unloaded.

Finally, the return value for the call to try_module_get is
ignored, which is not safe.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/obd.h         |  3 +--
 fs/lustre/include/obd_class.h   |  1 -
 fs/lustre/mdc/mdc_request.c     |  2 +-
 fs/lustre/mgc/mgc_request.c     |  7 -------
 fs/lustre/obdclass/genops.c     | 29 ++++++++++++++---------------
 fs/lustre/obdclass/lu_object.c  |  2 +-
 fs/lustre/obdclass/obd_config.c | 19 -------------------
 7 files changed, 17 insertions(+), 46 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 4c58b91..61fb815 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -102,9 +102,8 @@ struct obd_type {
 	struct obd_ops		*typ_dt_ops;
 	struct md_ops		*typ_md_ops;
 	struct dentry		*typ_debugfs_entry;
-	int			 typ_refcnt;
+	atomic_t		 typ_refcnt;
 	struct lu_device_type	*typ_lu;
-	spinlock_t		 obd_type_lock;
 	struct kobject		 typ_kobj;
 };
 #define typ_name typ_kobj.name
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index 742cb9a4..a853ed5 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -210,7 +210,6 @@ struct lustre_profile {
 struct lustre_profile *class_get_profile(const char *prof);
 void class_del_profile(const char *prof);
 void class_put_profile(struct lustre_profile *lprof);
-void class_del_profiles(void);
 
 #if LUSTRE_TRACKS_LOCK_EXP_REFS
 
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index bc764f9..705a4e3 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -2542,7 +2542,7 @@ static int mdc_init_ea_size(struct obd_export *exp, u32 easize, u32 def_easize)
 static int mdc_precleanup(struct obd_device *obd)
 {
 	/* Failsafe, ok if racy */
-	if (obd->obd_type->typ_refcnt <= 1)
+	if (atomic_read(&obd->obd_type->typ_refcnt) <= 1)
 		libcfs_kkuc_group_rem(0, KUC_GRP_HSM);
 
 	mdc_changelog_cdev_finish(obd);
diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c
index 84ba6d0..d8be54d 100644
--- a/fs/lustre/mgc/mgc_request.c
+++ b/fs/lustre/mgc/mgc_request.c
@@ -712,13 +712,6 @@ static int mgc_precleanup(struct obd_device *obd)
 
 static int mgc_cleanup(struct obd_device *obd)
 {
-	/* COMPAT_146 - old config logs may have added profiles we don't
-	 * know about
-	 */
-	if (obd->obd_type->typ_refcnt <= 1)
-		/* Only for the last mgc */
-		class_del_profiles();
-
 	lprocfs_obd_cleanup(obd);
 	ptlrpcd_decref();
 
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 3f46298..9c9ad9c 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -113,15 +113,17 @@ static struct obd_type *class_get_type(const char *name)
 		}
 	}
 	if (type) {
-		spin_lock(&type->obd_type_lock);
-		type->typ_refcnt++;
-		try_module_get(type->typ_dt_ops->owner);
-		spin_unlock(&type->obd_type_lock);
-		/* class_search_type() returned a counted reference,
-		 * but we don't need that count any more as
-		 * we have one through typ_refcnt.
-		 */
-		kobject_put(&type->typ_kobj);
+		if (try_module_get(type->typ_dt_ops->owner)) {
+			atomic_inc(&type->typ_refcnt);
+			/* class_search_type() returned a counted reference,
+			 * but we don't need that count any more as
+			 * we have one through typ_refcnt.
+			 */
+			kobject_put(&type->typ_kobj);
+		} else {
+			kobject_put(&type->typ_kobj);
+			type = NULL;
+		}
 	}
 	return type;
 }
@@ -129,10 +131,8 @@ static struct obd_type *class_get_type(const char *name)
 void class_put_type(struct obd_type *type)
 {
 	LASSERT(type);
-	spin_lock(&type->obd_type_lock);
-	type->typ_refcnt--;
 	module_put(type->typ_dt_ops->owner);
-	spin_unlock(&type->obd_type_lock);
+	atomic_dec(&type->typ_refcnt);
 }
 
 static void class_sysfs_release(struct kobject *kobj)
@@ -193,7 +193,6 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 	/* md_ops is optional */
 	if (md_ops)
 		*type->typ_md_ops = *md_ops;
-	spin_lock_init(&type->obd_type_lock);
 
 	if (ldt) {
 		type->typ_lu = ldt;
@@ -220,8 +219,8 @@ int class_unregister_type(const char *name)
 		return -EINVAL;
 	}
 
-	if (type->typ_refcnt) {
-		CERROR("type %s has refcount (%d)\n", name, type->typ_refcnt);
+	if (atomic_read(&type->typ_refcnt)) {
+		CERROR("type %s has refcount (%d)\n", name, atomic_read(&type->typ_refcnt));
 		/* This is a bad situation, let's make the best of it */
 		/* Remove ops, but leave the name for debugging */
 		kfree(type->typ_dt_ops);
diff --git a/fs/lustre/obdclass/lu_object.c b/fs/lustre/obdclass/lu_object.c
index 9c872db..770cc1b 100644
--- a/fs/lustre/obdclass/lu_object.c
+++ b/fs/lustre/obdclass/lu_object.c
@@ -1267,7 +1267,7 @@ void lu_stack_fini(const struct lu_env *env, struct lu_device *top)
 		next = ldt->ldt_ops->ldto_device_free(env, scan);
 		type = ldt->ldt_obd_type;
 		if (type) {
-			type->typ_refcnt--;
+			atomic_dec(&type->typ_refcnt);
 			class_put_type(type);
 		}
 	}
diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c
index 0cdadea4..0da69f6 100644
--- a/fs/lustre/obdclass/obd_config.c
+++ b/fs/lustre/obdclass/obd_config.c
@@ -743,25 +743,6 @@ void class_put_profile(struct lustre_profile *lprof)
 }
 EXPORT_SYMBOL(class_put_profile);
 
-/* COMPAT_146 */
-void class_del_profiles(void)
-{
-	struct lustre_profile *lprof, *n;
-
-	spin_lock(&lustre_profile_list_lock);
-	list_for_each_entry_safe(lprof, n, &lustre_profile_list, lp_list) {
-		list_del(&lprof->lp_list);
-		lprof->lp_list_deleted = true;
-		spin_unlock(&lustre_profile_list_lock);
-
-		class_put_profile(lprof);
-
-		spin_lock(&lustre_profile_list_lock);
-	}
-	spin_unlock(&lustre_profile_list_lock);
-}
-EXPORT_SYMBOL(class_del_profiles);
-
 /* We can't call lquota_process_config directly because
  * it lives in a module that must be loaded after this one.
  */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 11/29] lustre: obdclass: don't copy ops structures in to new type.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (9 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 10/29] lustre: obd_type: discard obd_type_lock James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 12/29] lustre: obdclass: fix module load locking James Simmons
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

The obd_ops and md_ops structures passed to class_register_type() are
read-only, and have a lifetime that is exceeds the lifetime of the
obd_type - they are static in a module which unregisters the type before
being unloaded.

So there is no need to copy the ops, just store a pointer.

Also mark all the structures as read-only to confirm they don't get
written.  This is best-practice for structures of function pointers.

Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/obd.h         |  4 ++--
 fs/lustre/include/obd_class.h   |  3 ++-
 fs/lustre/lmv/lmv_obd.c         |  4 ++--
 fs/lustre/lov/lov_obd.c         |  2 +-
 fs/lustre/mdc/mdc_request.c     |  4 ++--
 fs/lustre/mgc/mgc_request.c     |  2 +-
 fs/lustre/obdclass/genops.c     | 22 ++++++----------------
 fs/lustre/obdecho/echo_client.c |  2 +-
 fs/lustre/osc/osc_request.c     |  2 +-
 9 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 61fb815..3bdde31 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -99,8 +99,8 @@ struct obd_info {
 };
 
 struct obd_type {
-	struct obd_ops		*typ_dt_ops;
-	struct md_ops		*typ_md_ops;
+	const struct obd_ops	*typ_dt_ops;
+	const struct md_ops	*typ_md_ops;
 	struct dentry		*typ_debugfs_entry;
 	atomic_t		 typ_refcnt;
 	struct lu_device_type	*typ_lu;
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index a853ed5..a3394b2 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -60,7 +60,8 @@
 
 /* genops.c */
 struct obd_export *class_conn2export(struct lustre_handle *conn);
-int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
+int class_register_type(const struct obd_ops *dt_ops,
+			const struct md_ops *md_ops,
 			const char *name, struct lu_device_type *ldt);
 int class_unregister_type(const char *name);
 
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index aae80c7..64ebaf3 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -3043,7 +3043,7 @@ static int lmv_merge_attr(struct obd_export *exp,
 	return 0;
 }
 
-static struct obd_ops lmv_obd_ops = {
+static const struct obd_ops lmv_obd_ops = {
 	.owner			= THIS_MODULE,
 	.setup			= lmv_setup,
 	.cleanup		= lmv_cleanup,
@@ -3060,7 +3060,7 @@ static int lmv_merge_attr(struct obd_export *exp,
 	.quotactl		= lmv_quotactl
 };
 
-static struct md_ops lmv_md_ops = {
+static const struct md_ops lmv_md_ops = {
 	.get_root		= lmv_get_root,
 	.null_inode		= lmv_null_inode,
 	.close			= lmv_close,
diff --git a/fs/lustre/lov/lov_obd.c b/fs/lustre/lov/lov_obd.c
index 203923f..652ed6a 100644
--- a/fs/lustre/lov/lov_obd.c
+++ b/fs/lustre/lov/lov_obd.c
@@ -1332,7 +1332,7 @@ static int lov_quotactl(struct obd_device *obd, struct obd_export *exp,
 	return rc;
 }
 
-static struct obd_ops lov_obd_ops = {
+static const struct obd_ops lov_obd_ops = {
 	.owner		= THIS_MODULE,
 	.setup		= lov_setup,
 	.cleanup	= lov_cleanup,
diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index 705a4e3..6f933b5 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -2571,7 +2571,7 @@ static int mdc_process_config(struct obd_device *obd, u32 len, void *buf)
 	return count > 0 ? 0 : count;
 }
 
-static struct obd_ops mdc_obd_ops = {
+static const struct obd_ops mdc_obd_ops = {
 	.owner			= THIS_MODULE,
 	.setup			= mdc_setup,
 	.precleanup		= mdc_precleanup,
@@ -2593,7 +2593,7 @@ static int mdc_process_config(struct obd_device *obd, u32 len, void *buf)
 	.quotactl		= mdc_quotactl,
 };
 
-static struct md_ops mdc_md_ops = {
+static const struct md_ops mdc_md_ops = {
 	.get_root		= mdc_get_root,
 	.null_inode		= mdc_null_inode,
 	.close			= mdc_close,
diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c
index d8be54d..1eca81a 100644
--- a/fs/lustre/mgc/mgc_request.c
+++ b/fs/lustre/mgc/mgc_request.c
@@ -1810,7 +1810,7 @@ static int mgc_process_config(struct obd_device *obd, u32 len, void *buf)
 	return rc;
 }
 
-static struct obd_ops mgc_obd_ops = {
+static const struct obd_ops mgc_obd_ops = {
 	.owner		= THIS_MODULE,
 	.setup		= mgc_setup,
 	.precleanup	= mgc_precleanup,
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 9c9ad9c..6d33414 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -141,8 +141,6 @@ static void class_sysfs_release(struct kobject *kobj)
 
 	debugfs_remove_recursive(type->typ_debugfs_entry);
 
-	kfree(type->typ_md_ops);
-	kfree(type->typ_dt_ops);
 	kfree(type);
 }
 
@@ -153,7 +151,8 @@ static void class_sysfs_release(struct kobject *kobj)
 
 #define CLASS_MAX_NAME 1024
 
-int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
+int class_register_type(const struct obd_ops *dt_ops,
+			const struct md_ops *md_ops,
 			const char *name,
 			struct lu_device_type *ldt)
 {
@@ -182,17 +181,8 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
 
 	type->typ_debugfs_entry = debugfs_create_dir(name, debugfs_lustre_root);
 
-	type->typ_dt_ops = kzalloc(sizeof(*type->typ_dt_ops), GFP_NOFS);
-	type->typ_md_ops = kzalloc(sizeof(*type->typ_md_ops), GFP_NOFS);
-
-	if (!type->typ_dt_ops ||
-	    !type->typ_md_ops)
-		goto failed;
-
-	*type->typ_dt_ops = *dt_ops;
-	/* md_ops is optional */
-	if (md_ops)
-		*type->typ_md_ops = *md_ops;
+	type->typ_dt_ops = dt_ops;
+	type->typ_md_ops = md_ops;
 
 	if (ldt) {
 		type->typ_lu = ldt;
@@ -223,8 +213,8 @@ int class_unregister_type(const char *name)
 		CERROR("type %s has refcount (%d)\n", name, atomic_read(&type->typ_refcnt));
 		/* This is a bad situation, let's make the best of it */
 		/* Remove ops, but leave the name for debugging */
-		kfree(type->typ_dt_ops);
-		kfree(type->typ_md_ops);
+		type->typ_dt_ops = NULL;
+		type->typ_md_ops = NULL;
 		kobject_put(&type->typ_kobj);
 		return -EBUSY;
 	}
diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c
index 317123f..baf34c8 100644
--- a/fs/lustre/obdecho/echo_client.c
+++ b/fs/lustre/obdecho/echo_client.c
@@ -1676,7 +1676,7 @@ static int echo_client_disconnect(struct obd_export *exp)
 	return rc;
 }
 
-static struct obd_ops echo_client_obd_ops = {
+static const struct obd_ops echo_client_obd_ops = {
 	.owner			= THIS_MODULE,
 	.iocontrol		= echo_client_iocontrol,
 	.connect		= echo_client_connect,
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 104bf43..9484b9a 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -2988,7 +2988,7 @@ static int osc_process_config(struct obd_device *obd, u32 len, void *buf)
 	return osc_process_config_base(obd, buf);
 }
 
-static struct obd_ops osc_obd_ops = {
+static const struct obd_ops osc_obd_ops = {
 	.owner		= THIS_MODULE,
 	.setup		= osc_setup,
 	.precleanup	= osc_precleanup,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 12/29] lustre: obdclass: fix module load locking.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (10 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 11/29] lustre: obdclass: don't copy ops structures in to new type James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 13/29] lustre: convert rsi_sem to a spinlock James Simmons
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

Safe module loading requires that we try_module_get() in a context
where the module cannot be unloaded, typically protected by
a spinlock that module-unload has to take.
This doesn't currently happen in class_get_type().

As free_module() calls synchronize_rcu() between calling the
exit function and freeing the module, we can use rcu_read_lock()
to check if the exit function has been called, and try_module_get()
if it hasn't.

We must also check the return status of try_module_get().

Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/obdclass/genops.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 6d33414..00181e3 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -100,20 +100,31 @@ static struct obd_type *class_get_type(const char *name)
 {
 	struct obd_type *type;
 
+	rcu_read_lock();
 	type = class_search_type(name);
 	if (!type) {
 		const char *modname = name;
 
+		rcu_read_unlock();
 		if (!request_module("%s", modname)) {
 			CDEBUG(D_INFO, "Loaded module '%s'\n", modname);
-			type = class_search_type(name);
 		} else {
 			LCONSOLE_ERROR_MSG(0x158, "Can't load module '%s'\n",
 					   modname);
 		}
+		rcu_read_lock();
+		type = class_search_type(name);
 	}
 	if (type) {
-		if (try_module_get(type->typ_dt_ops->owner)) {
+		/*
+		 * Holding rcu_read_lock() matches the synchronize_rcu() call
+		 * in free_module() and ensures that if type->typ_dt_ops is
+		 * not yet NULL, then the module won't be freed until after
+		 * we rcu_read_unlock().
+		 */
+		const struct obd_ops *dt_ops = READ_ONCE(type->typ_dt_ops);
+
+		if (dt_ops && try_module_get(dt_ops->owner)) {
 			atomic_inc(&type->typ_refcnt);
 			/* class_search_type() returned a counted reference,
 			 * but we don't need that count any more as
@@ -125,6 +136,7 @@ static struct obd_type *class_get_type(const char *name)
 			type = NULL;
 		}
 	}
+	rcu_read_unlock();
 	return type;
 }
 
@@ -209,11 +221,18 @@ int class_unregister_type(const char *name)
 		return -EINVAL;
 	}
 
+	/*
+	 * Ensure that class_get_type doesn't try to get the module
+	 * as it could be freed before the obd_type is released.
+	 * synchronize_rcu() will be called before the module
+	 * is freed.
+	 */
+	type->typ_dt_ops = NULL;
+
 	if (atomic_read(&type->typ_refcnt)) {
 		CERROR("type %s has refcount (%d)\n", name, atomic_read(&type->typ_refcnt));
 		/* This is a bad situation, let's make the best of it */
 		/* Remove ops, but leave the name for debugging */
-		type->typ_dt_ops = NULL;
 		type->typ_md_ops = NULL;
 		kobject_put(&type->typ_kobj);
 		return -EBUSY;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 13/29] lustre: convert rsi_sem to a spinlock.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (11 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 12/29] lustre: obdclass: fix module load locking James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 14/29] lustre: ldlm: discard varname in ldlm_pool James Simmons
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

This lock is never held over code that sleeps, and is
only ever held for short periods of time.
So a simple spinlock is best.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/obd_class.h       | 3 +--
 fs/lustre/llite/llite_lib.c         | 6 +++---
 fs/lustre/llite/lproc_llite.c       | 4 ++--
 fs/lustre/obdclass/lprocfs_status.c | 8 ++++----
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index a3394b2..f78694c 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -1688,12 +1688,11 @@ static inline void class_uuid_unparse(class_uuid_t uu, struct obd_uuid *out)
 void statfs_unpack(struct kstatfs *sfs, struct obd_statfs *osfs);
 
 /* root squash info */
-struct rw_semaphore;
 struct root_squash_info {
 	uid_t			rsi_uid;
 	gid_t			rsi_gid;
 	struct list_head	rsi_nosquash_nids;
-	struct rw_semaphore	rsi_sem;
+	spinlock_t		rsi_lock;	/* protects rsi_nosquash_nids */
 };
 
 /* linux-module.c */
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index ab7c84a..7d9a76e 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -130,7 +130,7 @@ static struct ll_sb_info *ll_init_sbi(void)
 	sbi->ll_squash.rsi_uid = 0;
 	sbi->ll_squash.rsi_gid = 0;
 	INIT_LIST_HEAD(&sbi->ll_squash.rsi_nosquash_nids);
-	init_rwsem(&sbi->ll_squash.rsi_sem);
+	spin_lock_init(&sbi->ll_squash.rsi_lock);
 
 	return sbi;
 }
@@ -2577,7 +2577,7 @@ void ll_compute_rootsquash_state(struct ll_sb_info *sbi)
 	int i;
 
 	/* Update norootsquash flag */
-	down_write(&squash->rsi_sem);
+	spin_lock(&squash->rsi_lock);
 	if (list_empty(&squash->rsi_nosquash_nids)) {
 		spin_lock(&sbi->ll_lock);
 		sbi->ll_flags &= ~LL_SBI_NOROOTSQUASH;
@@ -2605,7 +2605,7 @@ void ll_compute_rootsquash_state(struct ll_sb_info *sbi)
 			sbi->ll_flags &= ~LL_SBI_NOROOTSQUASH;
 		spin_unlock(&sbi->ll_lock);
 	}
-	up_write(&squash->rsi_sem);
+	spin_unlock(&squash->rsi_lock);
 }
 
 /**
diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
index 91b0a50..8f89f3b 100644
--- a/fs/lustre/llite/lproc_llite.c
+++ b/fs/lustre/llite/lproc_llite.c
@@ -1152,7 +1152,7 @@ static int ll_nosquash_nids_seq_show(struct seq_file *m, void *v)
 	struct root_squash_info *squash = &sbi->ll_squash;
 	int len;
 
-	down_read(&squash->rsi_sem);
+	spin_lock(&squash->rsi_lock);
 	if (!list_empty(&squash->rsi_nosquash_nids)) {
 		len = cfs_print_nidlist(m->buf + m->count, m->size - m->count,
 					&squash->rsi_nosquash_nids);
@@ -1161,7 +1161,7 @@ static int ll_nosquash_nids_seq_show(struct seq_file *m, void *v)
 	} else {
 		seq_puts(m, "NONE\n");
 	}
-	up_read(&squash->rsi_sem);
+	spin_unlock(&squash->rsi_lock);
 
 	return 0;
 }
diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index 71bf409..29c6b5d 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -1751,10 +1751,10 @@ int lprocfs_wr_nosquash_nids(const char __user *buffer, unsigned long count,
 	if ((len == 4 && !strncmp(kernbuf, "NONE", len)) ||
 	    (len == 5 && !strncmp(kernbuf, "clear", len))) {
 		/* empty string is special case */
-		down_write(&squash->rsi_sem);
+		spin_lock(&squash->rsi_lock);
 		if (!list_empty(&squash->rsi_nosquash_nids))
 			cfs_free_nidlist(&squash->rsi_nosquash_nids);
-		up_write(&squash->rsi_sem);
+		spin_unlock(&squash->rsi_lock);
 		LCONSOLE_INFO("%s: nosquash_nids is cleared\n", name);
 		kfree(kernbuf);
 		return count;
@@ -1771,11 +1771,11 @@ int lprocfs_wr_nosquash_nids(const char __user *buffer, unsigned long count,
 	kfree(kernbuf);
 	kernbuf = NULL;
 
-	down_write(&squash->rsi_sem);
+	spin_lock(&squash->rsi_lock);
 	if (!list_empty(&squash->rsi_nosquash_nids))
 		cfs_free_nidlist(&squash->rsi_nosquash_nids);
 	list_splice(&tmp, &squash->rsi_nosquash_nids);
-	up_write(&squash->rsi_sem);
+	spin_unlock(&squash->rsi_lock);
 
 	return count;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 14/29] lustre: ldlm: discard varname in ldlm_pool.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (12 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 13/29] lustre: convert rsi_sem to a spinlock James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 15/29] lustre: lprocfs: use log2.h macros instead of shift loop James Simmons
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

This allocated buffer serves no purpose.
A constant string is copied into it, it is passed to some
function which copies it out again, then the buffer is freed.
Instead, we can pass the constant string to that function.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/ldlm/ldlm_internal.h |  2 --
 fs/lustre/ldlm/ldlm_pool.c     | 18 +++++-------------
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_internal.h b/fs/lustre/ldlm/ldlm_internal.h
index d8dcf8a..c907d88 100644
--- a/fs/lustre/ldlm/ldlm_internal.h
+++ b/fs/lustre/ldlm/ldlm_internal.h
@@ -31,8 +31,6 @@
  * Lustre is a trademark of Sun Microsystems, Inc.
  */
 
-#define MAX_STRING_SIZE 128
-
 extern int ldlm_srv_namespace_nr;
 extern int ldlm_cli_namespace_nr;
 extern struct mutex ldlm_srv_namespace_lock;
diff --git a/fs/lustre/ldlm/ldlm_pool.c b/fs/lustre/ldlm/ldlm_pool.c
index 1f81795..a984b5d 100644
--- a/fs/lustre/ldlm/ldlm_pool.c
+++ b/fs/lustre/ldlm/ldlm_pool.c
@@ -503,9 +503,9 @@ static ssize_t grant_speed_show(struct kobject *kobj, struct attribute *attr,
 LDLM_POOL_SYSFS_WRITER_NOLOCK_STORE(lock_volume_factor, atomic);
 LUSTRE_RW_ATTR(lock_volume_factor);
 
-#define LDLM_POOL_ADD_VAR(name, var, ops)			\
+#define LDLM_POOL_ADD_VAR(_name, var, ops)			\
 	do {							\
-		snprintf(var_name, MAX_STRING_SIZE, #name);	\
+		pool_vars[0].name = #_name;			\
 		pool_vars[0].data = var;			\
 		pool_vars[0].fops = ops;			\
 		ldebugfs_add_vars(pl->pl_debugfs_entry, pool_vars, NULL);\
@@ -557,25 +557,18 @@ static int ldlm_pool_debugfs_init(struct ldlm_pool *pl)
 						 ns_pool);
 	struct dentry *debugfs_ns_parent;
 	struct lprocfs_vars pool_vars[2];
-	char *var_name = NULL;
 	int rc = 0;
 
-	var_name = kzalloc(MAX_STRING_SIZE + 1, GFP_NOFS);
-	if (!var_name)
-		return -ENOMEM;
-
 	debugfs_ns_parent = ns->ns_debugfs_entry;
 	if (IS_ERR_OR_NULL(debugfs_ns_parent)) {
 		CERROR("%s: debugfs entry is not initialized\n",
 		       ldlm_ns_name(ns));
 		rc = -EINVAL;
-		goto out_free_name;
+		goto out;
 	}
 	pl->pl_debugfs_entry = debugfs_create_dir("pool", debugfs_ns_parent);
 
-	var_name[MAX_STRING_SIZE] = '\0';
 	memset(pool_vars, 0, sizeof(pool_vars));
-	pool_vars[0].name = var_name;
 
 	LDLM_POOL_ADD_VAR(state, pl, &lprocfs_pool_state_fops);
 
@@ -583,7 +576,7 @@ static int ldlm_pool_debugfs_init(struct ldlm_pool *pl)
 					   LDLM_POOL_FIRST_STAT, 0);
 	if (!pl->pl_stats) {
 		rc = -ENOMEM;
-		goto out_free_name;
+		goto out;
 	}
 
 	lprocfs_counter_init(pl->pl_stats, LDLM_POOL_GRANTED_STAT,
@@ -622,8 +615,7 @@ static int ldlm_pool_debugfs_init(struct ldlm_pool *pl)
 	debugfs_create_file("stats", 0644, pl->pl_debugfs_entry, pl->pl_stats,
 			    &lprocfs_stats_seq_fops);
 
-out_free_name:
-	kfree(var_name);
+out:
 	return rc;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 15/29] lustre: lprocfs: use log2.h macros instead of shift loop.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (13 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 14/29] lustre: ldlm: discard varname in ldlm_pool James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 16/29] lustre: handles: discard h_owner in favour of h_ops James Simmons
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

These shift loops seem to be trying to avoid doing a
multiplication.
The same effect can be achieved more transparently using
rounddown_pow_of_two().  Even though there is a multiplication
in the C code, the resulting machine code just does a single shift.

As rounddown_pow_of_two() is not defined for 0, and as we cannot be
positively use the blk_size is non-zero, use blk_size ?: 1.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/obdclass/lprocfs_status.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
index 29c6b5d..2af9dad 100644
--- a/fs/lustre/obdclass/lprocfs_status.c
+++ b/fs/lustre/obdclass/lprocfs_status.c
@@ -374,9 +374,7 @@ static ssize_t kbytestotal_show(struct kobject *kobj, struct attribute *attr,
 		u32 blk_size = osfs.os_bsize >> 10;
 		u64 result = osfs.os_blocks;
 
-		while (blk_size >>= 1)
-			result <<= 1;
-
+		result *= rounddown_pow_of_two(blk_size ?: 1);
 		return sprintf(buf, "%llu\n", result);
 	}
 
@@ -397,8 +395,7 @@ static ssize_t kbytesfree_show(struct kobject *kobj, struct attribute *attr,
 		u32 blk_size = osfs.os_bsize >> 10;
 		u64 result = osfs.os_bfree;
 
-		while (blk_size >>= 1)
-			result <<= 1;
+		result *= rounddown_pow_of_two(blk_size ?: 1);
 
 		return sprintf(buf, "%llu\n", result);
 	}
@@ -420,8 +417,7 @@ static ssize_t kbytesavail_show(struct kobject *kobj, struct attribute *attr,
 		u32 blk_size = osfs.os_bsize >> 10;
 		u64 result = osfs.os_bavail;
 
-		while (blk_size >>= 1)
-			result <<= 1;
+		result *= rounddown_pow_of_two(blk_size ?: 1);
 
 		return sprintf(buf, "%llu\n", result);
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 16/29] lustre: handles: discard h_owner in favour of h_ops
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (14 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 15/29] lustre: lprocfs: use log2.h macros instead of shift loop James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 17/29] lustre: handle: move refcount into the lustre_handle James Simmons
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

lustre_handles  assigned a  64bit  unique identifier  (a 'cookie')  to
objects of  various types and  stored them  in a hash  table, allowing
them to be accessed by the cookie.

There is a facility for type checking by recording an 'owner' for each
object, and checking the owner on lookup.  Unfortunately this is not
used - owner is always zero.

Each object also contains an h_ops pointer which can be used to
reliably identify an owner.

So discard h_owner, pass and 'ops' pointer to class_handle2object(),
and only return objects for which the h_ops matches.

Note that server code uses h_owner slightly differently - it
identifies not only the type but also the "struct mdt_export_data"
that the cookie is associated with.
This can be changed to store the mdt_export_data  pointer in
the mdt_file_data, and  check it to validate the candidate returned by
class_handle2object() finds a candidate.

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/lustre_handles.h  | 7 +++----
 fs/lustre/ldlm/ldlm_lock.c          | 2 +-
 fs/lustre/obdclass/genops.c         | 3 ++-
 fs/lustre/obdclass/lustre_handles.c | 6 +++---
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h
index 6836808..9a4b1a8 100644
--- a/fs/lustre/include/lustre_handles.h
+++ b/fs/lustre/include/lustre_handles.h
@@ -65,8 +65,7 @@ struct portals_handle_ops {
 struct portals_handle {
 	struct list_head		h_link;
 	u64				h_cookie;
-	const void			*h_owner;
-	struct portals_handle_ops	*h_ops;
+	const struct portals_handle_ops	*h_ops;
 
 	/* newly added fields to handle the RCU issue. -jxiong */
 	struct rcu_head			h_rcu;
@@ -79,9 +78,9 @@ struct portals_handle {
 
 /* Add a handle to the hash table */
 void class_handle_hash(struct portals_handle *,
-		       struct portals_handle_ops *ops);
+		       const struct portals_handle_ops *ops);
 void class_handle_unhash(struct portals_handle *);
-void *class_handle2object(u64 cookie, const void *owner);
+void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops);
 void class_handle_free_cb(struct rcu_head *rcu);
 int class_handle_init(void);
 void class_handle_cleanup(void);
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 7ec5fc9..768cccc 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -515,7 +515,7 @@ struct ldlm_lock *__ldlm_handle2lock(const struct lustre_handle *handle,
 
 	LASSERT(handle);
 
-	lock = class_handle2object(handle->cookie, NULL);
+	lock = class_handle2object(handle->cookie, &lock_handle_ops);
 	if (!lock)
 		return NULL;
 
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 00181e3..f1fb64b 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -677,6 +677,7 @@ int obd_init_caches(void)
 	return -ENOMEM;
 }
 
+static struct portals_handle_ops export_handle_ops;
 /* map connection to client */
 struct obd_export *class_conn2export(struct lustre_handle *conn)
 {
@@ -693,7 +694,7 @@ struct obd_export *class_conn2export(struct lustre_handle *conn)
 	}
 
 	CDEBUG(D_INFO, "looking for export cookie %#llx\n", conn->cookie);
-	export = class_handle2object(conn->cookie, NULL);
+	export = class_handle2object(conn->cookie, &export_handle_ops);
 	return export;
 }
 EXPORT_SYMBOL(class_conn2export);
diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c
index 0674afb..32b70d6 100644
--- a/fs/lustre/obdclass/lustre_handles.c
+++ b/fs/lustre/obdclass/lustre_handles.c
@@ -59,7 +59,7 @@
  * global (per-node) hash-table.
  */
 void class_handle_hash(struct portals_handle *h,
-		       struct portals_handle_ops *ops)
+		       const struct portals_handle_ops *ops)
 {
 	struct handle_bucket *bucket;
 
@@ -132,7 +132,7 @@ void class_handle_unhash(struct portals_handle *h)
 }
 EXPORT_SYMBOL(class_handle_unhash);
 
-void *class_handle2object(u64 cookie, const void *owner)
+void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops)
 {
 	struct handle_bucket *bucket;
 	struct portals_handle *h;
@@ -147,7 +147,7 @@ void *class_handle2object(u64 cookie, const void *owner)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(h, &bucket->head, h_link) {
-		if (h->h_cookie != cookie || h->h_owner != owner)
+		if (h->h_cookie != cookie || h->h_ops != ops)
 			continue;
 
 		spin_lock(&h->h_lock);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 17/29] lustre: handle: move refcount into the lustre_handle.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (15 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 16/29] lustre: handles: discard h_owner in favour of h_ops James Simmons
@ 2019-05-20 12:50 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 18/29] lustre: discard OBD_FREE_RCU James Simmons
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:50 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

Every object with a lustre_handle has (and must have) a refcount.
The lustre_handles code needs a call-out to increment this.
To simplify things, move the refcount into the lustre_handle
and discard the call-out.

In order to preserve the same debug messages, we store an object type
name in the portals_handle_ops, and use that in a CDEBUG() when
incrementing the ref count.

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/lustre_dlm.h      |  6 -----
 fs/lustre/include/lustre_export.h   |  1 -
 fs/lustre/include/lustre_handles.h  |  4 ++-
 fs/lustre/include/lustre_import.h   |  2 --
 fs/lustre/ldlm/ldlm_lock.c          | 36 +++++++++++---------------
 fs/lustre/obdclass/genops.c         | 50 +++++++++++++++----------------------
 fs/lustre/obdclass/lustre_handles.c |  5 +++-
 fs/lustre/obdecho/echo_client.c     |  2 +-
 fs/lustre/ptlrpc/service.c          |  4 +--
 9 files changed, 45 insertions(+), 65 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index 1bd5119..fd9b0f8 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -591,12 +591,6 @@ struct ldlm_lock {
 	 */
 	struct portals_handle		l_handle;
 	/**
-	 * Lock reference count.
-	 * This is how many users have pointers to actual structure, so that
-	 * we do not accidentally free lock structure that is in use.
-	 */
-	atomic_t			l_refc;
-	/**
 	 * Internal spinlock protects l_resource.  We should hold this lock
 	 * first before taking res_lock.
 	 */
diff --git a/fs/lustre/include/lustre_export.h b/fs/lustre/include/lustre_export.h
index fb34e0b..d053233 100644
--- a/fs/lustre/include/lustre_export.h
+++ b/fs/lustre/include/lustre_export.h
@@ -67,7 +67,6 @@ struct obd_export {
 	 * what export they are talking to.
 	 */
 	struct portals_handle		exp_handle;
-	refcount_t			exp_refcount;
 	/**
 	 * Set of counters below is to track where export references are
 	 * kept. The exp_rpc_count is used for reconnect handling also,
diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h
index 9a4b1a8..be5d41b 100644
--- a/fs/lustre/include/lustre_handles.h
+++ b/fs/lustre/include/lustre_handles.h
@@ -46,8 +46,9 @@
 #include <linux/types.h>
 
 struct portals_handle_ops {
-	void (*hop_addref)(void *object);
 	void (*hop_free)(void *object, int size);
+	/* hop_type is used for some debugging messages */
+	char *hop_type;
 };
 
 /* These handles are most easily used by having them appear at the very top of
@@ -66,6 +67,7 @@ struct portals_handle {
 	struct list_head		h_link;
 	u64				h_cookie;
 	const struct portals_handle_ops	*h_ops;
+	refcount_t			h_ref;
 
 	/* newly added fields to handle the RCU issue. -jxiong */
 	struct rcu_head			h_rcu;
diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h
index fc1f87c..6c830da 100644
--- a/fs/lustre/include/lustre_import.h
+++ b/fs/lustre/include/lustre_import.h
@@ -154,8 +154,6 @@ struct import_state_hist {
 struct obd_import {
 	/** Local handle (== id) for this import. */
 	struct portals_handle		imp_handle;
-	/** Reference counter */
-	atomic_t			imp_refcount;
 	struct lustre_handle		imp_dlm_handle; /* client's ldlm export */
 	/** Currently active connection */
 	struct ptlrpc_connection       *imp_connection;
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 768cccc..ecd3f4a 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -148,7 +148,7 @@ const char *ldlm_it2str(enum ldlm_intent_flags it)
  */
 struct ldlm_lock *ldlm_lock_get(struct ldlm_lock *lock)
 {
-	atomic_inc(&lock->l_refc);
+	refcount_inc(&lock->l_handle.h_ref);
 	return lock;
 }
 EXPORT_SYMBOL(ldlm_lock_get);
@@ -161,8 +161,8 @@ struct ldlm_lock *ldlm_lock_get(struct ldlm_lock *lock)
 void ldlm_lock_put(struct ldlm_lock *lock)
 {
 	LASSERT(lock->l_resource != LP_POISON);
-	LASSERT(atomic_read(&lock->l_refc) > 0);
-	if (atomic_dec_and_test(&lock->l_refc)) {
+	LASSERT(refcount_read(&lock->l_handle.h_ref) > 0);
+	if (refcount_dec_and_test(&lock->l_handle.h_ref)) {
 		struct ldlm_resource *res;
 
 		LDLM_DEBUG(lock,
@@ -356,12 +356,6 @@ void ldlm_lock_destroy_nolock(struct ldlm_lock *lock)
 	}
 }
 
-/* this is called by portals_handle2object with the handle lock taken */
-static void lock_handle_addref(void *lock)
-{
-	LDLM_LOCK_GET((struct ldlm_lock *)lock);
-}
-
 static void lock_handle_free(void *lock, int size)
 {
 	LASSERT(size == sizeof(struct ldlm_lock));
@@ -369,8 +363,8 @@ static void lock_handle_free(void *lock, int size)
 }
 
 static struct portals_handle_ops lock_handle_ops = {
-	.hop_addref = lock_handle_addref,
 	.hop_free   = lock_handle_free,
+	.hop_type   = "ldlm",
 };
 
 /**
@@ -395,7 +389,7 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource)
 	lock->l_resource = resource;
 	lu_ref_add(&resource->lr_reference, "lock", lock);
 
-	atomic_set(&lock->l_refc, 2);
+	refcount_set(&lock->l_handle.h_ref, 2);
 	INIT_LIST_HEAD(&lock->l_res_link);
 	INIT_LIST_HEAD(&lock->l_lru);
 	INIT_LIST_HEAD(&lock->l_pending_chain);
@@ -1996,13 +1990,13 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 &vaf,
 				 lock,
 				 lock->l_handle.h_cookie,
-				 atomic_read(&lock->l_refc),
+				 refcount_read(&lock->l_handle.h_ref),
 				 lock->l_readers, lock->l_writers,
 				 ldlm_lockname[lock->l_granted_mode],
 				 ldlm_lockname[lock->l_req_mode],
 				 lock->l_flags, nid,
 				 lock->l_remote_handle.cookie,
-				 exp ? refcount_read(&exp->exp_refcount) : -99,
+				 exp ? refcount_read(&exp->exp_handle.h_ref) : -99,
 				 lock->l_pid, lock->l_callback_timeout,
 				 lock->l_lvb_type);
 		va_end(args);
@@ -2016,7 +2010,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 &vaf,
 				 ldlm_lock_to_ns_name(lock), lock,
 				 lock->l_handle.h_cookie,
-				 atomic_read(&lock->l_refc),
+				 refcount_read(&lock->l_handle.h_ref),
 				 lock->l_readers, lock->l_writers,
 				 ldlm_lockname[lock->l_granted_mode],
 				 ldlm_lockname[lock->l_req_mode],
@@ -2029,7 +2023,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 lock->l_req_extent.end,
 				 lock->l_flags, nid,
 				 lock->l_remote_handle.cookie,
-				 exp ? refcount_read(&exp->exp_refcount) : -99,
+				 exp ? refcount_read(&exp->exp_handle.h_ref) : -99,
 				 lock->l_pid, lock->l_callback_timeout,
 				 lock->l_lvb_type);
 		break;
@@ -2040,7 +2034,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 &vaf,
 				 ldlm_lock_to_ns_name(lock), lock,
 				 lock->l_handle.h_cookie,
-				 atomic_read(&lock->l_refc),
+				 refcount_read(&lock->l_handle.h_ref),
 				 lock->l_readers, lock->l_writers,
 				 ldlm_lockname[lock->l_granted_mode],
 				 ldlm_lockname[lock->l_req_mode],
@@ -2052,7 +2046,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 lock->l_policy_data.l_flock.end,
 				 lock->l_flags, nid,
 				 lock->l_remote_handle.cookie,
-				 exp ? refcount_read(&exp->exp_refcount) : -99,
+				 exp ? refcount_read(&exp->exp_handle.h_ref) : -99,
 				 lock->l_pid, lock->l_callback_timeout);
 		break;
 
@@ -2062,7 +2056,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 &vaf,
 				 ldlm_lock_to_ns_name(lock),
 				 lock, lock->l_handle.h_cookie,
-				 atomic_read(&lock->l_refc),
+				 refcount_read(&lock->l_handle.h_ref),
 				 lock->l_readers, lock->l_writers,
 				 ldlm_lockname[lock->l_granted_mode],
 				 ldlm_lockname[lock->l_req_mode],
@@ -2072,7 +2066,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 ldlm_typename[resource->lr_type],
 				 lock->l_flags, nid,
 				 lock->l_remote_handle.cookie,
-				 exp ? refcount_read(&exp->exp_refcount) : -99,
+				 exp ? refcount_read(&exp->exp_handle.h_ref) : -99,
 				 lock->l_pid, lock->l_callback_timeout,
 				 lock->l_lvb_type);
 		break;
@@ -2083,7 +2077,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 &vaf,
 				 ldlm_lock_to_ns_name(lock),
 				 lock, lock->l_handle.h_cookie,
-				 atomic_read(&lock->l_refc),
+				 refcount_read(&lock->l_handle.h_ref),
 				 lock->l_readers, lock->l_writers,
 				 ldlm_lockname[lock->l_granted_mode],
 				 ldlm_lockname[lock->l_req_mode],
@@ -2092,7 +2086,7 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 				 ldlm_typename[resource->lr_type],
 				 lock->l_flags, nid,
 				 lock->l_remote_handle.cookie,
-				 exp ? refcount_read(&exp->exp_refcount) : -99,
+				 exp ? refcount_read(&exp->exp_handle.h_ref) : -99,
 				 lock->l_pid, lock->l_callback_timeout,
 				 lock->l_lvb_type);
 		break;
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index f1fb64b..c6dc3ba 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -722,7 +722,7 @@ static void class_export_destroy(struct obd_export *exp)
 {
 	struct obd_device *obd = exp->exp_obd;
 
-	LASSERT(refcount_read(&exp->exp_refcount) == 0);
+	LASSERT(refcount_read(&exp->exp_handle.h_ref) == 0);
 	LASSERT(obd);
 
 	CDEBUG(D_IOCTL, "destroying export %p/%s for %s\n", exp,
@@ -746,33 +746,28 @@ static void class_export_destroy(struct obd_export *exp)
 	OBD_FREE_RCU(exp, sizeof(*exp), &exp->exp_handle);
 }
 
-static void export_handle_addref(void *export)
-{
-	class_export_get(export);
-}
-
 static struct portals_handle_ops export_handle_ops = {
-	.hop_addref	= export_handle_addref,
 	.hop_free	= NULL,
+	.hop_type	= "export",
 };
 
 struct obd_export *class_export_get(struct obd_export *exp)
 {
-	refcount_inc(&exp->exp_refcount);
-	CDEBUG(D_INFO, "GETting export %p : new refcount %d\n", exp,
-	       refcount_read(&exp->exp_refcount));
+	refcount_inc(&exp->exp_handle.h_ref);
+	CDEBUG(D_INFO, "GET export %p refcount=%d\n", exp,
+	       refcount_read(&exp->exp_handle.h_ref));
 	return exp;
 }
 EXPORT_SYMBOL(class_export_get);
 
 void class_export_put(struct obd_export *exp)
 {
-	LASSERT(refcount_read(&exp->exp_refcount) >  0);
-	LASSERT(refcount_read(&exp->exp_refcount) < LI_POISON);
+	LASSERT(refcount_read(&exp->exp_handle.h_ref) >  0);
+	LASSERT(refcount_read(&exp->exp_handle.h_ref) < LI_POISON);
 	CDEBUG(D_INFO, "PUTting export %p : new refcount %d\n", exp,
-	       refcount_read(&exp->exp_refcount) - 1);
+	       refcount_read(&exp->exp_handle.h_ref) - 1);
 
-	if (refcount_dec_and_test(&exp->exp_refcount)) {
+	if (refcount_dec_and_test(&exp->exp_handle.h_ref)) {
 		struct obd_device *obd = exp->exp_obd;
 
 		CDEBUG(D_IOCTL, "final put %p/%s\n",
@@ -822,7 +817,7 @@ static struct obd_export *__class_new_export(struct obd_device *obd,
 
 	export->exp_conn_cnt = 0;
 	/* 2 = class_handle_hash + last */
-	refcount_set(&export->exp_refcount, 2);
+	refcount_set(&export->exp_handle.h_ref, 2);
 	atomic_set(&export->exp_rpc_count, 0);
 	atomic_set(&export->exp_cb_count, 0);
 	atomic_set(&export->exp_locks_count, 0);
@@ -925,7 +920,7 @@ static void class_import_destroy(struct obd_import *imp)
 	CDEBUG(D_IOCTL, "destroying import %p for %s\n", imp,
 	       imp->imp_obd->obd_name);
 
-	LASSERT_ATOMIC_ZERO(&imp->imp_refcount);
+	LASSERT(refcount_read(&imp->imp_handle.h_ref) == 0);
 
 	ptlrpc_put_connection_superhack(imp->imp_connection);
 
@@ -942,21 +937,16 @@ static void class_import_destroy(struct obd_import *imp)
 	OBD_FREE_RCU(imp, sizeof(*imp), &imp->imp_handle);
 }
 
-static void import_handle_addref(void *import)
-{
-	class_import_get(import);
-}
-
 static struct portals_handle_ops import_handle_ops = {
-	.hop_addref	= import_handle_addref,
 	.hop_free	= NULL,
+	.hop_type	= "import",
 };
 
 struct obd_import *class_import_get(struct obd_import *import)
 {
-	atomic_inc(&import->imp_refcount);
-	CDEBUG(D_INFO, "import %p refcount=%d obd=%s\n", import,
-	       atomic_read(&import->imp_refcount),
+	refcount_inc(&import->imp_handle.h_ref);
+	CDEBUG(D_INFO, "GET import %p refcount=%d obd=%s\n", import,
+	       refcount_read(&import->imp_handle.h_ref),
 	       import->imp_obd->obd_name);
 	return import;
 }
@@ -964,19 +954,19 @@ struct obd_import *class_import_get(struct obd_import *import)
 
 void class_import_put(struct obd_import *imp)
 {
-	LASSERT_ATOMIC_GT_LT(&imp->imp_refcount, 0, LI_POISON);
+	LASSERT(refcount_read(&imp->imp_handle.h_ref) > 0);
 
 	CDEBUG(D_INFO, "import %p refcount=%d obd=%s\n", imp,
-	       atomic_read(&imp->imp_refcount) - 1,
+	       refcount_read(&imp->imp_handle.h_ref) - 1,
 	       imp->imp_obd->obd_name);
 
-	if (atomic_dec_and_test(&imp->imp_refcount)) {
+	if (refcount_dec_and_test(&imp->imp_handle.h_ref)) {
 		CDEBUG(D_INFO, "final put import %p\n", imp);
 		obd_zombie_import_add(imp);
 	}
 
 	/* catch possible import put race */
-	LASSERT_ATOMIC_GE_LT(&imp->imp_refcount, 0, LI_POISON);
+	LASSERT(refcount_read(&imp->imp_handle.h_ref) >= 0);
 }
 EXPORT_SYMBOL(class_import_put);
 
@@ -1026,7 +1016,7 @@ struct obd_import *class_new_import(struct obd_device *obd)
 	init_waitqueue_head(&imp->imp_recovery_waitq);
 	INIT_WORK(&imp->imp_zombie_work, obd_zombie_imp_cull);
 
-	atomic_set(&imp->imp_refcount, 2);
+	refcount_set(&imp->imp_handle.h_ref, 2);
 	atomic_set(&imp->imp_unregistering, 0);
 	atomic_set(&imp->imp_inflight, 0);
 	atomic_set(&imp->imp_replay_inflight, 0);
diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c
index 32b70d6..32cd6ae 100644
--- a/fs/lustre/obdclass/lustre_handles.c
+++ b/fs/lustre/obdclass/lustre_handles.c
@@ -152,7 +152,10 @@ void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops)
 
 		spin_lock(&h->h_lock);
 		if (likely(h->h_in != 0)) {
-			h->h_ops->hop_addref(h);
+			refcount_inc(&h->h_ref);
+			CDEBUG(D_INFO, "GET %s %p refcount=%d\n",
+			       h->h_ops->hop_type, h,
+			       refcount_read(&h->h_ref));
 			retval = h;
 		}
 		spin_unlock(&h->h_lock);
diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c
index baf34c8..6f00eee 100644
--- a/fs/lustre/obdecho/echo_client.c
+++ b/fs/lustre/obdecho/echo_client.c
@@ -1638,7 +1638,7 @@ static int echo_client_cleanup(struct obd_device *obddev)
 		return -EBUSY;
 	}
 
-	LASSERT(refcount_read(&ec->ec_exp->exp_refcount) > 0);
+	LASSERT(refcount_read(&ec->ec_exp->exp_handle.h_ref) > 0);
 	rc = obd_disconnect(ec->ec_exp);
 	if (rc != 0)
 		CERROR("fail to disconnect device: %d\n", rc);
diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c
index 5a7e9fa..5e541ca 100644
--- a/fs/lustre/ptlrpc/service.c
+++ b/fs/lustre/ptlrpc/service.c
@@ -1697,7 +1697,7 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt,
 	       (request->rq_export ?
 		(char *)request->rq_export->exp_client_uuid.uuid : "0"),
 	       (request->rq_export ?
-		refcount_read(&request->rq_export->exp_refcount) : -99),
+		refcount_read(&request->rq_export->exp_handle.h_ref) : -99),
 	       lustre_msg_get_status(request->rq_reqmsg), request->rq_xid,
 	       libcfs_id2str(request->rq_peer),
 	       lustre_msg_get_opc(request->rq_reqmsg));
@@ -1741,7 +1741,7 @@ static bool ptlrpc_server_normal_pending(struct ptlrpc_service_part *svcpt,
 	       (request->rq_export ?
 		(char *)request->rq_export->exp_client_uuid.uuid : "0"),
 	       (request->rq_export ?
-		refcount_read(&request->rq_export->exp_refcount) : -99),
+		refcount_read(&request->rq_export->exp_handle.h_ref) : -99),
 	       lustre_msg_get_status(request->rq_reqmsg),
 	       request->rq_xid,
 	       libcfs_id2str(request->rq_peer),
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 18/29] lustre: discard OBD_FREE_RCU
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (16 preceding siblings ...)
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 17/29] lustre: handle: move refcount into the lustre_handle James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 19/29] lustre: portals_handle: rename ops to owner James Simmons
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

OBD_FREE_RCU and the hop_free call-back together form an overly
complex mechanism equivalent to kfree_rcu() or call_rcu(...).
Discard them and use the simpler approach.

This removes the only use for the field h_size, so discard
that too.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/lustre_handles.h  |  3 ---
 fs/lustre/include/obd_support.h     | 10 ----------
 fs/lustre/ldlm/ldlm_lock.c          | 16 ++++++++--------
 fs/lustre/obdclass/genops.c         |  6 ++----
 fs/lustre/obdclass/lustre_handles.c | 15 ---------------
 5 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h
index be5d41b..8fb4285 100644
--- a/fs/lustre/include/lustre_handles.h
+++ b/fs/lustre/include/lustre_handles.h
@@ -46,7 +46,6 @@
 #include <linux/types.h>
 
 struct portals_handle_ops {
-	void (*hop_free)(void *object, int size);
 	/* hop_type is used for some debugging messages */
 	char *hop_type;
 };
@@ -72,7 +71,6 @@ struct portals_handle {
 	/* newly added fields to handle the RCU issue. -jxiong */
 	struct rcu_head			h_rcu;
 	spinlock_t			h_lock;
-	unsigned int			h_size:31;
 	unsigned int			h_in:1;
 };
 
@@ -83,7 +81,6 @@ void class_handle_hash(struct portals_handle *,
 		       const struct portals_handle_ops *ops);
 void class_handle_unhash(struct portals_handle *);
 void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops);
-void class_handle_free_cb(struct rcu_head *rcu);
 int class_handle_init(void);
 void class_handle_cleanup(void);
 
diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 3e15cac..91d6a86 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -503,16 +503,6 @@
 #define POISON_PAGE(page, val) do { } while (0)
 #endif
 
-#define OBD_FREE_RCU(ptr, size, handle)			\
-do {							\
-	struct portals_handle *__h = (handle);		\
-							\
-	__h->h_cookie = (unsigned long)(ptr);		\
-	__h->h_size = (size);				\
-	call_rcu(&__h->h_rcu, class_handle_free_cb);	\
-	POISON_PTR(ptr);				\
-} while (0)
-
 #define KEY_IS(str)					\
 	(keylen >= (sizeof(str) - 1) &&			\
 	memcmp(key, str, (sizeof(str) - 1)) == 0)
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index ecd3f4a..18f018d 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -153,6 +153,13 @@ struct ldlm_lock *ldlm_lock_get(struct ldlm_lock *lock)
 }
 EXPORT_SYMBOL(ldlm_lock_get);
 
+static void lock_handle_free(struct rcu_head *rcu)
+{
+	struct ldlm_lock *lock = container_of(rcu, struct ldlm_lock,
+					      l_handle.h_rcu);
+	kmem_cache_free(ldlm_lock_slab, lock);
+}
+
 /**
  * Release lock reference.
  *
@@ -186,7 +193,7 @@ void ldlm_lock_put(struct ldlm_lock *lock)
 		kfree(lock->l_lvb_data);
 
 		lu_ref_fini(&lock->l_reference);
-		OBD_FREE_RCU(lock, sizeof(*lock), &lock->l_handle);
+		call_rcu(&lock->l_handle.h_rcu, lock_handle_free);
 	}
 }
 EXPORT_SYMBOL(ldlm_lock_put);
@@ -356,14 +363,7 @@ void ldlm_lock_destroy_nolock(struct ldlm_lock *lock)
 	}
 }
 
-static void lock_handle_free(void *lock, int size)
-{
-	LASSERT(size == sizeof(struct ldlm_lock));
-	kmem_cache_free(ldlm_lock_slab, lock);
-}
-
 static struct portals_handle_ops lock_handle_ops = {
-	.hop_free   = lock_handle_free,
 	.hop_type   = "ldlm",
 };
 
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index c6dc3ba..1d1d457 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -743,11 +743,10 @@ static void class_export_destroy(struct obd_export *exp)
 	if (exp != obd->obd_self_export)
 		class_decref(obd, "export", exp);
 
-	OBD_FREE_RCU(exp, sizeof(*exp), &exp->exp_handle);
+	kfree_rcu(exp, exp_handle.h_rcu);
 }
 
 static struct portals_handle_ops export_handle_ops = {
-	.hop_free	= NULL,
 	.hop_type	= "export",
 };
 
@@ -934,11 +933,10 @@ static void class_import_destroy(struct obd_import *imp)
 
 	LASSERT(!imp->imp_sec);
 	class_decref(imp->imp_obd, "import", imp);
-	OBD_FREE_RCU(imp, sizeof(*imp), &imp->imp_handle);
+	kfree_rcu(imp, imp_handle.h_rcu);
 }
 
 static struct portals_handle_ops import_handle_ops = {
-	.hop_free	= NULL,
 	.hop_type	= "import",
 };
 
diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c
index 32cd6ae..8aece57 100644
--- a/fs/lustre/obdclass/lustre_handles.c
+++ b/fs/lustre/obdclass/lustre_handles.c
@@ -167,21 +167,6 @@ void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops)
 }
 EXPORT_SYMBOL(class_handle2object);
 
-void class_handle_free_cb(struct rcu_head *rcu)
-{
-	struct portals_handle *h;
-	void *ptr;
-
-	h = container_of(rcu, struct portals_handle, h_rcu);
-	ptr = (void *)(unsigned long)h->h_cookie;
-
-	if (h->h_ops->hop_free)
-		h->h_ops->hop_free(ptr, h->h_size);
-	else
-		kfree(ptr);
-}
-EXPORT_SYMBOL(class_handle_free_cb);
-
 int class_handle_init(void)
 {
 	struct handle_bucket *bucket;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 19/29] lustre: portals_handle: rename ops to owner
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (17 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 18/29] lustre: discard OBD_FREE_RCU James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 20/29] lustre: portals_handle: remove locking from class_handle2object() James Simmons
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

Now that portals_handle_ops contains only a char*,
it is functioning primarily to identify the owner of each handle.
So change the name to h_owner, and the type to const char*.

Note: this h_owner is now quiet different from the similar h_owner
in the server code.  When server code it merged the
"med" pointer should be stored in the "mfd" and validated separately.

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/lustre_handles.h  | 12 +++---------
 fs/lustre/ldlm/ldlm_lock.c          |  8 +++-----
 fs/lustre/obdclass/genops.c         | 17 ++++++-----------
 fs/lustre/obdclass/lustre_handles.c | 15 +++++++--------
 4 files changed, 19 insertions(+), 33 deletions(-)

diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h
index 8fb4285..4bae4d6 100644
--- a/fs/lustre/include/lustre_handles.h
+++ b/fs/lustre/include/lustre_handles.h
@@ -45,11 +45,6 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
-struct portals_handle_ops {
-	/* hop_type is used for some debugging messages */
-	char *hop_type;
-};
-
 /* These handles are most easily used by having them appear at the very top of
  * whatever object that you want to make handles for.  ie:
  *
@@ -65,7 +60,7 @@ struct portals_handle_ops {
 struct portals_handle {
 	struct list_head		h_link;
 	u64				h_cookie;
-	const struct portals_handle_ops	*h_ops;
+	const char			*h_owner;
 	refcount_t			h_ref;
 
 	/* newly added fields to handle the RCU issue. -jxiong */
@@ -77,10 +72,9 @@ struct portals_handle {
 /* handles.c */
 
 /* Add a handle to the hash table */
-void class_handle_hash(struct portals_handle *,
-		       const struct portals_handle_ops *ops);
+void class_handle_hash(struct portals_handle *, const char *owner);
 void class_handle_unhash(struct portals_handle *);
-void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops);
+void *class_handle2object(u64 cookie, const char *owner);
 int class_handle_init(void);
 void class_handle_cleanup(void);
 
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 18f018d..56a2d1d 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -363,9 +363,7 @@ void ldlm_lock_destroy_nolock(struct ldlm_lock *lock)
 	}
 }
 
-static struct portals_handle_ops lock_handle_ops = {
-	.hop_type   = "ldlm",
-};
+static const char lock_handle_owner[] = "ldlm";
 
 /**
  *
@@ -405,7 +403,7 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource)
 	lprocfs_counter_incr(ldlm_res_to_ns(resource)->ns_stats,
 			     LDLM_NSS_LOCKS);
 	INIT_LIST_HEAD(&lock->l_handle.h_link);
-	class_handle_hash(&lock->l_handle, &lock_handle_ops);
+	class_handle_hash(&lock->l_handle, lock_handle_owner);
 
 	lu_ref_init(&lock->l_reference);
 	lu_ref_add(&lock->l_reference, "hash", lock);
@@ -509,7 +507,7 @@ struct ldlm_lock *__ldlm_handle2lock(const struct lustre_handle *handle,
 
 	LASSERT(handle);
 
-	lock = class_handle2object(handle->cookie, &lock_handle_ops);
+	lock = class_handle2object(handle->cookie, lock_handle_owner);
 	if (!lock)
 		return NULL;
 
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 1d1d457..257609b 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -677,7 +677,8 @@ int obd_init_caches(void)
 	return -ENOMEM;
 }
 
-static struct portals_handle_ops export_handle_ops;
+static const char export_handle_owner[] = "export";
+
 /* map connection to client */
 struct obd_export *class_conn2export(struct lustre_handle *conn)
 {
@@ -694,7 +695,7 @@ struct obd_export *class_conn2export(struct lustre_handle *conn)
 	}
 
 	CDEBUG(D_INFO, "looking for export cookie %#llx\n", conn->cookie);
-	export = class_handle2object(conn->cookie, &export_handle_ops);
+	export = class_handle2object(conn->cookie, export_handle_owner);
 	return export;
 }
 EXPORT_SYMBOL(class_conn2export);
@@ -746,10 +747,6 @@ static void class_export_destroy(struct obd_export *exp)
 	kfree_rcu(exp, exp_handle.h_rcu);
 }
 
-static struct portals_handle_ops export_handle_ops = {
-	.hop_type	= "export",
-};
-
 struct obd_export *class_export_get(struct obd_export *exp)
 {
 	refcount_inc(&exp->exp_handle.h_ref);
@@ -832,7 +829,7 @@ static struct obd_export *__class_new_export(struct obd_device *obd,
 	INIT_LIST_HEAD(&export->exp_req_replay_queue);
 	INIT_LIST_HEAD(&export->exp_handle.h_link);
 	INIT_LIST_HEAD(&export->exp_hp_rpcs);
-	class_handle_hash(&export->exp_handle, &export_handle_ops);
+	class_handle_hash(&export->exp_handle, export_handle_owner);
 	spin_lock_init(&export->exp_lock);
 	spin_lock_init(&export->exp_rpc_lock);
 	spin_lock_init(&export->exp_bl_list_lock);
@@ -936,9 +933,7 @@ static void class_import_destroy(struct obd_import *imp)
 	kfree_rcu(imp, imp_handle.h_rcu);
 }
 
-static struct portals_handle_ops import_handle_ops = {
-	.hop_type	= "import",
-};
+static const char import_handle_owner[] = "import";
 
 struct obd_import *class_import_get(struct obd_import *import)
 {
@@ -1021,7 +1016,7 @@ struct obd_import *class_new_import(struct obd_device *obd)
 	atomic_set(&imp->imp_inval_count, 0);
 	INIT_LIST_HEAD(&imp->imp_conn_list);
 	INIT_LIST_HEAD(&imp->imp_handle.h_link);
-	class_handle_hash(&imp->imp_handle, &import_handle_ops);
+	class_handle_hash(&imp->imp_handle, import_handle_owner);
 	init_imp_at(&imp->imp_at);
 
 	/* the default magic is V2, will be used in connect RPC, and
diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c
index 8aece57..00d8a89 100644
--- a/fs/lustre/obdclass/lustre_handles.c
+++ b/fs/lustre/obdclass/lustre_handles.c
@@ -58,8 +58,7 @@
  * Generate a unique 64bit cookie (hash) for a handle and insert it into
  * global (per-node) hash-table.
  */
-void class_handle_hash(struct portals_handle *h,
-		       const struct portals_handle_ops *ops)
+void class_handle_hash(struct portals_handle *h, const char *owner)
 {
 	struct handle_bucket *bucket;
 
@@ -85,7 +84,7 @@ void class_handle_hash(struct portals_handle *h,
 	h->h_cookie = handle_base;
 	spin_unlock(&handle_base_lock);
 
-	h->h_ops = ops;
+	h->h_owner = owner;
 	spin_lock_init(&h->h_lock);
 
 	bucket = &handle_hash[h->h_cookie & HANDLE_HASH_MASK];
@@ -132,7 +131,7 @@ void class_handle_unhash(struct portals_handle *h)
 }
 EXPORT_SYMBOL(class_handle_unhash);
 
-void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops)
+void *class_handle2object(u64 cookie, const char *owner)
 {
 	struct handle_bucket *bucket;
 	struct portals_handle *h;
@@ -147,14 +146,14 @@ void *class_handle2object(u64 cookie, const struct portals_handle_ops *ops)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(h, &bucket->head, h_link) {
-		if (h->h_cookie != cookie || h->h_ops != ops)
+		if (h->h_cookie != cookie || h->h_owner != owner)
 			continue;
 
 		spin_lock(&h->h_lock);
 		if (likely(h->h_in != 0)) {
 			refcount_inc(&h->h_ref);
 			CDEBUG(D_INFO, "GET %s %p refcount=%d\n",
-			       h->h_ops->hop_type, h,
+			       h->h_owner, h,
 			       refcount_read(&h->h_ref));
 			retval = h;
 		}
@@ -201,8 +200,8 @@ static int cleanup_all_handles(void)
 
 		spin_lock(&handle_hash[i].lock);
 		list_for_each_entry_rcu(h, &handle_hash[i].head, h_link) {
-			CERROR("force clean handle %#llx addr %p ops %p\n",
-			       h->h_cookie, h, h->h_ops);
+			CERROR("force clean handle %#llx addr %p owner %p\n",
+			       h->h_cookie, h, h->h_owner);
 
 			class_handle_unhash_nolock(h);
 			rc++;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 20/29] lustre: portals_handle: remove locking from class_handle2object()
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (18 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 19/29] lustre: portals_handle: rename ops to owner James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 21/29] lustre: portals_handle: use hlist for hash lists James Simmons
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

There can be no value in this locking and test on h_in.

If the lookup could have run in parallel with
class_handle_unhash_nolock() and seen "h_in == 0", then it could
equally well have run moments earlier and not seen it - no locking
would prevent that, so the caller much be prepared to have
an object returned which has already been unhashed by the time it
sees the object.

In other words, an interlock between unhash and lookup must be
provided at a higher level than where this code is trying
to handle it.

So remove this pointless code.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/obdclass/lustre_handles.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c
index 00d8a89..d8bab07 100644
--- a/fs/lustre/obdclass/lustre_handles.c
+++ b/fs/lustre/obdclass/lustre_handles.c
@@ -149,15 +149,11 @@ void *class_handle2object(u64 cookie, const char *owner)
 		if (h->h_cookie != cookie || h->h_owner != owner)
 			continue;
 
-		spin_lock(&h->h_lock);
-		if (likely(h->h_in != 0)) {
-			refcount_inc(&h->h_ref);
-			CDEBUG(D_INFO, "GET %s %p refcount=%d\n",
-			       h->h_owner, h,
-			       refcount_read(&h->h_ref));
-			retval = h;
-		}
-		spin_unlock(&h->h_lock);
+		refcount_inc(&h->h_ref);
+		CDEBUG(D_INFO, "GET %s %p refcount=%d\n",
+		       h->h_owner, h,
+		       refcount_read(&h->h_ref));
+		retval = h;
 		break;
 	}
 	rcu_read_unlock();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 21/29] lustre: portals_handle: use hlist for hash lists.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (19 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 20/29] lustre: portals_handle: remove locking from class_handle2object() James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 22/29] lustre: portals_handle: discard h_lock James Simmons
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

hlist_head/hlist_node is the preferred data structure
for hash tables. Not only does it make the 'head' smaller,
but is also provides hlist_unhashed() which can be used to
check if an object is in the list.  This means that
we don't need h_in any more.

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/lustre_handles.h  |  3 +--
 fs/lustre/ldlm/ldlm_lock.c          |  2 +-
 fs/lustre/obdclass/genops.c         |  4 ++--
 fs/lustre/obdclass/lustre_handles.c | 20 +++++++++-----------
 4 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h
index 4bae4d6..a4aeac8 100644
--- a/fs/lustre/include/lustre_handles.h
+++ b/fs/lustre/include/lustre_handles.h
@@ -58,7 +58,7 @@
  * to compute the start of the structure based on the handle field.
  */
 struct portals_handle {
-	struct list_head		h_link;
+	struct hlist_node		h_link;
 	u64				h_cookie;
 	const char			*h_owner;
 	refcount_t			h_ref;
@@ -66,7 +66,6 @@ struct portals_handle {
 	/* newly added fields to handle the RCU issue. -jxiong */
 	struct rcu_head			h_rcu;
 	spinlock_t			h_lock;
-	unsigned int			h_in:1;
 };
 
 /* handles.c */
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 56a2d1d..5ac7723 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -402,7 +402,7 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource)
 
 	lprocfs_counter_incr(ldlm_res_to_ns(resource)->ns_stats,
 			     LDLM_NSS_LOCKS);
-	INIT_LIST_HEAD(&lock->l_handle.h_link);
+	INIT_HLIST_NODE(&lock->l_handle.h_link);
 	class_handle_hash(&lock->l_handle, lock_handle_owner);
 
 	lu_ref_init(&lock->l_reference);
diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
index 257609b..e27cd40 100644
--- a/fs/lustre/obdclass/genops.c
+++ b/fs/lustre/obdclass/genops.c
@@ -827,7 +827,7 @@ static struct obd_export *__class_new_export(struct obd_device *obd,
 	spin_lock_init(&export->exp_uncommitted_replies_lock);
 	INIT_LIST_HEAD(&export->exp_uncommitted_replies);
 	INIT_LIST_HEAD(&export->exp_req_replay_queue);
-	INIT_LIST_HEAD(&export->exp_handle.h_link);
+	INIT_HLIST_NODE(&export->exp_handle.h_link);
 	INIT_LIST_HEAD(&export->exp_hp_rpcs);
 	class_handle_hash(&export->exp_handle, export_handle_owner);
 	spin_lock_init(&export->exp_lock);
@@ -1015,7 +1015,7 @@ struct obd_import *class_new_import(struct obd_device *obd)
 	atomic_set(&imp->imp_replay_inflight, 0);
 	atomic_set(&imp->imp_inval_count, 0);
 	INIT_LIST_HEAD(&imp->imp_conn_list);
-	INIT_LIST_HEAD(&imp->imp_handle.h_link);
+	INIT_HLIST_NODE(&imp->imp_handle.h_link);
 	class_handle_hash(&imp->imp_handle, import_handle_owner);
 	init_imp_at(&imp->imp_at);
 
diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c
index d8bab07..343d575 100644
--- a/fs/lustre/obdclass/lustre_handles.c
+++ b/fs/lustre/obdclass/lustre_handles.c
@@ -48,7 +48,7 @@
 
 static struct handle_bucket {
 	spinlock_t		lock;
-	struct list_head	head;
+	struct hlist_head	head;
 } *handle_hash;
 
 #define HANDLE_HASH_SIZE (1 << 16)
@@ -63,7 +63,7 @@ void class_handle_hash(struct portals_handle *h, const char *owner)
 	struct handle_bucket *bucket;
 
 	LASSERT(h);
-	LASSERT(list_empty(&h->h_link));
+	LASSERT(hlist_unhashed(&h->h_link));
 
 	/*
 	 * This is fast, but simplistic cookie generation algorithm, it will
@@ -89,8 +89,7 @@ void class_handle_hash(struct portals_handle *h, const char *owner)
 
 	bucket = &handle_hash[h->h_cookie & HANDLE_HASH_MASK];
 	spin_lock(&bucket->lock);
-	list_add_rcu(&h->h_link, &bucket->head);
-	h->h_in = 1;
+	hlist_add_head_rcu(&h->h_link, &bucket->head);
 	spin_unlock(&bucket->lock);
 
 	CDEBUG(D_INFO, "added object %p with handle %#llx to hash\n",
@@ -100,7 +99,7 @@ void class_handle_hash(struct portals_handle *h, const char *owner)
 
 static void class_handle_unhash_nolock(struct portals_handle *h)
 {
-	if (list_empty(&h->h_link)) {
+	if (hlist_unhashed(&h->h_link)) {
 		CERROR("removing an already-removed handle (%#llx)\n",
 		       h->h_cookie);
 		return;
@@ -110,13 +109,12 @@ static void class_handle_unhash_nolock(struct portals_handle *h)
 	       h, h->h_cookie);
 
 	spin_lock(&h->h_lock);
-	if (h->h_in == 0) {
+	if (hlist_unhashed(&h->h_link)) {
 		spin_unlock(&h->h_lock);
 		return;
 	}
-	h->h_in = 0;
+	hlist_del_init_rcu(&h->h_link);
 	spin_unlock(&h->h_lock);
-	list_del_rcu(&h->h_link);
 }
 
 void class_handle_unhash(struct portals_handle *h)
@@ -145,7 +143,7 @@ void *class_handle2object(u64 cookie, const char *owner)
 	bucket = handle_hash + (cookie & HANDLE_HASH_MASK);
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(h, &bucket->head, h_link) {
+	hlist_for_each_entry_rcu(h, &bucket->head, h_link) {
 		if (h->h_cookie != cookie || h->h_owner != owner)
 			continue;
 
@@ -176,7 +174,7 @@ int class_handle_init(void)
 	spin_lock_init(&handle_base_lock);
 	for (bucket = handle_hash + HANDLE_HASH_SIZE - 1; bucket >= handle_hash;
 	     bucket--) {
-		INIT_LIST_HEAD(&bucket->head);
+		INIT_HLIST_HEAD(&bucket->head);
 		spin_lock_init(&bucket->lock);
 	}
 
@@ -195,7 +193,7 @@ static int cleanup_all_handles(void)
 		struct portals_handle *h;
 
 		spin_lock(&handle_hash[i].lock);
-		list_for_each_entry_rcu(h, &handle_hash[i].head, h_link) {
+		hlist_for_each_entry_rcu(h, &handle_hash[i].head, h_link) {
 			CERROR("force clean handle %#llx addr %p owner %p\n",
 			       h->h_cookie, h, h->h_owner);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 22/29] lustre: portals_handle: discard h_lock.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (20 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 21/29] lustre: portals_handle: use hlist for hash lists James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 23/29] lustre: remove unused fields from struct obd_device James Simmons
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

The h_lock spinlock is now only taken while bucket->lock
is held.  As a handle is associated with precisely one bucket,
this means that h_lock can never be contended, so it isn't needed.

So discard h_lock.

Also discard an increasingly irrelevant comment in the declaration
of struct portals_handle.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/lustre_handles.h  | 3 ---
 fs/lustre/obdclass/lustre_handles.c | 7 -------
 2 files changed, 10 deletions(-)

diff --git a/fs/lustre/include/lustre_handles.h b/fs/lustre/include/lustre_handles.h
index a4aeac8..d0fefcb 100644
--- a/fs/lustre/include/lustre_handles.h
+++ b/fs/lustre/include/lustre_handles.h
@@ -62,10 +62,7 @@ struct portals_handle {
 	u64				h_cookie;
 	const char			*h_owner;
 	refcount_t			h_ref;
-
-	/* newly added fields to handle the RCU issue. -jxiong */
 	struct rcu_head			h_rcu;
-	spinlock_t			h_lock;
 };
 
 /* handles.c */
diff --git a/fs/lustre/obdclass/lustre_handles.c b/fs/lustre/obdclass/lustre_handles.c
index 343d575..d9ab9ed 100644
--- a/fs/lustre/obdclass/lustre_handles.c
+++ b/fs/lustre/obdclass/lustre_handles.c
@@ -85,7 +85,6 @@ void class_handle_hash(struct portals_handle *h, const char *owner)
 	spin_unlock(&handle_base_lock);
 
 	h->h_owner = owner;
-	spin_lock_init(&h->h_lock);
 
 	bucket = &handle_hash[h->h_cookie & HANDLE_HASH_MASK];
 	spin_lock(&bucket->lock);
@@ -108,13 +107,7 @@ static void class_handle_unhash_nolock(struct portals_handle *h)
 	CDEBUG(D_INFO, "removing object %p with handle %#llx from hash\n",
 	       h, h->h_cookie);
 
-	spin_lock(&h->h_lock);
-	if (hlist_unhashed(&h->h_link)) {
-		spin_unlock(&h->h_lock);
-		return;
-	}
 	hlist_del_init_rcu(&h->h_link);
-	spin_unlock(&h->h_lock);
 }
 
 void class_handle_unhash(struct portals_handle *h)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 23/29] lustre: remove unused fields from struct obd_device
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (21 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 22/29] lustre: portals_handle: discard h_lock James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 24/29] lustre: obd_sysfs: error-check value stored in jobid_var James Simmons
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

One field is set but never access.
Another is accessed but never set.
Other are never mentioned at all.

Also remove the comment
        /* use separate field as it is set in interrupt to don't mess with
         * protection of other bits using _bh lock
         */

This is not correct - were it used, obd_recovery_expired would be part
of the same 'unsigned long' as the other fields.

With the removal of obd_no_transno we can eliminate the ioctl
OBD_IOC_NO_TRANSNO which is server side only.

Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/obd.h                  | 33 ++++++++++++--------------------
 fs/lustre/mgc/mgc_request.c              |  4 ++--
 fs/lustre/obdclass/class_obd.c           | 27 +++++---------------------
 fs/lustre/obdclass/obd_config.c          |  1 -
 include/uapi/linux/lustre/lustre_ioctl.h |  1 -
 5 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 3bdde31..93a47cf 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -539,27 +539,18 @@ struct obd_device {
 	char			 obd_name[MAX_OBD_NAME];
 
 	/* bitfield modification is protected by obd_dev_lock */
-	unsigned long obd_attached:1,      /* finished attach */
-		      obd_set_up:1,	/* finished setup */
-		      obd_version_recov:1, /* obd uses version checking */
-		      obd_replayable:1,/* recovery is enabled; inform clients */
-		      obd_no_transno:1,  /* no committed-transno notification */
-		      obd_no_recov:1,      /* fail instead of retry messages */
-		      obd_stopping:1,      /* started cleanup */
-		      obd_starting:1,      /* started setup */
-		      obd_force:1,	 /* cleanup with > 0 obd refcount */
-		      obd_fail:1,	 /* cleanup with failover */
-		      obd_no_conn:1,       /* deny new connections */
-		      obd_inactive:1,      /* device active/inactive
-					    * (for sysfs status only!!)
-					    */
-		      obd_no_ir:1,	 /* no imperative recovery. */
-		      obd_process_conf:1,  /* device is processing mgs config */
-		      obd_checksum_dump:1; /* dump pages upon cksum error */
-	/* use separate field as it is set in interrupt to don't mess with
-	 * protection of other bits using _bh lock
-	 */
-	unsigned long obd_recovery_expired:1;
+	unsigned long obd_attached:1,	  /* finished attach */
+		      obd_set_up:1,	  /* finished setup */
+		      obd_no_recov:1,	  /* fail instead of retry messages */
+		      obd_stopping:1,	  /* started cleanup */
+		      obd_starting:1,	  /* started setup */
+		      obd_force:1,	  /* cleanup with > 0 obd refcount */
+		      obd_fail:1,	  /* cleanup with failover */
+		      obd_inactive:1,	  /* device active/inactive
+					   * (for sysfs status only!!)
+					   */
+		      obd_process_conf:1;/* device is processing mgs config */
+
 	/* uuid-export hash body */
 	struct rhashtable	 obd_uuid_hash;
 	wait_queue_head_t	 obd_refcount_waitq;
diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c
index 1eca81a..0d13b17 100644
--- a/fs/lustre/mgc/mgc_request.c
+++ b/fs/lustre/mgc/mgc_request.c
@@ -984,10 +984,10 @@ static int mgc_set_info_async(const struct lu_env *env, struct obd_export *exp,
 		if (vallen != sizeof(int))
 			return -EINVAL;
 		value = *(int *)val;
-		CDEBUG(D_MGC, "InitRecov %s %d/d%d:i%d:r%d:or%d:%s\n",
+		CDEBUG(D_MGC, "InitRecov %s %d/d%d:i%d:r%d:%s\n",
 		       imp->imp_obd->obd_name, value,
 		       imp->imp_deactive, imp->imp_invalid,
-		       imp->imp_replayable, imp->imp_obd->obd_replayable,
+		       imp->imp_replayable,
 		       ptlrpc_import_state_name(imp->imp_state));
 		/* Resurrect if we previously died */
 		if ((imp->imp_state != LUSTRE_IMP_FULL &&
diff --git a/fs/lustre/obdclass/class_obd.c b/fs/lustre/obdclass/class_obd.c
index b8fc740..8e7f7cd 100644
--- a/fs/lustre/obdclass/class_obd.c
+++ b/fs/lustre/obdclass/class_obd.c
@@ -516,29 +516,12 @@ int class_handle_ioctl(unsigned int cmd, unsigned long arg)
 		goto out;
 	}
 
-	switch (cmd) {
-	case OBD_IOC_NO_TRANSNO:
-		if (!obd->obd_attached) {
-			CERROR("Device %d not attached\n", obd->obd_minor);
-			err = -ENODEV;
-			goto out;
-		}
-		CDEBUG(D_HA, "%s: disabling committed-transno notification\n",
-		       obd->obd_name);
-		obd->obd_no_transno = 1;
-		err = 0;
-		break;
-
-	default:
-		err = obd_iocontrol(cmd, obd->obd_self_export, len, data, NULL);
-		if (err)
-			goto out;
-
-		if (copy_to_user((void __user *)arg, data, len))
-			err = -EFAULT;
-		break;
-	}
+	err = obd_iocontrol(cmd, obd->obd_self_export, len, data, NULL);
+	if (err)
+		goto out;
 
+	if (copy_to_user((void __user *)arg, data, len))
+		err = -EFAULT;
  out:
 	kvfree(data);
 	return err;
diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c
index 0da69f6..c96cd5c 100644
--- a/fs/lustre/obdclass/obd_config.c
+++ b/fs/lustre/obdclass/obd_config.c
@@ -487,7 +487,6 @@ static int class_cleanup(struct obd_device *obd, struct lustre_cfg *lcfg)
 				LCONSOLE_WARN("Failing over %s\n",
 					      obd->obd_name);
 				obd->obd_fail = 1;
-				obd->obd_no_transno = 1;
 				obd->obd_no_recov = 1;
 				if (OBP(obd, iocontrol)) {
 					obd_iocontrol(OBD_IOC_SYNC,
diff --git a/include/uapi/linux/lustre/lustre_ioctl.h b/include/uapi/linux/lustre/lustre_ioctl.h
index 098b6451..a2774c2 100644
--- a/include/uapi/linux/lustre/lustre_ioctl.h
+++ b/include/uapi/linux/lustre/lustre_ioctl.h
@@ -165,7 +165,6 @@ static inline __u32 obd_ioctl_packlen(struct obd_ioctl_data *data)
 #define OBD_IOC_PING_TARGET	_IOW('f', 136, OBD_IOC_DATA_TYPE)
 
 /*	OBD_IOC_DEC_FS_USE_COUNT _IO('f', 139) */
-#define OBD_IOC_NO_TRANSNO	_IOW('f', 140, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_SET_READONLY	_IOW('f', 141, OBD_IOC_DATA_TYPE)
 #define OBD_IOC_ABORT_RECOVERY	_IOR('f', 142, OBD_IOC_DATA_TYPE)
 /*	OBD_IOC_ROOT_SQUASH	_IOWR('f', 143, OBD_IOC_DATA_TYPE) */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 24/29] lustre: obd_sysfs: error-check value stored in jobid_var
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (22 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 23/29] lustre: remove unused fields from struct obd_device James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 25/29] lustre: obdclass: discard process_quota_config James Simmons
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

The jobid_var sysfs attribute only has 3 meaningful values.
Other values cause lustre_get_jobid() to return an error
which is uniformly ignored.

To improve usability and resilience, check that the value
written is acceptable before storing it.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/obdclass/obd_sysfs.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/obdclass/obd_sysfs.c b/fs/lustre/obdclass/obd_sysfs.c
index e9e0bcf..d5cf5d0 100644
--- a/fs/lustre/obdclass/obd_sysfs.c
+++ b/fs/lustre/obdclass/obd_sysfs.c
@@ -216,16 +216,25 @@ static ssize_t jobid_var_store(struct kobject *kobj, struct attribute *attr,
 			       const char *buffer,
 			       size_t count)
 {
+	static const char * const valid[] = {
+		JOBSTATS_DISABLE,
+		JOBSTATS_PROCNAME_UID,
+		JOBSTATS_NODELOCAL,
+		NULL
+	};
+	int i;
+
 	if (!count || count > JOBSTATS_JOBID_VAR_MAX_LEN)
 		return -EINVAL;
 
-	memset(obd_jobid_var, 0, JOBSTATS_JOBID_VAR_MAX_LEN + 1);
-
-	memcpy(obd_jobid_var, buffer, count);
+	for (i = 0; valid[i]; i++)
+		if (sysfs_streq(buffer, valid[i]))
+			break;
+	if (!valid[i])
+		return -EINVAL;
 
-	/* Trim the trailing '\n' if any */
-	if (obd_jobid_var[count - 1] == '\n')
-		obd_jobid_var[count - 1] = 0;
+	memset(obd_jobid_var, 0, JOBSTATS_JOBID_VAR_MAX_LEN + 1);
+	strcpy(obd_jobid_var, valid[i]);
 
 	return count;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 25/29] lustre: obdclass: discard process_quota_config
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (23 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 24/29] lustre: obd_sysfs: error-check value stored in jobid_var James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 26/29] lustre: obdclass: remove unnecessary code from lustre_init_lsi() James Simmons
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

This variable is never set since it exist only for the server
code, so it can be discarded. Additionally instead of using
a function pointer hook the quota layer should use
class_modify_config().

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/obdclass/obd_config.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c
index c96cd5c..b1d0c96 100644
--- a/fs/lustre/obdclass/obd_config.c
+++ b/fs/lustre/obdclass/obd_config.c
@@ -742,11 +742,6 @@ void class_put_profile(struct lustre_profile *lprof)
 }
 EXPORT_SYMBOL(class_put_profile);
 
-/* We can't call lquota_process_config directly because
- * it lives in a module that must be loaded after this one.
- */
-static int (*quota_process_config)(struct lustre_cfg *lcfg);
-
 static int process_param2_config(struct lustre_cfg *lcfg)
 {
 	char *param = lustre_cfg_string(lcfg, 1);
@@ -922,11 +917,6 @@ int class_process_config(struct lustre_cfg *lcfg)
 
 			err = 0;
 			goto out;
-		} else if ((class_match_param(lustre_cfg_string(lcfg, 1),
-					      PARAM_QUOTA, &tmp) == 0) &&
-			   quota_process_config) {
-			err = (*quota_process_config)(lcfg);
-			goto out;
 		}
 
 		break;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 26/29] lustre: obdclass: remove unnecessary code from lustre_init_lsi()
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (24 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 25/29] lustre: obdclass: discard process_quota_config James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH 27/29] lustre: ldlm: discard l_lock from struct ldlm_lock James Simmons
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

After allocating a struct with kzalloc, there is no value
in setting a few of the fields to zero.

And as all fields were zero, it must be safe to kfree lmd_exclude,
whether lmd_exclude_count is zero or not.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/obdclass/obd_mount.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/lustre/obdclass/obd_mount.c b/fs/lustre/obdclass/obd_mount.c
index 104e64b..4c86ca5 100644
--- a/fs/lustre/obdclass/obd_mount.c
+++ b/fs/lustre/obdclass/obd_mount.c
@@ -517,9 +517,6 @@ struct lustre_sb_info *lustre_init_lsi(struct super_block *sb)
 		return NULL;
 	}
 
-	lsi->lsi_lmd->lmd_exclude_count = 0;
-	lsi->lsi_lmd->lmd_recovery_time_soft = 0;
-	lsi->lsi_lmd->lmd_recovery_time_hard = 0;
 	s2lsi_nocast(sb) = lsi;
 	/* we take 1 extra ref for our setup */
 	atomic_set(&lsi->lsi_mounts, 1);
@@ -546,8 +543,7 @@ static int lustre_free_lsi(struct super_block *sb)
 		kfree(lsi->lsi_lmd->lmd_fileset);
 		kfree(lsi->lsi_lmd->lmd_mgssec);
 		kfree(lsi->lsi_lmd->lmd_opts);
-		if (lsi->lsi_lmd->lmd_exclude_count)
-			kfree(lsi->lsi_lmd->lmd_exclude);
+		kfree(lsi->lsi_lmd->lmd_exclude);
 		kfree(lsi->lsi_lmd->lmd_mgs);
 		kfree(lsi->lsi_lmd->lmd_osd_type);
 		kfree(lsi->lsi_lmd->lmd_params);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH 27/29] lustre: ldlm: discard l_lock from struct ldlm_lock.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (25 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 26/29] lustre: obdclass: remove unnecessary code from lustre_init_lsi() James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 28/29] lustre: ldlm: don't access l_resource when not locked James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 29/29] lustre: ldlm: drop SLAB_TYPESAFE_BY_RCU from ldlm_lock slab James Simmons
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

This spinlock (l_lock) is only used to stablise the l_resource
pointer while taking a spinlock on the resource.

This is not necessary - it is sufficient to take the resource
spinlock, and then check if l_resource has changed or not.  If it
hasn't then it cannot change until the resource spinlock is dropped.

We must ensure this is safe even if the resource is freed before
lock_res_and_lock() managed to get the lock.  To do this we mark the
slab as SLAB_TYPESAFE_BY_RCU and initialise the lock in an
init_once() function, but not on every allocate (and specifically
don't zero the whole srtuct on each allocation).
This means that if we find a resource after taking the RCU read lock,
then it is always safe to take and then drop the spinlock.
After taking the spinlock, we can check if it is more generally safe
to use.

Discarding l_lock shrinks 'struct ldlm_lock' which helps save memory.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/include/lustre_dlm.h |  5 -----
 fs/lustre/ldlm/l_lock.c        | 20 ++++++++++++--------
 fs/lustre/ldlm/ldlm_lock.c     | 26 +++++++++++++-------------
 fs/lustre/ldlm/ldlm_lockd.c    | 21 ++++++++++++++++++++-
 fs/lustre/ldlm/ldlm_resource.c |  9 +++++----
 5 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h
index fd9b0f8..3417661 100644
--- a/fs/lustre/include/lustre_dlm.h
+++ b/fs/lustre/include/lustre_dlm.h
@@ -591,11 +591,6 @@ struct ldlm_lock {
 	 */
 	struct portals_handle		l_handle;
 	/**
-	 * Internal spinlock protects l_resource.  We should hold this lock
-	 * first before taking res_lock.
-	 */
-	spinlock_t			l_lock;
-	/**
 	 * Pointer to actual resource this lock is in.
 	 * ldlm_lock_change_resource() can change this.
 	 */
diff --git a/fs/lustre/ldlm/l_lock.c b/fs/lustre/ldlm/l_lock.c
index 296259a..0ba4942 100644
--- a/fs/lustre/ldlm/l_lock.c
+++ b/fs/lustre/ldlm/l_lock.c
@@ -45,15 +45,21 @@
  * being an atomic operation.
  */
 struct ldlm_resource *lock_res_and_lock(struct ldlm_lock *lock)
-				__acquires(&lock->l_lock)
 				__acquires(&lock->l_resource->lr_lock)
 {
-	spin_lock(&lock->l_lock);
+	struct ldlm_resource *res;
 
-	lock_res(lock->l_resource);
-
-	ldlm_set_res_locked(lock);
-	return lock->l_resource;
+	rcu_read_lock();
+	while (1) {
+		res = rcu_dereference(lock->l_resource);
+		lock_res(res);
+		if (res == lock->l_resource) {
+			ldlm_set_res_locked(lock);
+			rcu_read_unlock();
+			return res;
+		}
+		unlock_res(res);
+	}
 }
 EXPORT_SYMBOL(lock_res_and_lock);
 
@@ -62,12 +68,10 @@ struct ldlm_resource *lock_res_and_lock(struct ldlm_lock *lock)
  */
 void unlock_res_and_lock(struct ldlm_lock *lock)
 		__releases(&lock->l_resource->lr_lock)
-		__releases(&lock->l_lock)
 {
 	/* on server-side resource of lock doesn't change */
 	ldlm_clear_res_locked(lock);
 
 	unlock_res(lock->l_resource);
-	spin_unlock(&lock->l_lock);
 }
 EXPORT_SYMBOL(unlock_res_and_lock);
diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index 5ac7723..e62dad1 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -383,7 +383,6 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource)
 	if (!lock)
 		return NULL;
 
-	spin_lock_init(&lock->l_lock);
 	lock->l_resource = resource;
 	lu_ref_add(&resource->lr_reference, "lock", lock);
 
@@ -452,12 +451,13 @@ int ldlm_lock_change_resource(struct ldlm_namespace *ns, struct ldlm_lock *lock,
 
 	lu_ref_add(&newres->lr_reference, "lock", lock);
 	/*
-	 * To flip the lock from the old to the new resource, lock, oldres and
-	 * newres have to be locked. Resource spin-locks are nested within
-	 * lock->l_lock, and are taken in the memory address order to avoid
-	 * dead-locks.
+	 * To flip the lock from the old to the new resource, oldres
+	 * and newres have to be locked. Resource spin-locks are taken
+	 * in the memory address order to avoid dead-locks.
+	 * As this is the only circumstance where ->l_resource
+	 * can change, and this cannot race with itself, it is safe
+	 * to access lock->l_resource without being careful about locking.
 	 */
-	spin_lock(&lock->l_lock);
 	oldres = lock->l_resource;
 	if (oldres < newres) {
 		lock_res(oldres);
@@ -468,9 +468,9 @@ int ldlm_lock_change_resource(struct ldlm_namespace *ns, struct ldlm_lock *lock,
 	}
 	LASSERT(memcmp(new_resid, &oldres->lr_name,
 		       sizeof(oldres->lr_name)) != 0);
-	lock->l_resource = newres;
+	rcu_assign_pointer(lock->l_resource, newres);
 	unlock_res(oldres);
-	unlock_res_and_lock(lock);
+	unlock_res(newres);
 
 	/* ...and the flowers are still standing! */
 	lu_ref_del(&oldres->lr_reference, "lock", lock);
@@ -1964,11 +1964,11 @@ void _ldlm_lock_debug(struct ldlm_lock *lock,
 	va_list args;
 	struct va_format vaf;
 
-	if (spin_trylock(&lock->l_lock)) {
-		if (lock->l_resource)
-			resource = ldlm_resource_getref(lock->l_resource);
-		spin_unlock(&lock->l_lock);
-	}
+	rcu_read_lock();
+	resource = rcu_dereference(lock->l_resource);
+	if (resource && !atomic_inc_not_zero(&resource->lr_refcount))
+		resource = NULL;
+	rcu_read_unlock();
 
 	va_start(args, fmt);
 	vaf.fmt = fmt;
diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c
index 589b89d..ea87fd6 100644
--- a/fs/lustre/ldlm/ldlm_lockd.c
+++ b/fs/lustre/ldlm/ldlm_lockd.c
@@ -1097,6 +1097,23 @@ static int ldlm_cleanup(void)
 	return 0;
 }
 
+void ldlm_resource_init_once(void *p)
+{
+	/*
+	 * It is import to initialise the spinlock only once,
+	 * as ldlm_lock_change_resource() could try to lock
+	 * the resource *after* it has been freed and possibly
+	 * reused. SLAB_TYPESAFE_BY_RCU ensures the memory won't
+	 * be freed while the lock is being taken, but we need to
+	 * ensure that it doesn't get reinitialized either.
+	 */
+	struct ldlm_resource *res = p;
+
+	memset(res, 0, sizeof(*res));
+	mutex_init(&res->lr_lvb_mutex);
+	spin_lock_init(&res->lr_lock);
+}
+
 int ldlm_init(void)
 {
 	mutex_init(&ldlm_ref_mutex);
@@ -1104,7 +1121,9 @@ int ldlm_init(void)
 	mutex_init(ldlm_namespace_lock(LDLM_NAMESPACE_CLIENT));
 	ldlm_resource_slab = kmem_cache_create("ldlm_resources",
 					       sizeof(struct ldlm_resource), 0,
-					       SLAB_HWCACHE_ALIGN, NULL);
+					       SLAB_TYPESAFE_BY_RCU |
+					       SLAB_HWCACHE_ALIGN,
+					       ldlm_resource_init_once);
 	if (!ldlm_resource_slab)
 		return -ENOMEM;
 
diff --git a/fs/lustre/ldlm/ldlm_resource.c b/fs/lustre/ldlm/ldlm_resource.c
index 45b2e97..d79f70d 100644
--- a/fs/lustre/ldlm/ldlm_resource.c
+++ b/fs/lustre/ldlm/ldlm_resource.c
@@ -994,12 +994,14 @@ static struct ldlm_resource *ldlm_resource_new(enum ldlm_type ldlm_type)
 {
 	struct ldlm_resource *res;
 
-	res = kmem_cache_zalloc(ldlm_resource_slab, GFP_NOFS);
+	res = kmem_cache_alloc(ldlm_resource_slab, GFP_NOFS);
 	if (!res)
 		return NULL;
 
 	INIT_LIST_HEAD(&res->lr_granted);
 	INIT_LIST_HEAD(&res->lr_waiting);
+	res->lr_lvb_inode = NULL;
+	res->lr_lvb_len = 0;
 
 	if (ldlm_type == LDLM_EXTENT) {
 		int idx;
@@ -1017,16 +1019,15 @@ static struct ldlm_resource *ldlm_resource_new(enum ldlm_type ldlm_type)
 			res->lr_itree[idx].lit_mode = 1 << idx;
 			res->lr_itree[idx].lit_root = RB_ROOT_CACHED;
 		}
-	}
+	} else
+		res->lr_itree = NULL;
 
 	atomic_set(&res->lr_refcount, 1);
-	spin_lock_init(&res->lr_lock);
 	lu_ref_init(&res->lr_reference);
 
 	/* The creator of the resource must unlock the mutex after LVB
 	 * initialization.
 	 */
-	mutex_init(&res->lr_lvb_mutex);
 	mutex_lock(&res->lr_lvb_mutex);
 
 	return res;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 28/29] lustre: ldlm: don't access l_resource when not locked.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (26 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH 27/29] lustre: ldlm: discard l_lock from struct ldlm_lock James Simmons
@ 2019-05-20 12:51 ` James Simmons
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 29/29] lustre: ldlm: drop SLAB_TYPESAFE_BY_RCU from ldlm_lock slab James Simmons
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

lock->l_resource can (sometimes) change when the resource
isn't locked.
So dereferencing lock->l_resource and the locking the
resource looks wrong.
As lock_res_and_lock() returns the locked resource, this
code can easily be more obviously correct by using
that return value.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/ldlm/ldlm_lock.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_lock.c b/fs/lustre/ldlm/ldlm_lock.c
index e62dad1..830d089 100644
--- a/fs/lustre/ldlm/ldlm_lock.c
+++ b/fs/lustre/ldlm/ldlm_lock.c
@@ -425,13 +425,13 @@ static struct ldlm_lock *ldlm_lock_new(struct ldlm_resource *resource)
 int ldlm_lock_change_resource(struct ldlm_namespace *ns, struct ldlm_lock *lock,
 			      const struct ldlm_res_id *new_resid)
 {
-	struct ldlm_resource *oldres = lock->l_resource;
+	struct ldlm_resource *oldres;
 	struct ldlm_resource *newres;
 	int type;
 
-	lock_res_and_lock(lock);
-	if (memcmp(new_resid, &lock->l_resource->lr_name,
-		   sizeof(lock->l_resource->lr_name)) == 0) {
+	oldres = lock_res_and_lock(lock);
+	if (memcmp(new_resid, &oldres->lr_name,
+		   sizeof(oldres->lr_name)) == 0) {
 		/* Nothing to do */
 		unlock_res_and_lock(lock);
 		return 0;
@@ -1573,9 +1573,9 @@ enum ldlm_error ldlm_lock_enqueue(struct ldlm_namespace *ns,
 				  void *cookie, u64 *flags)
 {
 	struct ldlm_lock *lock = *lockp;
-	struct ldlm_resource *res = lock->l_resource;
+	struct ldlm_resource *res;
 
-	lock_res_and_lock(lock);
+	res = lock_res_and_lock(lock);
 	if (lock->l_req_mode == lock->l_granted_mode) {
 		/* The server returned a blocked lock, but it was granted
 		 * before we got a chance to actually enqueue it.  We don't
@@ -1874,9 +1874,8 @@ void ldlm_lock_cancel(struct ldlm_lock *lock)
 	struct ldlm_resource *res;
 	struct ldlm_namespace *ns;
 
-	lock_res_and_lock(lock);
+	res = lock_res_and_lock(lock);
 
-	res = lock->l_resource;
 	ns  = ldlm_res_to_ns(res);
 
 	/* Please do not, no matter how tempting, remove this LBUG without
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 29/29] lustre: ldlm: drop SLAB_TYPESAFE_BY_RCU from ldlm_lock slab.
  2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
                   ` (27 preceding siblings ...)
  2019-05-20 12:51 ` [lustre-devel] [PATCH v2 28/29] lustre: ldlm: don't access l_resource when not locked James Simmons
@ 2019-05-20 12:51 ` James Simmons
  28 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-20 12:51 UTC (permalink / raw)
  To: lustre-devel

From: NeilBrown <neilb@suse.com>

ldlm_locks are always freed with kfree_rcu(), so there is
no need for the underlying pages to also be freed after
a grace period.  So remove this flag, it is not useful.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/ldlm/ldlm_lockd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c
index ea87fd6..4a15065 100644
--- a/fs/lustre/ldlm/ldlm_lockd.c
+++ b/fs/lustre/ldlm/ldlm_lockd.c
@@ -1129,8 +1129,7 @@ int ldlm_init(void)
 
 	ldlm_lock_slab = kmem_cache_create("ldlm_locks",
 					   sizeof(struct ldlm_lock), 0,
-					   SLAB_HWCACHE_ALIGN |
-					   SLAB_TYPESAFE_BY_RCU, NULL);
+					   SLAB_HWCACHE_ALIGN, NULL);
 	if (!ldlm_lock_slab)
 		goto out_resource;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes James Simmons
@ 2019-05-22  3:54   ` NeilBrown
  2019-05-22 12:48     ` Patrick Farrell
  2019-05-22 19:06     ` James Simmons
  0 siblings, 2 replies; 44+ messages in thread
From: NeilBrown @ 2019-05-22  3:54 UTC (permalink / raw)
  To: lustre-devel

On Mon, May 20 2019, James Simmons wrote:

> From: Patrick Farrell <pfarrell@whamcloud.com>
>
> Various error conditions in the fault path can cause us to
> not return a page in vm_fault.  Check if it's present
> before accessing it.

I cannot find any error conditions that would leave ->page NULL,
but that wouldn't set one of
  VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED
in 'result'.

Can someone provide an example?

(I have seen crashes with vmf->page being NULL, but they were caused
 by VM_FAULT_RETRY being #defined to 0 as lustre/llite/llite_internal.h
 still does on OpenSFS lustre)

>
> Additionally, it's not valid to return VM_FAULT_NOPAGE for
> page faults.  The correct return when accessing a page that
> does not exist is VM_FAULT_SIGBUS.  Correcting this avoids
> looping infinitely in the testcase.

I agree with that.  VM_FAULT_NOPAGE is valid for page_mkwrite - and
ll_page_mkwrite() has separate code to choose VM_FAULT_NOPAGE.
So the change to to_fault_error() is valid.

NeilBrown


>
> Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-11403
> Reviewed-on: https://review.whamcloud.com/34247
> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
> Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
> Reviewed-by: Oleg Drokin <green@whamcloud.com>
> Signed-off-by: James Simmons <jsimmons@infradead.org>
> ---
>  fs/lustre/llite/llite_mmap.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
> index 1865db1..c8e57ad 100644
> --- a/fs/lustre/llite/llite_mmap.c
> +++ b/fs/lustre/llite/llite_mmap.c
> @@ -238,9 +238,6 @@ static inline vm_fault_t to_fault_error(int result)
>  	case 0:
>  		result = VM_FAULT_LOCKED;
>  		break;
> -	case -EFAULT:
> -		result = VM_FAULT_NOPAGE;
> -		break;
>  	case -ENOMEM:
>  		result = VM_FAULT_OOM;
>  		break;
> @@ -366,7 +363,8 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
>  
>  restart:
>  	result = __ll_fault(vmf->vma, vmf);
> -	if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
> +	if (vmf->page &&
> +	    !(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
>  		struct page *vmpage = vmf->page;
>  
>  		/* check if this page has been truncated */
> -- 
> 1.8.3.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20190522/29599858/attachment.sig>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 05/29] lustre: llite: don't use class_setup_tunables()
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 05/29] lustre: llite: don't use class_setup_tunables() James Simmons
@ 2019-05-22  4:22   ` NeilBrown
  2019-05-22 18:58     ` James Simmons
  0 siblings, 1 reply; 44+ messages in thread
From: NeilBrown @ 2019-05-22  4:22 UTC (permalink / raw)
  To: lustre-devel

On Mon, May 20 2019, James Simmons wrote:

> llite is very different from the other types of lustre devices.
> Since this is the case llite should register independently. Doing
> this allows us to cleanup the debugfs registering in the release
> function of struct kobj_type.
>
> Signed-off-by: James Simmons <uja.ornl@yahoo.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-8066
> Reviewed-on: https://review.whamcloud.com/34292
> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> Reviewed-by: Ben Evans <bevans@cray.com>
> Reviewed-by: Oleg Drokin <green@whamcloud.com>
> Signed-off-by: James Simmons <jsimmons@infradead.org>
> ---
>  fs/lustre/llite/lproc_llite.c | 41 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 30 insertions(+), 11 deletions(-)
>
> diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
> index 5de8462..91b0a50 100644
> --- a/fs/lustre/llite/lproc_llite.c
> +++ b/fs/lustre/llite/lproc_llite.c
> @@ -42,18 +42,40 @@
>  static struct kobject *llite_kobj;
>  static struct dentry *llite_root;
>  
> +static void llite_kobj_release(struct kobject *kobj)
> +{
> +	if (!IS_ERR_OR_NULL(llite_root)) {
> +		debugfs_remove(llite_root);
> +		llite_root = NULL;
> +	}
> +
> +	kfree(kobj);
> +}
> +
> +static struct kobj_type llite_kobj_ktype = {
> +	.release	= llite_kobj_release,
> +	.sysfs_ops	= &lustre_sysfs_ops,
> +};
> +
>  int llite_tunables_register(void)
>  {
> -	int rc = 0;
> +	int rc;
> +
> +	llite_kobj = kzalloc(sizeof(*llite_kobj), GFP_KERNEL);
> +	if (!llite_kobj)
> +		return -ENOMEM;
>  
> -	llite_kobj = class_setup_tunables("llite");
> -	if (IS_ERR(llite_kobj))
> -		return PTR_ERR(llite_kobj);
> +	llite_kobj->kset = lustre_kset;
> +	rc = kobject_init_and_add(llite_kobj, &llite_kobj_ktype,
> +				  &lustre_kset->kobj, "%s", "llite");
> +	if (rc)
> +		goto free_kobj;
>  
>  	llite_root = debugfs_create_dir("llite", debugfs_lustre_root);
>  	if (IS_ERR_OR_NULL(llite_root)) {
>  		rc = llite_root ? PTR_ERR(llite_root) : -ENOMEM;
>  		llite_root = NULL;
> +free_kobj:
>  		kobject_put(llite_kobj);
>  		llite_kobj = NULL;

eeek.  Goto into the inside of an if/then clause is .... not my
favourite.

> +	rc = kobject_init_and_add(llite_kobj, &llite_kobj_ktype,
> +				  &lustre_kset->kobj, "%s", "llite");
> +	if (rc)
> +		goto free_kobj;
>  
>  	llite_root = debugfs_create_dir("llite", debugfs_lustre_root);
>  	if (IS_ERR_OR_NULL(llite_root)) {
>  		rc = llite_root ? PTR_ERR(llite_root) : -ENOMEM;
>  		llite_root = NULL;
                goto free_kobj;
        }
        return 0;
> +free_kobj:
> 	kobject_put(llite_kobj);
> 	llite_kobj = NULL;
        return rc;

Isn't that much cleaner?

Otherwise, I like the patch.

Thanks,
NeilBrown


>  	}
> @@ -65,9 +87,6 @@ void llite_tunables_unregister(void)
>  {
>  	kobject_put(llite_kobj);
>  	llite_kobj = NULL;
> -
> -	debugfs_remove(llite_root);
> -	llite_root = NULL;
>  }
>  
>  /* debugfs llite mount point registration */
> @@ -1216,17 +1235,17 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file,
>  	NULL,
>  };
>  
> -static void llite_kobj_release(struct kobject *kobj)
> +static void sbi_kobj_release(struct kobject *kobj)
>  {
>  	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
>  					      ll_kset.kobj);
>  	complete(&sbi->ll_kobj_unregister);
>  }
>  
> -static struct kobj_type llite_ktype = {
> +static struct kobj_type sbi_ktype = {
>  	.default_attrs	= llite_attrs,
>  	.sysfs_ops	= &lustre_sysfs_ops,
> -	.release	= llite_kobj_release,
> +	.release	= sbi_kobj_release,
>  };
>  
>  static const struct llite_file_opcode {
> @@ -1384,7 +1403,7 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name)
>  out_ll_kset:
>  	/* Yes we also register sysfs mount kset here as well */
>  	sbi->ll_kset.kobj.parent = llite_kobj;
> -	sbi->ll_kset.kobj.ktype = &llite_ktype;
> +	sbi->ll_kset.kobj.ktype = &sbi_ktype;
>  	init_completion(&sbi->ll_kobj_unregister);
>  	err = kobject_set_name(&sbi->ll_kset.kobj, "%s", name);
>  	if (err)
> -- 
> 1.8.3.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20190522/757f908a/attachment.sig>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 06/29] lustre: embed typ_kobj in obd_type
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 06/29] lustre: embed typ_kobj in obd_type James Simmons
@ 2019-05-22  5:20   ` NeilBrown
  0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2019-05-22  5:20 UTC (permalink / raw)
  To: lustre-devel

On Mon, May 20 2019, James Simmons wrote:

> From: NeilBrown <neilb@suse.com>
>
> As there is a 1-1 mapping between obd_types and their ->typ_kobj, it
> is simple and more normal to embed the kobj in the obd_type, rather
> than allocate it separately.
>
> This requires calling "kobject_init_and_add()" earlier, so we
> open-code relevant part of class_setup_tunables() in
> class_register_type(). Now class_setup_tunables() is needed only
> for server side code.
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  fs/lustre/include/obd.h             |  2 +-
>  fs/lustre/include/obd_class.h       |  1 -
>  fs/lustre/obdclass/genops.c         | 56 +++++++++++--------------------------
>  fs/lustre/obdclass/lprocfs_status.c |  2 +-
>  4 files changed, 19 insertions(+), 42 deletions(-)
>
> diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
> index f626695..f20c246 100644
> --- a/fs/lustre/include/obd.h
> +++ b/fs/lustre/include/obd.h
> @@ -107,7 +107,7 @@ struct obd_type {
>  	int			 typ_refcnt;
>  	struct lu_device_type	*typ_lu;
>  	spinlock_t		 obd_type_lock;
> -	struct kobject		*typ_kobj;
> +	struct kobject		 typ_kobj;
>  };
>  
>  struct brw_page {
> diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
> index e4cde19..1729865 100644
> --- a/fs/lustre/include/obd_class.h
> +++ b/fs/lustre/include/obd_class.h
> @@ -60,7 +60,6 @@
>  
>  /* genops.c */
>  struct obd_export *class_conn2export(struct lustre_handle *conn);
> -struct kobject *class_setup_tunables(const char *name);
>  int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
>  			const char *name, struct lu_device_type *ldt);
>  int class_unregister_type(const char *name);
> diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
> index bc1f979..ccd8a42 100644
> --- a/fs/lustre/obdclass/genops.c
> +++ b/fs/lustre/obdclass/genops.c
> @@ -136,7 +136,9 @@ void class_put_type(struct obd_type *type)
>  
>  static void class_sysfs_release(struct kobject *kobj)
>  {
> -	kfree(kobj);
> +	struct obd_type *type = container_of(kobj, struct obd_type, typ_kobj);
> +
> +	kfree(type);
>  }
>  
>  static struct kobj_type class_ktype = {
> @@ -144,26 +146,6 @@ static void class_sysfs_release(struct kobject *kobj)
>  	.release	= class_sysfs_release,
>  };
>  
> -struct kobject *class_setup_tunables(const char *name)
> -{
> -	struct kobject *kobj;
> -	int rc;
> -
> -	kobj = kzalloc(sizeof(*kobj), GFP_KERNEL);
> -	if (!kobj)
> -		return ERR_PTR(-ENOMEM);
> -
> -	kobj->kset = lustre_kset;
> -	kobject_init(kobj, &class_ktype);
> -	rc = kobject_add(kobj, &lustre_kset->kobj, "%s", name);
> -	if (rc) {
> -		kobject_put(kobj);
> -		return ERR_PTR(rc);
> -	}
> -	return kobj;
> -}
> -EXPORT_SYMBOL(class_setup_tunables);
> -
>  #define CLASS_MAX_NAME 1024
>  
>  int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
> @@ -181,10 +163,17 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
>  		return -EEXIST;
>  	}
>  
> -	rc = -ENOMEM;
>  	type = kzalloc(sizeof(*type), GFP_NOFS);
>  	if (!type)
> -		return rc;
> +		return -ENOMEM;
> +
> +	type->typ_kobj.kset = lustre_kset;
> +	rc = kobject_init_and_add(&type->typ_kobj, &class_ktype,
> +				  &lustre_kset->kobj, "%s", name);

I don't know that I would actually cause a problem, but I don't like
"add"ing and object (above) before fully initializing it (below).  So
I've kept the split from my version where kobject_init() happens early
and kobject_add() happens later.
I've included the other changes that you made.

Thanks,
NeilBrown


> +	if (rc)
> +		goto failed;
> +
> +	type->typ_debugfs_entry = debugfs_create_dir(name, debugfs_lustre_root);
>  
>  	type->typ_dt_ops = kzalloc(sizeof(*type->typ_dt_ops), GFP_NOFS);
>  	type->typ_md_ops = kzalloc(sizeof(*type->typ_md_ops), GFP_NOFS);
> @@ -202,22 +191,11 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
>  	strcpy(type->typ_name, name);
>  	spin_lock_init(&type->obd_type_lock);
>  
> -	type->typ_debugfs_entry = debugfs_create_dir(type->typ_name,
> -						     debugfs_lustre_root);
> -
> -	type->typ_kobj = class_setup_tunables(type->typ_name);
> -	if (IS_ERR(type->typ_kobj)) {
> -		rc = PTR_ERR(type->typ_kobj);
> -		goto failed;
> -	}
> -
>  	if (ldt) {
>  		type->typ_lu = ldt;
>  		rc = lu_device_type_init(ldt);
> -		if (rc != 0) {
> -			kobject_put(type->typ_kobj);
> +		if (rc != 0)
>  			goto failed;
> -		}
>  	}
>  
>  	spin_lock(&obd_types_lock);
> @@ -230,7 +208,8 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
>  	kfree(type->typ_name);
>  	kfree(type->typ_md_ops);
>  	kfree(type->typ_dt_ops);
> -	kfree(type);
> +	kobject_put(&type->typ_kobj);
> +
>  	return rc;
>  }
>  EXPORT_SYMBOL(class_register_type);
> @@ -253,8 +232,6 @@ int class_unregister_type(const char *name)
>  		return -EBUSY;
>  	}
>  
> -	kobject_put(type->typ_kobj);
> -
>  	debugfs_remove_recursive(type->typ_debugfs_entry);
>  
>  	if (type->typ_lu)
> @@ -266,7 +243,8 @@ int class_unregister_type(const char *name)
>  	kfree(type->typ_name);
>  	kfree(type->typ_dt_ops);
>  	kfree(type->typ_md_ops);
> -	kfree(type);
> +	kobject_put(&type->typ_kobj);
> +
>  	return 0;
>  } /* class_unregister_type */
>  EXPORT_SYMBOL(class_unregister_type);
> diff --git a/fs/lustre/obdclass/lprocfs_status.c b/fs/lustre/obdclass/lprocfs_status.c
> index 11fddc8..71bf409 100644
> --- a/fs/lustre/obdclass/lprocfs_status.c
> +++ b/fs/lustre/obdclass/lprocfs_status.c
> @@ -1036,7 +1036,7 @@ int lprocfs_obd_setup(struct obd_device *obd, bool uuid_only)
>  	obd->obd_ktype.sysfs_ops = &lustre_sysfs_ops;
>  	obd->obd_ktype.release = obd_sysfs_release;
>  
> -	obd->obd_kset.kobj.parent = obd->obd_type->typ_kobj;
> +	obd->obd_kset.kobj.parent = &obd->obd_type->typ_kobj;
>  	obd->obd_kset.kobj.ktype = &obd->obd_ktype;
>  	init_completion(&obd->obd_kobj_unregister);
>  	rc = kset_register(&obd->obd_kset);
> -- 
> 1.8.3.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20190522/9687968e/attachment.sig>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type.
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type James Simmons
@ 2019-05-22  6:49   ` NeilBrown
  2019-05-22 18:51   ` James Simmons
  1 sibling, 0 replies; 44+ messages in thread
From: NeilBrown @ 2019-05-22  6:49 UTC (permalink / raw)
  To: lustre-devel

On Mon, May 20 2019, James Simmons wrote:

> From: NeilBrown <neilb@suse.com>
>
> Now that obj_type is managed as a kobject, move all
> the freeing and deregistering into class_sysfs_release().
> Only leave type->typ_lu handling since unloading obdecho
> can trigger an assert.
>
> lu_context_key_degister()) ASSERTION( atomic_read(&key->lct_used) >= 1 ) failed:
> lu_context_key_degister()) LBUG
> kernel: Pid: 10642, comm: rmmod
> Call Trace:
> [<ffffffffc096e7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
> [<ffffffffc096e87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
> [<ffffffffc0f9761c>] lu_context_key_degister+0x14c/0x160 [obdclass]
> [<ffffffffc0f977d2>] lu_context_key_degister_many+0x72/0xb0 [obdclass]
> [<ffffffffc13e7130>] echo_type_fini+0x20/0x30 [obdecho]
> [<ffffffffc0f9618b>] lu_device_type_fini+0x1b/0x20 [obdclass]
> [<ffffffffc0f67fce>] class_sysfs_release+0x3e/0x6b0 [obdclass]
> [<ffffffffb85782a1>] kobject_release+0x81/0x1b0
> [<ffffffffb8578138>] kobject_put+0x28/0x60
> [<ffffffffc0f6940c>] class_unregister_type+0x23c/0x550 [obdclass]
> [<ffffffffc13f9636>] obdecho_exit+0x10/0x9da [obdecho]

I cannot reproduce this, and the change you suggest to fix is seems only
tangentially related to the symptom.

So I'm going to keep the lu_device_type_fini call in
class_sysfs_release().

If it happens again, I'd love to hear of it - even more so if you can
reproduce.

The most likely cause of the assertion failure is that  echo_type_fini
gets called twice.

Prior to
  Commit ef84c0736421 ("staging: lustre: use wait_event_var() in lu_context_key_degister()")

this would not have been fatal.  Now it is.  Maybe that is the cause of
this failed assertion.

Thanks,
NeilBrown


>
> Reviewed-by: James Simmons <jsimmons@infradead.org>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  fs/lustre/obdclass/genops.c | 21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/fs/lustre/obdclass/genops.c b/fs/lustre/obdclass/genops.c
> index ccd8a42..2a5ec93 100644
> --- a/fs/lustre/obdclass/genops.c
> +++ b/fs/lustre/obdclass/genops.c
> @@ -138,6 +138,15 @@ static void class_sysfs_release(struct kobject *kobj)
>  {
>  	struct obd_type *type = container_of(kobj, struct obd_type, typ_kobj);
>  
> +	debugfs_remove_recursive(type->typ_debugfs_entry);
> +
> +	spin_lock(&obd_types_lock);
> +	list_del(&type->typ_chain);
> +	spin_unlock(&obd_types_lock);
> +
> +	kfree(type->typ_name);
> +	kfree(type->typ_md_ops);
> +	kfree(type->typ_dt_ops);
>  	kfree(type);
>  }
>  
> @@ -167,6 +176,7 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
>  	if (!type)
>  		return -ENOMEM;
>  
> +	INIT_LIST_HEAD(&type->typ_chain);
>  	type->typ_kobj.kset = lustre_kset;
>  	rc = kobject_init_and_add(&type->typ_kobj, &class_ktype,
>  				  &lustre_kset->kobj, "%s", name);
> @@ -205,9 +215,6 @@ int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
>  	return 0;
>  
>  failed:
> -	kfree(type->typ_name);
> -	kfree(type->typ_md_ops);
> -	kfree(type->typ_dt_ops);
>  	kobject_put(&type->typ_kobj);
>  
>  	return rc;
> @@ -232,17 +239,9 @@ int class_unregister_type(const char *name)
>  		return -EBUSY;
>  	}
>  
> -	debugfs_remove_recursive(type->typ_debugfs_entry);
> -
>  	if (type->typ_lu)
>  		lu_device_type_fini(type->typ_lu);
>  
> -	spin_lock(&obd_types_lock);
> -	list_del(&type->typ_chain);
> -	spin_unlock(&obd_types_lock);
> -	kfree(type->typ_name);
> -	kfree(type->typ_dt_ops);
> -	kfree(type->typ_md_ops);
>  	kobject_put(&type->typ_kobj);
>  
>  	return 0;
> -- 
> 1.8.3.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20190522/9ca714c5/attachment.sig>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 10/29] lustre: obd_type: discard obd_type_lock
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 10/29] lustre: obd_type: discard obd_type_lock James Simmons
@ 2019-05-22  6:53   ` NeilBrown
  2019-05-22 19:24     ` James Simmons
  0 siblings, 1 reply; 44+ messages in thread
From: NeilBrown @ 2019-05-22  6:53 UTC (permalink / raw)
  To: lustre-devel

On Mon, May 20 2019, James Simmons wrote:

> From: NeilBrown <neilb@suse.com>
>
> This lock is only used to protect typ_refcnt, so change
> that to an atomic_t and discard the lock.
>
> The lock also covers calls to try_module_get and module_put,
> but this serves no purpose as it does not prevent the module
> from being unloaded.
>
> Finally, the return value for the call to try_module_get is
> ignored, which is not safe.
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  fs/lustre/include/obd.h         |  3 +--
>  fs/lustre/include/obd_class.h   |  1 -
>  fs/lustre/mdc/mdc_request.c     |  2 +-
>  fs/lustre/mgc/mgc_request.c     |  7 -------
>  fs/lustre/obdclass/genops.c     | 29 ++++++++++++++---------------
>  fs/lustre/obdclass/lu_object.c  |  2 +-
>  fs/lustre/obdclass/obd_config.c | 19 -------------------
>  7 files changed, 17 insertions(+), 46 deletions(-)
>
> diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
> index 4c58b91..61fb815 100644
> --- a/fs/lustre/include/obd.h
> +++ b/fs/lustre/include/obd.h
> @@ -102,9 +102,8 @@ struct obd_type {
>  	struct obd_ops		*typ_dt_ops;
>  	struct md_ops		*typ_md_ops;
>  	struct dentry		*typ_debugfs_entry;
> -	int			 typ_refcnt;
> +	atomic_t		 typ_refcnt;
>  	struct lu_device_type	*typ_lu;
> -	spinlock_t		 obd_type_lock;
>  	struct kobject		 typ_kobj;
>  };
>  #define typ_name typ_kobj.name
> diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
> index 742cb9a4..a853ed5 100644
> --- a/fs/lustre/include/obd_class.h
> +++ b/fs/lustre/include/obd_class.h
> @@ -210,7 +210,6 @@ struct lustre_profile {
>  struct lustre_profile *class_get_profile(const char *prof);
>  void class_del_profile(const char *prof);
>  void class_put_profile(struct lustre_profile *lprof);
> -void class_del_profiles(void);
>  
>  #if LUSTRE_TRACKS_LOCK_EXP_REFS
>  
> diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
> index bc764f9..705a4e3 100644
> --- a/fs/lustre/mdc/mdc_request.c
> +++ b/fs/lustre/mdc/mdc_request.c
> @@ -2542,7 +2542,7 @@ static int mdc_init_ea_size(struct obd_export *exp, u32 easize, u32 def_easize)
>  static int mdc_precleanup(struct obd_device *obd)
>  {
>  	/* Failsafe, ok if racy */
> -	if (obd->obd_type->typ_refcnt <= 1)
> +	if (atomic_read(&obd->obd_type->typ_refcnt) <= 1)
>  		libcfs_kkuc_group_rem(0, KUC_GRP_HSM);
>  
>  	mdc_changelog_cdev_finish(obd);
> diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c
> index 84ba6d0..d8be54d 100644
> --- a/fs/lustre/mgc/mgc_request.c
> +++ b/fs/lustre/mgc/mgc_request.c
> @@ -712,13 +712,6 @@ static int mgc_precleanup(struct obd_device *obd)
>  
>  static int mgc_cleanup(struct obd_device *obd)
>  {
> -	/* COMPAT_146 - old config logs may have added profiles we don't
> -	 * know about
> -	 */
> -	if (obd->obd_type->typ_refcnt <= 1)
> -		/* Only for the last mgc */
> -		class_del_profiles();
> -

You didn't add any text to the change log describing this change!!!!

Could you please post it as a separate patch - it is only tangentially
related to the changes in this patch.

Thanks,
NeilBrown
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20190522/2283efe8/attachment-0001.sig>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes
  2019-05-22  3:54   ` NeilBrown
@ 2019-05-22 12:48     ` Patrick Farrell
  2019-05-22 23:26       ` NeilBrown
  2019-05-22 19:06     ` James Simmons
  1 sibling, 1 reply; 44+ messages in thread
From: Patrick Farrell @ 2019-05-22 12:48 UTC (permalink / raw)
  To: lustre-devel

Neil,

If you can?t find any, I imagine they don?t exist, at least in your branch, given that difference you cited.

The particular case we had is here:
https://jira.whamcloud.com/browse/LU-11403

Which is when the file exists but has no striping info, and hence no data.

- Patrick
________________________________
From: NeilBrown <neilb@suse.com>
Sent: Tuesday, May 21, 2019 10:54:37 PM
To: James Simmons; Andreas Dilger; Oleg Drokin
Cc: Lustre Development List; Patrick Farrell; James Simmons
Subject: Re: [PATCH v2 01/29] lustre: llite: ll_fault fixes

On Mon, May 20 2019, James Simmons wrote:

> From: Patrick Farrell <pfarrell@whamcloud.com>
>
> Various error conditions in the fault path can cause us to
> not return a page in vm_fault.  Check if it's present
> before accessing it.

I cannot find any error conditions that would leave ->page NULL,
but that wouldn't set one of
  VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED
in 'result'.

Can someone provide an example?

(I have seen crashes with vmf->page being NULL, but they were caused
 by VM_FAULT_RETRY being #defined to 0 as lustre/llite/llite_internal.h
 still does on OpenSFS lustre)

>
> Additionally, it's not valid to return VM_FAULT_NOPAGE for
> page faults.  The correct return when accessing a page that
> does not exist is VM_FAULT_SIGBUS.  Correcting this avoids
> looping infinitely in the testcase.

I agree with that.  VM_FAULT_NOPAGE is valid for page_mkwrite - and
ll_page_mkwrite() has separate code to choose VM_FAULT_NOPAGE.
So the change to to_fault_error() is valid.

NeilBrown


>
> Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
> WC-bug-id: https://jira.whamcloud.com/browse/LU-11403
> Reviewed-on: https://review.whamcloud.com/34247
> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
> Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
> Reviewed-by: Oleg Drokin <green@whamcloud.com>
> Signed-off-by: James Simmons <jsimmons@infradead.org>
> ---
>  fs/lustre/llite/llite_mmap.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
> index 1865db1..c8e57ad 100644
> --- a/fs/lustre/llite/llite_mmap.c
> +++ b/fs/lustre/llite/llite_mmap.c
> @@ -238,9 +238,6 @@ static inline vm_fault_t to_fault_error(int result)
>        case 0:
>                result = VM_FAULT_LOCKED;
>                break;
> -     case -EFAULT:
> -             result = VM_FAULT_NOPAGE;
> -             break;
>        case -ENOMEM:
>                result = VM_FAULT_OOM;
>                break;
> @@ -366,7 +363,8 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
>
>  restart:
>        result = __ll_fault(vmf->vma, vmf);
> -     if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
> +     if (vmf->page &&
> +         !(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
>                struct page *vmpage = vmf->page;
>
>                /* check if this page has been truncated */
> --
> 1.8.3.1
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20190522/58f60308/attachment.html>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type.
  2019-05-20 12:50 ` [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type James Simmons
  2019-05-22  6:49   ` NeilBrown
@ 2019-05-22 18:51   ` James Simmons
  2019-05-22 22:07     ` Andreas Dilger
  1 sibling, 1 reply; 44+ messages in thread
From: James Simmons @ 2019-05-22 18:51 UTC (permalink / raw)
  To: lustre-devel


>>  int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
>> @@ -181,10 +163,17 @@ int class_register_type(struct obd_ops *dt_ops, 
>struct md_ops *md_ops,
>>  		return -EEXIST;
>>  	}
>>  
>> -	rc = -ENOMEM;
>>  	type = kzalloc(sizeof(*type), GFP_NOFS);
>>  	if (!type)
>> -		return rc;
>> +		return -ENOMEM;
>> +
>> +	type->typ_kobj.kset = lustre_kset;
>> +	rc = kobject_init_and_add(&type->typ_kobj, &class_ktype,
>> +				  &lustre_kset->kobj, "%s", name);
>
>I don't know that I would actually cause a problem, but I don't like
>"add"ing and object (above) before fully initializing it (below).  So
>I've kept the split from my version where kobject_init() happens early
>and kobject_add() happens later.
>I've included the other changes that you made.
>
>Thanks,
>NeilBrown

The reason I did it that way was to handle the server case down the road.
So for the case when both client and server are on the same node, and yes
people do such setups for testing this is important.

Consider the case we have both the lov and lod layer on a single node. 
Both layers attempt to create "lov" obd_type. When the lov module loads 
first then a complete obd_type is created. Once lod loads then it just 
uses real lov obd_type and creates the needed symlinks. You end up with

ls -al /sys/fs/lustre/lov/
total 0
drwxr-xr-x  2 root root 0 May 22 14:27 .
drwxr-xr-x 14 root root 0 May 22 14:27 ..
lrwxrwxrwx  1 root root 0 May 22 14:27 lustre-MDT0000-mdtlov -> 
../lod/lustre-MDT0000-mdtlov
lrwxrwxrwx  1 root root 0 May 22 14:27 lustre-MDT0002-mdtlov -> 
../lod/lustre-MDT0002-mdtlov

Plus the real lov obd devices.

Now if lod loads first then the lod module creates a "lov" obd_type
using class_create_symlink() but is not fully initialized nor does
it need to be. If the client lov module is present and it loads then
it takes the "lov" obd_type create by the lod and finishes initializing
it.

Looking at the final code and added in server case:

        type = class_search_type(name);
        if (type) {
                kobject_put(&type->typ_kobj);
		if (strcmp(name, LUSTRE_LOV_NAME) == 0 ||
                    strcmp(name, LUSTRE_OSC_NAME) == 0)
                        goto dir_exist;
		CDEBUG(D_IOCTL, "Type %s already registered\n", name);
                return -EEXIST;
        }

        type = kzalloc(sizeof(*type), GFP_NOFS);
        if (!type)
                return rc;

        type->typ_kobj.kset = lustre_kset;
        kobject_init(&type->typ_kobj, &class_ktype);

        type->typ_dt_ops = dt_ops;  /* lov obd_type is never set to the 
				     * correct values if lod created it.
				     */
        type->typ_md_ops = md_ops;

        rc = kobject_add(&type->typ_kobj, &lustre_kset->kobj, "%s", name);
        if (rc)
                goto failed;

        type->typ_debugfs_entry = debugfs_create_dir(type->typ_name,
                                                     debugfs_lustre_root);
dir_exit:

Now if this needed for the later module handling patches I guess we could
do:

	if (strcmp(name, LUSTRE_LOV_NAME) == 0 ||
            strcmp(name, LUSTRE_OSC_NAME) == 0) {
		type->typ_dt_ops = dt_ops;
		type->typ_md_ops = md_ops;
		goto dir_exist;
	}

Is that the case?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 05/29] lustre: llite: don't use class_setup_tunables()
  2019-05-22  4:22   ` NeilBrown
@ 2019-05-22 18:58     ` James Simmons
  0 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-22 18:58 UTC (permalink / raw)
  To: lustre-devel


> On Mon, May 20 2019, James Simmons wrote:
> 
> > llite is very different from the other types of lustre devices.
> > Since this is the case llite should register independently. Doing
> > this allows us to cleanup the debugfs registering in the release
> > function of struct kobj_type.
> >
> > Signed-off-by: James Simmons <uja.ornl@yahoo.com>
> > WC-bug-id: https://jira.whamcloud.com/browse/LU-8066
> > Reviewed-on: https://review.whamcloud.com/34292
> > Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> > Reviewed-by: Ben Evans <bevans@cray.com>
> > Reviewed-by: Oleg Drokin <green@whamcloud.com>
> > Signed-off-by: James Simmons <jsimmons@infradead.org>
> > ---
> >  fs/lustre/llite/lproc_llite.c | 41 ++++++++++++++++++++++++++++++-----------
> >  1 file changed, 30 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c
> > index 5de8462..91b0a50 100644
> > --- a/fs/lustre/llite/lproc_llite.c
> > +++ b/fs/lustre/llite/lproc_llite.c
> > @@ -42,18 +42,40 @@
> >  static struct kobject *llite_kobj;
> >  static struct dentry *llite_root;
> >  
> > +static void llite_kobj_release(struct kobject *kobj)
> > +{
> > +	if (!IS_ERR_OR_NULL(llite_root)) {
> > +		debugfs_remove(llite_root);
> > +		llite_root = NULL;
> > +	}
> > +
> > +	kfree(kobj);
> > +}
> > +
> > +static struct kobj_type llite_kobj_ktype = {
> > +	.release	= llite_kobj_release,
> > +	.sysfs_ops	= &lustre_sysfs_ops,
> > +};
> > +
> >  int llite_tunables_register(void)
> >  {
> > -	int rc = 0;
> > +	int rc;
> > +
> > +	llite_kobj = kzalloc(sizeof(*llite_kobj), GFP_KERNEL);
> > +	if (!llite_kobj)
> > +		return -ENOMEM;
> >  
> > -	llite_kobj = class_setup_tunables("llite");
> > -	if (IS_ERR(llite_kobj))
> > -		return PTR_ERR(llite_kobj);
> > +	llite_kobj->kset = lustre_kset;
> > +	rc = kobject_init_and_add(llite_kobj, &llite_kobj_ktype,
> > +				  &lustre_kset->kobj, "%s", "llite");
> > +	if (rc)
> > +		goto free_kobj;
> >  
> >  	llite_root = debugfs_create_dir("llite", debugfs_lustre_root);
> >  	if (IS_ERR_OR_NULL(llite_root)) {
> >  		rc = llite_root ? PTR_ERR(llite_root) : -ENOMEM;
> >  		llite_root = NULL;
> > +free_kobj:
> >  		kobject_put(llite_kobj);
> >  		llite_kobj = NULL;
> 
> eeek.  Goto into the inside of an if/then clause is .... not my
> favourite.
> 
> > +	rc = kobject_init_and_add(llite_kobj, &llite_kobj_ktype,
> > +				  &lustre_kset->kobj, "%s", "llite");
> > +	if (rc)
> > +		goto free_kobj;
> >  
> >  	llite_root = debugfs_create_dir("llite", debugfs_lustre_root);
> >  	if (IS_ERR_OR_NULL(llite_root)) {
> >  		rc = llite_root ? PTR_ERR(llite_root) : -ENOMEM;
> >  		llite_root = NULL;
>                 goto free_kobj;
>         }
>         return 0;
> > +free_kobj:
> > 	kobject_put(llite_kobj);
> > 	llite_kobj = NULL;
>         return rc;
> 
> Isn't that much cleaner?
> 
> Otherwise, I like the patch.

Sure that is fine. Just did it that way since Greg would grip about
how having more than one return in a function is not proper kernel
coding style. So I tend to write code this that way. 

> Thanks,
> NeilBrown
> 
> 
> >  	}
> > @@ -65,9 +87,6 @@ void llite_tunables_unregister(void)
> >  {
> >  	kobject_put(llite_kobj);
> >  	llite_kobj = NULL;
> > -
> > -	debugfs_remove(llite_root);
> > -	llite_root = NULL;
> >  }
> >  
> >  /* debugfs llite mount point registration */
> > @@ -1216,17 +1235,17 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file,
> >  	NULL,
> >  };
> >  
> > -static void llite_kobj_release(struct kobject *kobj)
> > +static void sbi_kobj_release(struct kobject *kobj)
> >  {
> >  	struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info,
> >  					      ll_kset.kobj);
> >  	complete(&sbi->ll_kobj_unregister);
> >  }
> >  
> > -static struct kobj_type llite_ktype = {
> > +static struct kobj_type sbi_ktype = {
> >  	.default_attrs	= llite_attrs,
> >  	.sysfs_ops	= &lustre_sysfs_ops,
> > -	.release	= llite_kobj_release,
> > +	.release	= sbi_kobj_release,
> >  };
> >  
> >  static const struct llite_file_opcode {
> > @@ -1384,7 +1403,7 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name)
> >  out_ll_kset:
> >  	/* Yes we also register sysfs mount kset here as well */
> >  	sbi->ll_kset.kobj.parent = llite_kobj;
> > -	sbi->ll_kset.kobj.ktype = &llite_ktype;
> > +	sbi->ll_kset.kobj.ktype = &sbi_ktype;
> >  	init_completion(&sbi->ll_kobj_unregister);
> >  	err = kobject_set_name(&sbi->ll_kset.kobj, "%s", name);
> >  	if (err)
> > -- 
> > 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes
  2019-05-22  3:54   ` NeilBrown
  2019-05-22 12:48     ` Patrick Farrell
@ 2019-05-22 19:06     ` James Simmons
  1 sibling, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-22 19:06 UTC (permalink / raw)
  To: lustre-devel


> > From: Patrick Farrell <pfarrell@whamcloud.com>
> >
> > Various error conditions in the fault path can cause us to
> > not return a page in vm_fault.  Check if it's present
> > before accessing it.
> 
> I cannot find any error conditions that would leave ->page NULL,
> but that wouldn't set one of
>   VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED
> in 'result'.
> 
> Can someone provide an example?

Reproducer is here:

https://review.whamcloud.com/#/c/34247/7/lustre/tests/mmap_mknod_test.c

Just run mmap_mknod_test "file" and it will show this problem.

> (I have seen crashes with vmf->page being NULL, but they were caused
>  by VM_FAULT_RETRY being #defined to 0 as lustre/llite/llite_internal.h
>  still does on OpenSFS lustre)
> 
> >
> > Additionally, it's not valid to return VM_FAULT_NOPAGE for
> > page faults.  The correct return when accessing a page that
> > does not exist is VM_FAULT_SIGBUS.  Correcting this avoids
> > looping infinitely in the testcase.
> 
> I agree with that.  VM_FAULT_NOPAGE is valid for page_mkwrite - and
> ll_page_mkwrite() has separate code to choose VM_FAULT_NOPAGE.
> So the change to to_fault_error() is valid.
> 
> NeilBrown
> 
> 
> >
> > Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
> > WC-bug-id: https://jira.whamcloud.com/browse/LU-11403
> > Reviewed-on: https://review.whamcloud.com/34247
> > Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
> > Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
> > Reviewed-by: Oleg Drokin <green@whamcloud.com>
> > Signed-off-by: James Simmons <jsimmons@infradead.org>
> > ---
> >  fs/lustre/llite/llite_mmap.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
> > index 1865db1..c8e57ad 100644
> > --- a/fs/lustre/llite/llite_mmap.c
> > +++ b/fs/lustre/llite/llite_mmap.c
> > @@ -238,9 +238,6 @@ static inline vm_fault_t to_fault_error(int result)
> >  	case 0:
> >  		result = VM_FAULT_LOCKED;
> >  		break;
> > -	case -EFAULT:
> > -		result = VM_FAULT_NOPAGE;
> > -		break;
> >  	case -ENOMEM:
> >  		result = VM_FAULT_OOM;
> >  		break;
> > @@ -366,7 +363,8 @@ static vm_fault_t ll_fault(struct vm_fault *vmf)
> >  
> >  restart:
> >  	result = __ll_fault(vmf->vma, vmf);
> > -	if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
> > +	if (vmf->page &&
> > +	    !(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {
> >  		struct page *vmpage = vmf->page;
> >  
> >  		/* check if this page has been truncated */
> > -- 
> > 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 10/29] lustre: obd_type: discard obd_type_lock
  2019-05-22  6:53   ` NeilBrown
@ 2019-05-22 19:24     ` James Simmons
  0 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-05-22 19:24 UTC (permalink / raw)
  To: lustre-devel


> On Mon, May 20 2019, James Simmons wrote:
> 
> > From: NeilBrown <neilb@suse.com>
> >
> > This lock is only used to protect typ_refcnt, so change
> > that to an atomic_t and discard the lock.
> >
> > The lock also covers calls to try_module_get and module_put,
> > but this serves no purpose as it does not prevent the module
> > from being unloaded.
> >
> > Finally, the return value for the call to try_module_get is
> > ignored, which is not safe.
> >
> > Signed-off-by: NeilBrown <neilb@suse.com>
> > ---
> >  fs/lustre/include/obd.h         |  3 +--
> >  fs/lustre/include/obd_class.h   |  1 -
> >  fs/lustre/mdc/mdc_request.c     |  2 +-
> >  fs/lustre/mgc/mgc_request.c     |  7 -------
> >  fs/lustre/obdclass/genops.c     | 29 ++++++++++++++---------------
> >  fs/lustre/obdclass/lu_object.c  |  2 +-
> >  fs/lustre/obdclass/obd_config.c | 19 -------------------
> >  7 files changed, 17 insertions(+), 46 deletions(-)
> >
> > diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
> > index 4c58b91..61fb815 100644
> > --- a/fs/lustre/include/obd.h
> > +++ b/fs/lustre/include/obd.h
> > @@ -102,9 +102,8 @@ struct obd_type {
> >  	struct obd_ops		*typ_dt_ops;
> >  	struct md_ops		*typ_md_ops;
> >  	struct dentry		*typ_debugfs_entry;
> > -	int			 typ_refcnt;
> > +	atomic_t		 typ_refcnt;
> >  	struct lu_device_type	*typ_lu;
> > -	spinlock_t		 obd_type_lock;
> >  	struct kobject		 typ_kobj;
> >  };
> >  #define typ_name typ_kobj.name
> > diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
> > index 742cb9a4..a853ed5 100644
> > --- a/fs/lustre/include/obd_class.h
> > +++ b/fs/lustre/include/obd_class.h
> > @@ -210,7 +210,6 @@ struct lustre_profile {
> >  struct lustre_profile *class_get_profile(const char *prof);
> >  void class_del_profile(const char *prof);
> >  void class_put_profile(struct lustre_profile *lprof);
> > -void class_del_profiles(void);
> >  
> >  #if LUSTRE_TRACKS_LOCK_EXP_REFS
> >  
> > diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
> > index bc764f9..705a4e3 100644
> > --- a/fs/lustre/mdc/mdc_request.c
> > +++ b/fs/lustre/mdc/mdc_request.c
> > @@ -2542,7 +2542,7 @@ static int mdc_init_ea_size(struct obd_export *exp, u32 easize, u32 def_easize)
> >  static int mdc_precleanup(struct obd_device *obd)
> >  {
> >  	/* Failsafe, ok if racy */
> > -	if (obd->obd_type->typ_refcnt <= 1)
> > +	if (atomic_read(&obd->obd_type->typ_refcnt) <= 1)
> >  		libcfs_kkuc_group_rem(0, KUC_GRP_HSM);
> >  
> >  	mdc_changelog_cdev_finish(obd);
> > diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c
> > index 84ba6d0..d8be54d 100644
> > --- a/fs/lustre/mgc/mgc_request.c
> > +++ b/fs/lustre/mgc/mgc_request.c
> > @@ -712,13 +712,6 @@ static int mgc_precleanup(struct obd_device *obd)
> >  
> >  static int mgc_cleanup(struct obd_device *obd)
> >  {
> > -	/* COMPAT_146 - old config logs may have added profiles we don't
> > -	 * know about
> > -	 */
> > -	if (obd->obd_type->typ_refcnt <= 1)
> > -		/* Only for the last mgc */
> > -		class_del_profiles();
> > -
> 
> You didn't add any text to the change log describing this change!!!!
> 
> Could you please post it as a separate patch - it is only tangentially
> related to the changes in this patch.

Sure. Also another patch exist which pretty much makes typ_refcnt only
used in the obd_class layer.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type.
  2019-05-22 18:51   ` James Simmons
@ 2019-05-22 22:07     ` Andreas Dilger
  2019-06-01  0:38       ` James Simmons
  0 siblings, 1 reply; 44+ messages in thread
From: Andreas Dilger @ 2019-05-22 22:07 UTC (permalink / raw)
  To: lustre-devel

On May 22, 2019, at 12:51, James Simmons <jsimmons@infradead.org> wrote:
> 
> 
>>> int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
>>> @@ -181,10 +163,17 @@ int class_register_type(struct obd_ops *dt_ops, 
>> struct md_ops *md_ops,
>>> 		return -EEXIST;
>>> 	}
>>> 
>>> -	rc = -ENOMEM;
>>> 	type = kzalloc(sizeof(*type), GFP_NOFS);
>>> 	if (!type)
>>> -		return rc;
>>> +		return -ENOMEM;
>>> +
>>> +	type->typ_kobj.kset = lustre_kset;
>>> +	rc = kobject_init_and_add(&type->typ_kobj, &class_ktype,
>>> +				  &lustre_kset->kobj, "%s", name);
>> 
>> I don't know that I would actually cause a problem, but I don't like
>> "add"ing and object (above) before fully initializing it (below).  So
>> I've kept the split from my version where kobject_init() happens early
>> and kobject_add() happens later.
>> I've included the other changes that you made.
>> 
>> Thanks,
>> NeilBrown
> 
> The reason I did it that way was to handle the server case down the road.
> So for the case when both client and server are on the same node, and yes
> people do such setups for testing this is important.
> 
> Consider the case we have both the lov and lod layer on a single node. 
> Both layers attempt to create "lov" obd_type. When the lov module loads 
> first then a complete obd_type is created. Once lod loads then it just 
> uses real lov obd_type and creates the needed symlinks.

We've had the "lov->lod" symlinks on servers since Lustre 2.4 or so (when
osd-zfs was first added).  We could just remove this compatibility, so
long as the test scripts were updated to always use "lod" on the server
and "lov" on the client (previously they shared the same "lov" code module
so the path was the same).  We don't need test script interop going back
further than that, so this should be OK.

Cheers, Andreas

> You end up with
> 
> ls -al /sys/fs/lustre/lov/
> total 0
> drwxr-xr-x  2 root root 0 May 22 14:27 .
> drwxr-xr-x 14 root root 0 May 22 14:27 ..
> lrwxrwxrwx  1 root root 0 May 22 14:27 lustre-MDT0000-mdtlov -> 
> ../lod/lustre-MDT0000-mdtlov
> lrwxrwxrwx  1 root root 0 May 22 14:27 lustre-MDT0002-mdtlov -> 
> ../lod/lustre-MDT0002-mdtlov
> 
> Plus the real lov obd devices.
> 
> Now if lod loads first then the lod module creates a "lov" obd_type
> using class_create_symlink() but is not fully initialized nor does
> it need to be. If the client lov module is present and it loads then
> it takes the "lov" obd_type create by the lod and finishes initializing
> it.
> 
> Looking at the final code and added in server case:
> 
>        type = class_search_type(name);
>        if (type) {
>                kobject_put(&type->typ_kobj);
> 		if (strcmp(name, LUSTRE_LOV_NAME) == 0 ||
>                    strcmp(name, LUSTRE_OSC_NAME) == 0)
>                        goto dir_exist;
> 		CDEBUG(D_IOCTL, "Type %s already registered\n", name);
>                return -EEXIST;
>        }
> 
>        type = kzalloc(sizeof(*type), GFP_NOFS);
>        if (!type)
>                return rc;
> 
>        type->typ_kobj.kset = lustre_kset;
>        kobject_init(&type->typ_kobj, &class_ktype);
> 
>        type->typ_dt_ops = dt_ops;  /* lov obd_type is never set to the 
> 				     * correct values if lod created it.
> 				     */
>        type->typ_md_ops = md_ops;
> 
>        rc = kobject_add(&type->typ_kobj, &lustre_kset->kobj, "%s", name);
>        if (rc)
>                goto failed;
> 
>        type->typ_debugfs_entry = debugfs_create_dir(type->typ_name,
>                                                     debugfs_lustre_root);
> dir_exit:
> 
> Now if this needed for the later module handling patches I guess we could
> do:
> 
> 	if (strcmp(name, LUSTRE_LOV_NAME) == 0 ||
>            strcmp(name, LUSTRE_OSC_NAME) == 0) {
> 		type->typ_dt_ops = dt_ops;
> 		type->typ_md_ops = md_ops;
> 		goto dir_exist;
> 	}
> 
> Is that the case?
> 
> 

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes
  2019-05-22 12:48     ` Patrick Farrell
@ 2019-05-22 23:26       ` NeilBrown
  2019-05-23  0:13         ` Patrick Farrell
  0 siblings, 1 reply; 44+ messages in thread
From: NeilBrown @ 2019-05-22 23:26 UTC (permalink / raw)
  To: lustre-devel

On Wed, May 22 2019, Patrick Farrell wrote:

> Neil,
>
> If you can?t find any, I imagine they don?t exist, at least in your branch, given that difference you cited.
>
> The particular case we had is here:
> https://jira.whamcloud.com/browse/LU-11403
>
> Which is when the file exists but has no striping info, and hence no data.

Hi Patrick,
Thanks for the reply.  Looking at that issue it appears that the root
cause is -EFAULT being returned from lov_io_init_empty, which then got
returned by ll_fault_io_init and then was converted by to_fault_error
into VM_FAULT_NOPAGE.

The test

 	if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {

in ll_fault() doesn't notice VM_FAULT_NOPAGE (it is not part of
VM_FAULT_ERROR), so the 'then' branch is run, which triggers the
problem.

Just changing to_fault_error() to convert  -EFAULT to VM_FAULT_SIGBUS,
as you did, cause the 'if' to do "the right thing", as VM_FAULT_SIGBUS
is one of the flags in VM_FAULT_ERROR.

So the extra test on vmf->page is redundant.  If there has been no
error, and we don't need to retry, then vmf->page *must* exist.


I've changed the commit message to explain this, so what I have now is
below.

Thanks,
NeilBrown

From: Patrick Farrell <pfarrell@whamcloud.com>
Date: Mon, 20 May 2019 08:50:43 -0400
Subject: [PATCH] lustre: llite: ll_fault fix

Error conditions in the fault path such as a fault on a file without
stripes (see lov_io_init_emtpy()) can cause us to
not return a page in vm_fault, and to report -EFAULT.

-EFAULT is currently mapped to VM_FAULT_NOPAGE by to_fault_error(),
and ll_fault doesn't recognize this as an error which might leave
->page unset, and tries to dereference vmf->page.

VM_FAULT_NOPAGE is *not* a valid status for .fault, only for
.page_mkwrite and .pfn_mkwrite.  So change to_fault_error() to
instead map -EFAULT to VM_FAULT_SIGBUS.  This is part of
VM_FAULT_ERROR, so ll_fault will *not* dereference vmf->page when that
error code is set.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-11403
Reviewed-on: https://review.whamcloud.com/34247
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/llite/llite_mmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
index 1865db1237ce..59fb40029306 100644
--- a/fs/lustre/llite/llite_mmap.c
+++ b/fs/lustre/llite/llite_mmap.c
@@ -238,9 +238,6 @@ static inline vm_fault_t to_fault_error(int result)
 	case 0:
 		result = VM_FAULT_LOCKED;
 		break;
-	case -EFAULT:
-		result = VM_FAULT_NOPAGE;
-		break;
 	case -ENOMEM:
 		result = VM_FAULT_OOM;
 		break;
-- 
2.14.0.rc0.dirty

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20190523/22d8361d/attachment.sig>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes
  2019-05-22 23:26       ` NeilBrown
@ 2019-05-23  0:13         ` Patrick Farrell
  0 siblings, 0 replies; 44+ messages in thread
From: Patrick Farrell @ 2019-05-23  0:13 UTC (permalink / raw)
  To: lustre-devel

Neil,

Memory has surfaced here.  There was a not-documented-where-I-can-find-it report of a null pointer on ?page? here that came in around the same time running - I believe - racer (a big messy ?do random things from many threads? test).  No obvious explanation for how we got the problem but I added the check since it was simple enough.

But I?m perfectly comfortable with you removing it and seeing if something occurs again.

- Patrick
________________________________
From: NeilBrown <neilb@suse.com>
Sent: Wednesday, May 22, 2019 6:26:47 PM
To: Patrick Farrell; James Simmons; Andreas Dilger; Oleg Drokin
Cc: Lustre Development List
Subject: Re: [PATCH v2 01/29] lustre: llite: ll_fault fixes

On Wed, May 22 2019, Patrick Farrell wrote:

> Neil,
>
> If you can?t find any, I imagine they don?t exist, at least in your branch, given that difference you cited.
>
> The particular case we had is here:
> https://jira.whamcloud.com/browse/LU-11403
>
> Which is when the file exists but has no striping info, and hence no data.

Hi Patrick,
Thanks for the reply.  Looking at that issue it appears that the root
cause is -EFAULT being returned from lov_io_init_empty, which then got
returned by ll_fault_io_init and then was converted by to_fault_error
into VM_FAULT_NOPAGE.

The test

         if (!(result & (VM_FAULT_RETRY | VM_FAULT_ERROR | VM_FAULT_LOCKED))) {

in ll_fault() doesn't notice VM_FAULT_NOPAGE (it is not part of
VM_FAULT_ERROR), so the 'then' branch is run, which triggers the
problem.

Just changing to_fault_error() to convert  -EFAULT to VM_FAULT_SIGBUS,
as you did, cause the 'if' to do "the right thing", as VM_FAULT_SIGBUS
is one of the flags in VM_FAULT_ERROR.

So the extra test on vmf->page is redundant.  If there has been no
error, and we don't need to retry, then vmf->page *must* exist.


I've changed the commit message to explain this, so what I have now is
below.

Thanks,
NeilBrown

From: Patrick Farrell <pfarrell@whamcloud.com>
Date: Mon, 20 May 2019 08:50:43 -0400
Subject: [PATCH] lustre: llite: ll_fault fix

Error conditions in the fault path such as a fault on a file without
stripes (see lov_io_init_emtpy()) can cause us to
not return a page in vm_fault, and to report -EFAULT.

-EFAULT is currently mapped to VM_FAULT_NOPAGE by to_fault_error(),
and ll_fault doesn't recognize this as an error which might leave
->page unset, and tries to dereference vmf->page.

VM_FAULT_NOPAGE is *not* a valid status for .fault, only for
.page_mkwrite and .pfn_mkwrite.  So change to_fault_error() to
instead map -EFAULT to VM_FAULT_SIGBUS.  This is part of
VM_FAULT_ERROR, so ll_fault will *not* dereference vmf->page when that
error code is set.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
WC-bug-id: https://jira.whamcloud.com/browse/LU-11403
Reviewed-on: https://review.whamcloud.com/34247
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/lustre/llite/llite_mmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c
index 1865db1237ce..59fb40029306 100644
--- a/fs/lustre/llite/llite_mmap.c
+++ b/fs/lustre/llite/llite_mmap.c
@@ -238,9 +238,6 @@ static inline vm_fault_t to_fault_error(int result)
         case 0:
                 result = VM_FAULT_LOCKED;
                 break;
-       case -EFAULT:
-               result = VM_FAULT_NOPAGE;
-               break;
         case -ENOMEM:
                 result = VM_FAULT_OOM;
                 break;
--
2.14.0.rc0.dirty

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20190523/8dded084/attachment-0001.html>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type.
  2019-05-22 22:07     ` Andreas Dilger
@ 2019-06-01  0:38       ` James Simmons
  0 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2019-06-01  0:38 UTC (permalink / raw)
  To: lustre-devel


> >>> int class_register_type(struct obd_ops *dt_ops, struct md_ops *md_ops,
> >>> @@ -181,10 +163,17 @@ int class_register_type(struct obd_ops *dt_ops, 
> >> struct md_ops *md_ops,
> >>> 		return -EEXIST;
> >>> 	}
> >>> 
> >>> -	rc = -ENOMEM;
> >>> 	type = kzalloc(sizeof(*type), GFP_NOFS);
> >>> 	if (!type)
> >>> -		return rc;
> >>> +		return -ENOMEM;
> >>> +
> >>> +	type->typ_kobj.kset = lustre_kset;
> >>> +	rc = kobject_init_and_add(&type->typ_kobj, &class_ktype,
> >>> +				  &lustre_kset->kobj, "%s", name);
> >> 
> >> I don't know that I would actually cause a problem, but I don't like
> >> "add"ing and object (above) before fully initializing it (below).  So
> >> I've kept the split from my version where kobject_init() happens early
> >> and kobject_add() happens later.
> >> I've included the other changes that you made.
> >> 
> >> Thanks,
> >> NeilBrown
> > 
> > The reason I did it that way was to handle the server case down the road.
> > So for the case when both client and server are on the same node, and yes
> > people do such setups for testing this is important.
> > 
> > Consider the case we have both the lov and lod layer on a single node. 
> > Both layers attempt to create "lov" obd_type. When the lov module loads 
> > first then a complete obd_type is created. Once lod loads then it just 
> > uses real lov obd_type and creates the needed symlinks.
> 
> We've had the "lov->lod" symlinks on servers since Lustre 2.4 or so (when
> osd-zfs was first added).  We could just remove this compatibility, so
> long as the test scripts were updated to always use "lod" on the server
> and "lov" on the client (previously they shared the same "lov" code module
> so the path was the same).  We don't need test script interop going back
> further than that, so this should be OK.
> 
> Cheers, Andreas

I ripped off the band aid and boy did it bleed. I going to have to work
out the test suite changes. Well will need to back port the test suite
changes to 2.12 for interop testing.
 
> > You end up with
> > 
> > ls -al /sys/fs/lustre/lov/
> > total 0
> > drwxr-xr-x  2 root root 0 May 22 14:27 .
> > drwxr-xr-x 14 root root 0 May 22 14:27 ..
> > lrwxrwxrwx  1 root root 0 May 22 14:27 lustre-MDT0000-mdtlov -> 
> > ../lod/lustre-MDT0000-mdtlov
> > lrwxrwxrwx  1 root root 0 May 22 14:27 lustre-MDT0002-mdtlov -> 
> > ../lod/lustre-MDT0002-mdtlov
> > 
> > Plus the real lov obd devices.
> > 
> > Now if lod loads first then the lod module creates a "lov" obd_type
> > using class_create_symlink() but is not fully initialized nor does
> > it need to be. If the client lov module is present and it loads then
> > it takes the "lov" obd_type create by the lod and finishes initializing
> > it.
> > 
> > Looking at the final code and added in server case:
> > 
> >        type = class_search_type(name);
> >        if (type) {
> >                kobject_put(&type->typ_kobj);
> > 		if (strcmp(name, LUSTRE_LOV_NAME) == 0 ||
> >                    strcmp(name, LUSTRE_OSC_NAME) == 0)
> >                        goto dir_exist;
> > 		CDEBUG(D_IOCTL, "Type %s already registered\n", name);
> >                return -EEXIST;
> >        }
> > 
> >        type = kzalloc(sizeof(*type), GFP_NOFS);
> >        if (!type)
> >                return rc;
> > 
> >        type->typ_kobj.kset = lustre_kset;
> >        kobject_init(&type->typ_kobj, &class_ktype);
> > 
> >        type->typ_dt_ops = dt_ops;  /* lov obd_type is never set to the 
> > 				     * correct values if lod created it.
> > 				     */
> >        type->typ_md_ops = md_ops;
> > 
> >        rc = kobject_add(&type->typ_kobj, &lustre_kset->kobj, "%s", name);
> >        if (rc)
> >                goto failed;
> > 
> >        type->typ_debugfs_entry = debugfs_create_dir(type->typ_name,
> >                                                     debugfs_lustre_root);
> > dir_exit:
> > 
> > Now if this needed for the later module handling patches I guess we could
> > do:
> > 
> > 	if (strcmp(name, LUSTRE_LOV_NAME) == 0 ||
> >            strcmp(name, LUSTRE_OSC_NAME) == 0) {
> > 		type->typ_dt_ops = dt_ops;
> > 		type->typ_md_ops = md_ops;
> > 		goto dir_exist;
> > 	}
> > 
> > Is that the case?
> > 
> > 
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
> 
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2019-06-01  0:38 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-20 12:50 [lustre-devel] [PATCH v2 00/29] More lustre patches James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 01/29] lustre: llite: ll_fault fixes James Simmons
2019-05-22  3:54   ` NeilBrown
2019-05-22 12:48     ` Patrick Farrell
2019-05-22 23:26       ` NeilBrown
2019-05-23  0:13         ` Patrick Farrell
2019-05-22 19:06     ` James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 02/29] lustre: llite: fix error in vvp_pgcache seqfile James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 03/29] lustre: llite: replace lli_trunc_sem James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 04/29] lustre: lov: use GFP_NOFS to allocate lo_entries James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 05/29] lustre: llite: don't use class_setup_tunables() James Simmons
2019-05-22  4:22   ` NeilBrown
2019-05-22 18:58     ` James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 06/29] lustre: embed typ_kobj in obd_type James Simmons
2019-05-22  5:20   ` NeilBrown
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 07/29] lustre: obd: collect all resource releasing for obj_type James Simmons
2019-05-22  6:49   ` NeilBrown
2019-05-22 18:51   ` James Simmons
2019-05-22 22:07     ` Andreas Dilger
2019-06-01  0:38       ` James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 08/29] lustre: obd_type: use typ_kobj.name as typ_name James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 09/29] lustre: obd_type: discard obd_types linked list James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 10/29] lustre: obd_type: discard obd_type_lock James Simmons
2019-05-22  6:53   ` NeilBrown
2019-05-22 19:24     ` James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 11/29] lustre: obdclass: don't copy ops structures in to new type James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 12/29] lustre: obdclass: fix module load locking James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 13/29] lustre: convert rsi_sem to a spinlock James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 14/29] lustre: ldlm: discard varname in ldlm_pool James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 15/29] lustre: lprocfs: use log2.h macros instead of shift loop James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 16/29] lustre: handles: discard h_owner in favour of h_ops James Simmons
2019-05-20 12:50 ` [lustre-devel] [PATCH v2 17/29] lustre: handle: move refcount into the lustre_handle James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 18/29] lustre: discard OBD_FREE_RCU James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 19/29] lustre: portals_handle: rename ops to owner James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 20/29] lustre: portals_handle: remove locking from class_handle2object() James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 21/29] lustre: portals_handle: use hlist for hash lists James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 22/29] lustre: portals_handle: discard h_lock James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 23/29] lustre: remove unused fields from struct obd_device James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 24/29] lustre: obd_sysfs: error-check value stored in jobid_var James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 25/29] lustre: obdclass: discard process_quota_config James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 26/29] lustre: obdclass: remove unnecessary code from lustre_init_lsi() James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH 27/29] lustre: ldlm: discard l_lock from struct ldlm_lock James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 28/29] lustre: ldlm: don't access l_resource when not locked James Simmons
2019-05-20 12:51 ` [lustre-devel] [PATCH v2 29/29] lustre: ldlm: drop SLAB_TYPESAFE_BY_RCU from ldlm_lock slab James Simmons

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.