All of lore.kernel.org
 help / color / mirror / Atom feed
* [lustre-devel] [PATCH] lustre: mdc: fix possible deadlock in chlg_open()
@ 2018-10-31  1:29 NeilBrown
  2018-10-31 12:24 ` quentin.bouget at cea.fr
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2018-10-31  1:29 UTC (permalink / raw)
  To: lustre-devel


Lockdep reports a possible deadlock between chlg_open() and
mdc_changelog_cdev_init()

mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
calls misc_register() which takes misc_mtx.
chlg_open() is called while misc_mtx is held, and tries to take
chlg_registered_dev_lock.
If these two functions race, a deadlock can occur as each thread will
hold one of the locks while trying to take the other.

chlg_open() does not need to take a lock.  It only uses the
lock to stablize a list while looking for the matching
chlg_registered_dev, and this can be found directly by examining
file->private_data.

So remove chlg_obd_get(), and use file->private_Data to find the
obd_device.

Signed-off-by: NeilBrown <neilb@suse.com>
---

Note that this bug exists in upstream (OpenSFS) lustre as well.

 drivers/staging/lustre/lustre/mdc/mdc_changelog.c | 35 +++++------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
index d83507cbf95c..645e7f24a859 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
@@ -444,31 +444,6 @@ static ssize_t chlg_write(struct file *file, const char __user *buff,
 	return rc < 0 ? rc : count;
 }
 
-/**
- * Find the OBD device associated to a changelog character device.
- * @param[in]  cdev  character device instance descriptor
- * @return corresponding OBD device or NULL if none was found.
- */
-static struct obd_device *chlg_obd_get(dev_t cdev)
-{
-	int minor = MINOR(cdev);
-	struct obd_device *obd = NULL;
-	struct chlg_registered_dev *curr;
-
-	mutex_lock(&chlg_registered_dev_lock);
-	list_for_each_entry(curr, &chlg_registered_devices, ced_link) {
-		if (curr->ced_misc.minor == minor) {
-			/* take the first available OBD device attached */
-			obd = list_first_entry(&curr->ced_obds,
-					       struct obd_device,
-					       u.cli.cl_chg_dev_linkage);
-			break;
-		}
-	}
-	mutex_unlock(&chlg_registered_dev_lock);
-	return obd;
-}
-
 /**
  * Open handler, initialize internal CRS state and spawn prefetch thread if
  * needed.
@@ -479,12 +454,16 @@ static struct obd_device *chlg_obd_get(dev_t cdev)
 static int chlg_open(struct inode *inode, struct file *file)
 {
 	struct chlg_reader_state *crs;
-	struct obd_device *obd = chlg_obd_get(inode->i_rdev);
+	struct miscdevice *misc = file->private_data;
+	struct chlg_registered_dev *dev;
+	struct obd_device *obd;
 	struct task_struct *task;
 	int rc;
 
-	if (!obd)
-		return -ENODEV;
+	dev = container_of(misc, struct chlg_registered_dev, ced_misc);
+	obd = list_first_entry(&dev->ced_obds,
+			       struct obd_device,
+			       u.cli.cl_chg_dev_linkage);
 
 	crs = kzalloc(sizeof(*crs), GFP_KERNEL);
 	if (!crs)
-- 
2.14.0.rc0.dirty

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181031/c037145f/attachment.sig>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH] lustre: mdc: fix possible deadlock in chlg_open()
  2018-10-31  1:29 [lustre-devel] [PATCH] lustre: mdc: fix possible deadlock in chlg_open() NeilBrown
@ 2018-10-31 12:24 ` quentin.bouget at cea.fr
  2018-10-31 20:56   ` NeilBrown
  2018-10-31 23:01   ` [lustre-devel] [PATCH v2] " NeilBrown
  0 siblings, 2 replies; 12+ messages in thread
From: quentin.bouget at cea.fr @ 2018-10-31 12:24 UTC (permalink / raw)
  To: lustre-devel

Le 31/10/2018 ? 02:29, NeilBrown a ?crit?:
> Lockdep reports a possible deadlock between chlg_open() and
> mdc_changelog_cdev_init()
>
> mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
> calls misc_register() which takes misc_mtx.
> chlg_open() is called while misc_mtx is held, and tries to take
> chlg_registered_dev_lock.
> If these two functions race, a deadlock can occur as each thread will
> hold one of the locks while trying to take the other.
>
> chlg_open() does not need to take a lock.  It only uses the
> lock to stablize a list while looking for the matching
> chlg_registered_dev, and this can be found directly by examining
> file->private_data.
>
> So remove chlg_obd_get(), and use file->private_Data to find the
> obd_device.
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>
> Note that this bug exists in upstream (OpenSFS) lustre as well.
I think you need to modify mdc_changelog_cdev_init() as well so that 
misc_register() comes after the two list_add_tail() statements.

If someone opens the device a bit a too fast after it becomes visible, 
they might find an empty list at `dev->ced_obds'.
Previously, holding `chlg_registered_dev_loc' implicitely prevented that.

Is it possible that maybe the kernel shouldn't hold `misc_mtx' while 
calling chlg_open()?

Quentin Bouget
>
>   drivers/staging/lustre/lustre/mdc/mdc_changelog.c | 35 +++++------------------
>   1 file changed, 7 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
> index d83507cbf95c..645e7f24a859 100644
> --- a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
> +++ b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
> @@ -444,31 +444,6 @@ static ssize_t chlg_write(struct file *file, const char __user *buff,
>   	return rc < 0 ? rc : count;
>   }
>   
> -/**
> - * Find the OBD device associated to a changelog character device.
> - * @param[in]  cdev  character device instance descriptor
> - * @return corresponding OBD device or NULL if none was found.
> - */
> -static struct obd_device *chlg_obd_get(dev_t cdev)
> -{
> -	int minor = MINOR(cdev);
> -	struct obd_device *obd = NULL;
> -	struct chlg_registered_dev *curr;
> -
> -	mutex_lock(&chlg_registered_dev_lock);
> -	list_for_each_entry(curr, &chlg_registered_devices, ced_link) {
> -		if (curr->ced_misc.minor == minor) {
> -			/* take the first available OBD device attached */
> -			obd = list_first_entry(&curr->ced_obds,
> -					       struct obd_device,
> -					       u.cli.cl_chg_dev_linkage);
> -			break;
> -		}
> -	}
> -	mutex_unlock(&chlg_registered_dev_lock);
> -	return obd;
> -}
> -
>   /**
>    * Open handler, initialize internal CRS state and spawn prefetch thread if
>    * needed.
> @@ -479,12 +454,16 @@ static struct obd_device *chlg_obd_get(dev_t cdev)
>   static int chlg_open(struct inode *inode, struct file *file)
>   {
>   	struct chlg_reader_state *crs;
> -	struct obd_device *obd = chlg_obd_get(inode->i_rdev);
> +	struct miscdevice *misc = file->private_data;
> +	struct chlg_registered_dev *dev;
> +	struct obd_device *obd;
>   	struct task_struct *task;
>   	int rc;
>   
> -	if (!obd)
> -		return -ENODEV;
> +	dev = container_of(misc, struct chlg_registered_dev, ced_misc);
> +	obd = list_first_entry(&dev->ced_obds,
> +			       struct obd_device,
> +			       u.cli.cl_chg_dev_linkage);
>   
>   	crs = kzalloc(sizeof(*crs), GFP_KERNEL);
>   	if (!crs)
>
>
> _______________________________________________
> lustre-devel mailing list
> lustre-devel at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181031/fb1187bf/attachment.html>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH] lustre: mdc: fix possible deadlock in chlg_open()
  2018-10-31 12:24 ` quentin.bouget at cea.fr
@ 2018-10-31 20:56   ` NeilBrown
  2018-10-31 23:01   ` [lustre-devel] [PATCH v2] " NeilBrown
  1 sibling, 0 replies; 12+ messages in thread
From: NeilBrown @ 2018-10-31 20:56 UTC (permalink / raw)
  To: lustre-devel

On Wed, Oct 31 2018, quentin.bouget at cea.fr wrote:

> Le 31/10/2018 ? 02:29, NeilBrown a ?crit?:
>> Lockdep reports a possible deadlock between chlg_open() and
>> mdc_changelog_cdev_init()
>>
>> mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
>> calls misc_register() which takes misc_mtx.
>> chlg_open() is called while misc_mtx is held, and tries to take
>> chlg_registered_dev_lock.
>> If these two functions race, a deadlock can occur as each thread will
>> hold one of the locks while trying to take the other.
>>
>> chlg_open() does not need to take a lock.  It only uses the
>> lock to stablize a list while looking for the matching
>> chlg_registered_dev, and this can be found directly by examining
>> file->private_data.
>>
>> So remove chlg_obd_get(), and use file->private_Data to find the
>> obd_device.
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>
>> Note that this bug exists in upstream (OpenSFS) lustre as well.
> I think you need to modify mdc_changelog_cdev_init() as well so that 
> misc_register() comes after the two list_add_tail() statements.

Yes, definitely.  Thanks for that.  I'll change the patch.

>
> If someone opens the device a bit a too fast after it becomes visible, 
> they might find an empty list at `dev->ced_obds'.
> Previously, holding `chlg_registered_dev_loc' implicitely prevented that.
>
> Is it possible that maybe the kernel shouldn't hold `misc_mtx' while 
> calling chlg_open()?

No, it really should.  It protects against races between chlg_open and
misc_unregister().

Thanks,
NeilBrown


>
> Quentin Bouget
>>
>>   drivers/staging/lustre/lustre/mdc/mdc_changelog.c | 35 +++++------------------
>>   1 file changed, 7 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
>> index d83507cbf95c..645e7f24a859 100644
>> --- a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
>> +++ b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
>> @@ -444,31 +444,6 @@ static ssize_t chlg_write(struct file *file, const char __user *buff,
>>   	return rc < 0 ? rc : count;
>>   }
>>   
>> -/**
>> - * Find the OBD device associated to a changelog character device.
>> - * @param[in]  cdev  character device instance descriptor
>> - * @return corresponding OBD device or NULL if none was found.
>> - */
>> -static struct obd_device *chlg_obd_get(dev_t cdev)
>> -{
>> -	int minor = MINOR(cdev);
>> -	struct obd_device *obd = NULL;
>> -	struct chlg_registered_dev *curr;
>> -
>> -	mutex_lock(&chlg_registered_dev_lock);
>> -	list_for_each_entry(curr, &chlg_registered_devices, ced_link) {
>> -		if (curr->ced_misc.minor == minor) {
>> -			/* take the first available OBD device attached */
>> -			obd = list_first_entry(&curr->ced_obds,
>> -					       struct obd_device,
>> -					       u.cli.cl_chg_dev_linkage);
>> -			break;
>> -		}
>> -	}
>> -	mutex_unlock(&chlg_registered_dev_lock);
>> -	return obd;
>> -}
>> -
>>   /**
>>    * Open handler, initialize internal CRS state and spawn prefetch thread if
>>    * needed.
>> @@ -479,12 +454,16 @@ static struct obd_device *chlg_obd_get(dev_t cdev)
>>   static int chlg_open(struct inode *inode, struct file *file)
>>   {
>>   	struct chlg_reader_state *crs;
>> -	struct obd_device *obd = chlg_obd_get(inode->i_rdev);
>> +	struct miscdevice *misc = file->private_data;
>> +	struct chlg_registered_dev *dev;
>> +	struct obd_device *obd;
>>   	struct task_struct *task;
>>   	int rc;
>>   
>> -	if (!obd)
>> -		return -ENODEV;
>> +	dev = container_of(misc, struct chlg_registered_dev, ced_misc);
>> +	obd = list_first_entry(&dev->ced_obds,
>> +			       struct obd_device,
>> +			       u.cli.cl_chg_dev_linkage);
>>   
>>   	crs = kzalloc(sizeof(*crs), GFP_KERNEL);
>>   	if (!crs)
>>
>>
>> _______________________________________________
>> lustre-devel mailing list
>> lustre-devel at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181101/ad861667/attachment.sig>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH v2] lustre: mdc: fix possible deadlock in chlg_open()
  2018-10-31 12:24 ` quentin.bouget at cea.fr
  2018-10-31 20:56   ` NeilBrown
@ 2018-10-31 23:01   ` NeilBrown
  2018-11-04 21:34     ` James Simmons
  1 sibling, 1 reply; 12+ messages in thread
From: NeilBrown @ 2018-10-31 23:01 UTC (permalink / raw)
  To: lustre-devel


Lockdep reports a possible deadlock between chlg_open() and
mdc_changelog_cdev_init()

mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
calls misc_register() which takes misc_mtx.
chlg_open() is called while misc_mtx is held, and tries to take
chlg_registered_dev_lock.
If these two functions race, a deadlock can occur as each thread will
hold one of the locks while trying to take the other.

chlg_open() does not need to take a lock.  It only uses the
lock to stablize a list while looking for the matching
chlg_registered_dev, and this can be found directly by examining
file->private_data.

So remove chlg_obd_get(), and use file->private_data to find the
obd_device.
Also ensure the device is fully initialized before calling
misc_register().  This means setting up some list linkage before the
call, and tearing it down if there is an error.

Signed-off-by: NeilBrown <neilb@suse.com>
---

This is the revised version with the problem identified by Quentin
fixed.

 drivers/staging/lustre/lustre/mdc/mdc_changelog.c | 46 +++++++----------------
 1 file changed, 14 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
index d83507cbf95c..af29ea73c48a 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
@@ -444,31 +444,6 @@ static ssize_t chlg_write(struct file *file, const char __user *buff,
 	return rc < 0 ? rc : count;
 }
 
-/**
- * Find the OBD device associated to a changelog character device.
- * @param[in]  cdev  character device instance descriptor
- * @return corresponding OBD device or NULL if none was found.
- */
-static struct obd_device *chlg_obd_get(dev_t cdev)
-{
-	int minor = MINOR(cdev);
-	struct obd_device *obd = NULL;
-	struct chlg_registered_dev *curr;
-
-	mutex_lock(&chlg_registered_dev_lock);
-	list_for_each_entry(curr, &chlg_registered_devices, ced_link) {
-		if (curr->ced_misc.minor == minor) {
-			/* take the first available OBD device attached */
-			obd = list_first_entry(&curr->ced_obds,
-					       struct obd_device,
-					       u.cli.cl_chg_dev_linkage);
-			break;
-		}
-	}
-	mutex_unlock(&chlg_registered_dev_lock);
-	return obd;
-}
-
 /**
  * Open handler, initialize internal CRS state and spawn prefetch thread if
  * needed.
@@ -479,12 +454,16 @@ static struct obd_device *chlg_obd_get(dev_t cdev)
 static int chlg_open(struct inode *inode, struct file *file)
 {
 	struct chlg_reader_state *crs;
-	struct obd_device *obd = chlg_obd_get(inode->i_rdev);
+	struct miscdevice *misc = file->private_data;
+	struct chlg_registered_dev *dev;
+	struct obd_device *obd;
 	struct task_struct *task;
 	int rc;
 
-	if (!obd)
-		return -ENODEV;
+	dev = container_of(misc, struct chlg_registered_dev, ced_misc);
+	obd = list_first_entry(&dev->ced_obds,
+			       struct obd_device,
+			       u.cli.cl_chg_dev_linkage);
 
 	crs = kzalloc(sizeof(*crs), GFP_KERNEL);
 	if (!crs)
@@ -669,13 +648,16 @@ int mdc_changelog_cdev_init(struct obd_device *obd)
 		goto out_unlock;
 	}
 
+	list_add_tail(&obd->u.cli.cl_chg_dev_linkage, &entry->ced_obds);
+	list_add_tail(&entry->ced_link, &chlg_registered_devices);
+
 	/* Register new character device */
 	rc = misc_register(&entry->ced_misc);
-	if (rc != 0)
+	if (rc != 0) {
+		list_del_init(&obd->u.cli.cl_chg_dev_linkage);
+		list_del(&entry->ced_link);
 		goto out_unlock;
-
-	list_add_tail(&obd->u.cli.cl_chg_dev_linkage, &entry->ced_obds);
-	list_add_tail(&entry->ced_link, &chlg_registered_devices);
+	}
 
 	entry = NULL;	/* prevent it from being freed below */
 
-- 
2.14.0.rc0.dirty

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181101/1d8587cb/attachment.sig>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH v2] lustre: mdc: fix possible deadlock in chlg_open()
  2018-10-31 23:01   ` [lustre-devel] [PATCH v2] " NeilBrown
@ 2018-11-04 21:34     ` James Simmons
  2018-11-05 13:37       ` quentin.bouget at cea.fr
  0 siblings, 1 reply; 12+ messages in thread
From: James Simmons @ 2018-11-04 21:34 UTC (permalink / raw)
  To: lustre-devel


> Lockdep reports a possible deadlock between chlg_open() and
> mdc_changelog_cdev_init()
> 
> mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
> calls misc_register() which takes misc_mtx.
> chlg_open() is called while misc_mtx is held, and tries to take
> chlg_registered_dev_lock.
> If these two functions race, a deadlock can occur as each thread will
> hold one of the locks while trying to take the other.
> 
> chlg_open() does not need to take a lock.  It only uses the
> lock to stablize a list while looking for the matching
> chlg_registered_dev, and this can be found directly by examining
> file->private_data.
> 
> So remove chlg_obd_get(), and use file->private_data to find the
> obd_device.
> Also ensure the device is fully initialized before calling
> misc_register().  This means setting up some list linkage before the
> call, and tearing it down if there is an error.

I have been testing this but I'm no HSM expert. I pushed this patch
to OpenSFS branch as well.

https://jira.whamcloud.com/browse/LU-11617
https://review.whamcloud.com/#/c/33572/

Reviewed-by: James Simmons <jsimmons@infradead.org>

> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> This is the revised version with the problem identified by Quentin
> fixed.
> 
>  drivers/staging/lustre/lustre/mdc/mdc_changelog.c | 46 +++++++----------------
>  1 file changed, 14 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
> index d83507cbf95c..af29ea73c48a 100644
> --- a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
> +++ b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
> @@ -444,31 +444,6 @@ static ssize_t chlg_write(struct file *file, const char __user *buff,
>  	return rc < 0 ? rc : count;
>  }
>  
> -/**
> - * Find the OBD device associated to a changelog character device.
> - * @param[in]  cdev  character device instance descriptor
> - * @return corresponding OBD device or NULL if none was found.
> - */
> -static struct obd_device *chlg_obd_get(dev_t cdev)
> -{
> -	int minor = MINOR(cdev);
> -	struct obd_device *obd = NULL;
> -	struct chlg_registered_dev *curr;
> -
> -	mutex_lock(&chlg_registered_dev_lock);
> -	list_for_each_entry(curr, &chlg_registered_devices, ced_link) {
> -		if (curr->ced_misc.minor == minor) {
> -			/* take the first available OBD device attached */
> -			obd = list_first_entry(&curr->ced_obds,
> -					       struct obd_device,
> -					       u.cli.cl_chg_dev_linkage);
> -			break;
> -		}
> -	}
> -	mutex_unlock(&chlg_registered_dev_lock);
> -	return obd;
> -}
> -
>  /**
>   * Open handler, initialize internal CRS state and spawn prefetch thread if
>   * needed.
> @@ -479,12 +454,16 @@ static struct obd_device *chlg_obd_get(dev_t cdev)
>  static int chlg_open(struct inode *inode, struct file *file)
>  {
>  	struct chlg_reader_state *crs;
> -	struct obd_device *obd = chlg_obd_get(inode->i_rdev);
> +	struct miscdevice *misc = file->private_data;
> +	struct chlg_registered_dev *dev;
> +	struct obd_device *obd;
>  	struct task_struct *task;
>  	int rc;
>  
> -	if (!obd)
> -		return -ENODEV;
> +	dev = container_of(misc, struct chlg_registered_dev, ced_misc);
> +	obd = list_first_entry(&dev->ced_obds,
> +			       struct obd_device,
> +			       u.cli.cl_chg_dev_linkage);
>  
>  	crs = kzalloc(sizeof(*crs), GFP_KERNEL);
>  	if (!crs)
> @@ -669,13 +648,16 @@ int mdc_changelog_cdev_init(struct obd_device *obd)
>  		goto out_unlock;
>  	}
>  
> +	list_add_tail(&obd->u.cli.cl_chg_dev_linkage, &entry->ced_obds);
> +	list_add_tail(&entry->ced_link, &chlg_registered_devices);
> +
>  	/* Register new character device */
>  	rc = misc_register(&entry->ced_misc);
> -	if (rc != 0)
> +	if (rc != 0) {
> +		list_del_init(&obd->u.cli.cl_chg_dev_linkage);
> +		list_del(&entry->ced_link);
>  		goto out_unlock;
> -
> -	list_add_tail(&obd->u.cli.cl_chg_dev_linkage, &entry->ced_obds);
> -	list_add_tail(&entry->ced_link, &chlg_registered_devices);
> +	}
>  
>  	entry = NULL;	/* prevent it from being freed below */
>  
> -- 
> 2.14.0.rc0.dirty
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH v2] lustre: mdc: fix possible deadlock in chlg_open()
  2018-11-04 21:34     ` James Simmons
@ 2018-11-05 13:37       ` quentin.bouget at cea.fr
  2018-11-06  3:11         ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: quentin.bouget at cea.fr @ 2018-11-05 13:37 UTC (permalink / raw)
  To: lustre-devel

Le 04/11/2018 ? 22:34, James Simmons a ?crit?:
>> Lockdep reports a possible deadlock between chlg_open() and
>> mdc_changelog_cdev_init()
>>
>> mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
>> calls misc_register() which takes misc_mtx.
>> chlg_open() is called while misc_mtx is held, and tries to take
>> chlg_registered_dev_lock.
>> If these two functions race, a deadlock can occur as each thread will
>> hold one of the locks while trying to take the other.
>>
>> chlg_open() does not need to take a lock.  It only uses the
>> lock to stablize a list while looking for the matching
>> chlg_registered_dev, and this can be found directly by examining
>> file->private_data.
>>
>> So remove chlg_obd_get(), and use file->private_data to find the
>> obd_device.
>> Also ensure the device is fully initialized before calling
>> misc_register().  This means setting up some list linkage before the
>> call, and tearing it down if there is an error.
> I have been testing this but I'm no HSM expert. I pushed this patch
> to OpenSFS branch as well.
>
> https://jira.whamcloud.com/browse/LU-11617
> https://review.whamcloud.com/#/c/33572/
>
> Reviewed-by: James Simmons <jsimmons@infradead.org>

Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>

>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>
>> This is the revised version with the problem identified by Quentin
>> fixed.
>>
>>   drivers/staging/lustre/lustre/mdc/mdc_changelog.c | 46 +++++++----------------
>>   1 file changed, 14 insertions(+), 32 deletions(-)
>>
>> diff --git a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
>> index d83507cbf95c..af29ea73c48a 100644
>> --- a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
>> +++ b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
>> @@ -444,31 +444,6 @@ static ssize_t chlg_write(struct file *file, const char __user *buff,
>>   	return rc < 0 ? rc : count;
>>   }
>>   
>> -/**
>> - * Find the OBD device associated to a changelog character device.
>> - * @param[in]  cdev  character device instance descriptor
>> - * @return corresponding OBD device or NULL if none was found.
>> - */
>> -static struct obd_device *chlg_obd_get(dev_t cdev)
>> -{
>> -	int minor = MINOR(cdev);
>> -	struct obd_device *obd = NULL;
>> -	struct chlg_registered_dev *curr;
>> -
>> -	mutex_lock(&chlg_registered_dev_lock);
>> -	list_for_each_entry(curr, &chlg_registered_devices, ced_link) {
>> -		if (curr->ced_misc.minor == minor) {
>> -			/* take the first available OBD device attached */
>> -			obd = list_first_entry(&curr->ced_obds,
>> -					       struct obd_device,
>> -					       u.cli.cl_chg_dev_linkage);
>> -			break;
>> -		}
>> -	}
>> -	mutex_unlock(&chlg_registered_dev_lock);
>> -	return obd;
>> -}
>> -
>>   /**
>>    * Open handler, initialize internal CRS state and spawn prefetch thread if
>>    * needed.
>> @@ -479,12 +454,16 @@ static struct obd_device *chlg_obd_get(dev_t cdev)
>>   static int chlg_open(struct inode *inode, struct file *file)
>>   {
>>   	struct chlg_reader_state *crs;
>> -	struct obd_device *obd = chlg_obd_get(inode->i_rdev);
>> +	struct miscdevice *misc = file->private_data;
>> +	struct chlg_registered_dev *dev;
>> +	struct obd_device *obd;
>>   	struct task_struct *task;
>>   	int rc;
>>   
>> -	if (!obd)
>> -		return -ENODEV;
>> +	dev = container_of(misc, struct chlg_registered_dev, ced_misc);
>> +	obd = list_first_entry(&dev->ced_obds,
>> +			       struct obd_device,
>> +			       u.cli.cl_chg_dev_linkage);
>>   
>>   	crs = kzalloc(sizeof(*crs), GFP_KERNEL);
>>   	if (!crs)
>> @@ -669,13 +648,16 @@ int mdc_changelog_cdev_init(struct obd_device *obd)
>>   		goto out_unlock;
>>   	}
>>   
>> +	list_add_tail(&obd->u.cli.cl_chg_dev_linkage, &entry->ced_obds);
>> +	list_add_tail(&entry->ced_link, &chlg_registered_devices);
>> +
>>   	/* Register new character device */
>>   	rc = misc_register(&entry->ced_misc);
>> -	if (rc != 0)
>> +	if (rc != 0) {
>> +		list_del_init(&obd->u.cli.cl_chg_dev_linkage);
>> +		list_del(&entry->ced_link);
>>   		goto out_unlock;
>> -
>> -	list_add_tail(&obd->u.cli.cl_chg_dev_linkage, &entry->ced_obds);
>> -	list_add_tail(&entry->ced_link, &chlg_registered_devices);
>> +	}
>>   
>>   	entry = NULL;	/* prevent it from being freed below */
>>   
>> -- 
>> 2.14.0.rc0.dirty
>>
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH v2] lustre: mdc: fix possible deadlock in chlg_open()
  2018-11-05 13:37       ` quentin.bouget at cea.fr
@ 2018-11-06  3:11         ` NeilBrown
  2018-11-06 10:41           ` quentin.bouget at cea.fr
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2018-11-06  3:11 UTC (permalink / raw)
  To: lustre-devel

On Mon, Nov 05 2018, quentin.bouget at cea.fr wrote:

> Le 04/11/2018 ? 22:34, James Simmons a ?crit?:
>>> Lockdep reports a possible deadlock between chlg_open() and
>>> mdc_changelog_cdev_init()
>>>
>>> mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
>>> calls misc_register() which takes misc_mtx.
>>> chlg_open() is called while misc_mtx is held, and tries to take
>>> chlg_registered_dev_lock.
>>> If these two functions race, a deadlock can occur as each thread will
>>> hold one of the locks while trying to take the other.
>>>
>>> chlg_open() does not need to take a lock.  It only uses the
>>> lock to stablize a list while looking for the matching
>>> chlg_registered_dev, and this can be found directly by examining
>>> file->private_data.
>>>
>>> So remove chlg_obd_get(), and use file->private_data to find the
>>> obd_device.
>>> Also ensure the device is fully initialized before calling
>>> misc_register().  This means setting up some list linkage before the
>>> call, and tearing it down if there is an error.
>> I have been testing this but I'm no HSM expert. I pushed this patch
>> to OpenSFS branch as well.
>>
>> https://jira.whamcloud.com/browse/LU-11617
>> https://review.whamcloud.com/#/c/33572/
>>
>> Reviewed-by: James Simmons <jsimmons@infradead.org>
>
> Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
>

Thanks to you both for the review.

NeilBrown

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181106/3a987263/attachment.sig>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH v2] lustre: mdc: fix possible deadlock in chlg_open()
  2018-11-06  3:11         ` NeilBrown
@ 2018-11-06 10:41           ` quentin.bouget at cea.fr
  2018-11-07  0:33             ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: quentin.bouget at cea.fr @ 2018-11-06 10:41 UTC (permalink / raw)
  To: lustre-devel

Le 06/11/2018 ? 04:11, NeilBrown a ?crit?:
> On Mon, Nov 05 2018, quentin.bouget at cea.fr wrote:
>
>> Le 04/11/2018 ? 22:34, James Simmons a ?crit?:
>>>> Lockdep reports a possible deadlock between chlg_open() and
>>>> mdc_changelog_cdev_init()
>>>>
>>>> mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
>>>> calls misc_register() which takes misc_mtx.
>>>> chlg_open() is called while misc_mtx is held, and tries to take
>>>> chlg_registered_dev_lock.
>>>> If these two functions race, a deadlock can occur as each thread will
>>>> hold one of the locks while trying to take the other.
>>>>
>>>> chlg_open() does not need to take a lock.  It only uses the
>>>> lock to stablize a list while looking for the matching
>>>> chlg_registered_dev, and this can be found directly by examining
>>>> file->private_data.
>>>>
>>>> So remove chlg_obd_get(), and use file->private_data to find the
>>>> obd_device.
>>>> Also ensure the device is fully initialized before calling
>>>> misc_register().  This means setting up some list linkage before the
>>>> call, and tearing it down if there is an error.
>>> I have been testing this but I'm no HSM expert. I pushed this patch
>>> to OpenSFS branch as well.
>>>
>>> https://jira.whamcloud.com/browse/LU-11617
>>> https://review.whamcloud.com/#/c/33572/
>>>
>>> Reviewed-by: James Simmons <jsimmons@infradead.org>
>> Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
>>
> Thanks to you both for the review.
>
> NeilBrown
>
Wait! I just realised there might be another issue!
I think there is now a race between chlg_open() and 
mdc_changelog_cdev_finish().

Wait! I just realised there might be another bigger issue!
The whole "take the first obd you can find" is broken! I opened a ticket 
<https://jira.whamcloud.com/browse/LU-11626> on whamcloud's JIRA about it.

Quentin Bouget

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181106/bad9cdb2/attachment-0001.html>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH v2] lustre: mdc: fix possible deadlock in chlg_open()
  2018-11-06 10:41           ` quentin.bouget at cea.fr
@ 2018-11-07  0:33             ` NeilBrown
  2018-11-07  0:43               ` NeilBrown
  2018-11-07  1:09               ` [lustre-devel] [PATCH/RFC] lustre: changelog_cdev need to find obd on each access NeilBrown
  0 siblings, 2 replies; 12+ messages in thread
From: NeilBrown @ 2018-11-07  0:33 UTC (permalink / raw)
  To: lustre-devel

On Tue, Nov 06 2018, quentin.bouget at cea.fr wrote:

> Le 06/11/2018 ? 04:11, NeilBrown a ?crit?:
>> On Mon, Nov 05 2018, quentin.bouget at cea.fr wrote:
>>
>>> Le 04/11/2018 ? 22:34, James Simmons a ?crit?:
>>>>> Lockdep reports a possible deadlock between chlg_open() and
>>>>> mdc_changelog_cdev_init()
>>>>>
>>>>> mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
>>>>> calls misc_register() which takes misc_mtx.
>>>>> chlg_open() is called while misc_mtx is held, and tries to take
>>>>> chlg_registered_dev_lock.
>>>>> If these two functions race, a deadlock can occur as each thread will
>>>>> hold one of the locks while trying to take the other.
>>>>>
>>>>> chlg_open() does not need to take a lock.  It only uses the
>>>>> lock to stablize a list while looking for the matching
>>>>> chlg_registered_dev, and this can be found directly by examining
>>>>> file->private_data.
>>>>>
>>>>> So remove chlg_obd_get(), and use file->private_data to find the
>>>>> obd_device.
>>>>> Also ensure the device is fully initialized before calling
>>>>> misc_register().  This means setting up some list linkage before the
>>>>> call, and tearing it down if there is an error.
>>>> I have been testing this but I'm no HSM expert. I pushed this patch
>>>> to OpenSFS branch as well.
>>>>
>>>> https://jira.whamcloud.com/browse/LU-11617
>>>> https://review.whamcloud.com/#/c/33572/
>>>>
>>>> Reviewed-by: James Simmons <jsimmons@infradead.org>
>>> Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
>>>
>> Thanks to you both for the review.
>>
>> NeilBrown
>>
> Wait! I just realised there might be another issue!
> I think there is now a race between chlg_open() and 
> mdc_changelog_cdev_finish().

Hmmm.. yes.  Also another deadlock possibility as the locks can be taken
in the wrong order here too.

>
> Wait! I just realised there might be another bigger issue!
> The whole "take the first obd you can find" is broken! I opened a ticket 
> <https://jira.whamcloud.com/browse/LU-11626> on whamcloud's JIRA about it.

Yep...
My guess is that chlg_load(), chlg_clear() and (possibly)
chlg_read_cat_process_cb() should take the mutex, choose an obd, used
it, and release the mutex.

I might post an RFC patch.

NeilBrown


>
> Quentin Bouget
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181107/df58b631/attachment.sig>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH v2] lustre: mdc: fix possible deadlock in chlg_open()
  2018-11-07  0:33             ` NeilBrown
@ 2018-11-07  0:43               ` NeilBrown
  2018-11-07  1:09               ` [lustre-devel] [PATCH/RFC] lustre: changelog_cdev need to find obd on each access NeilBrown
  1 sibling, 0 replies; 12+ messages in thread
From: NeilBrown @ 2018-11-07  0:43 UTC (permalink / raw)
  To: lustre-devel

On Wed, Nov 07 2018, NeilBrown wrote:

> On Tue, Nov 06 2018, quentin.bouget at cea.fr wrote:
>
>> Le 06/11/2018 ? 04:11, NeilBrown a ?crit?:
>>> On Mon, Nov 05 2018, quentin.bouget at cea.fr wrote:
>>>
>>>> Le 04/11/2018 ? 22:34, James Simmons a ?crit?:
>>>>>> Lockdep reports a possible deadlock between chlg_open() and
>>>>>> mdc_changelog_cdev_init()
>>>>>>
>>>>>> mdc_changelog_cdev_init() takes chlg_registered_dev_lock and then
>>>>>> calls misc_register() which takes misc_mtx.
>>>>>> chlg_open() is called while misc_mtx is held, and tries to take
>>>>>> chlg_registered_dev_lock.
>>>>>> If these two functions race, a deadlock can occur as each thread will
>>>>>> hold one of the locks while trying to take the other.
>>>>>>
>>>>>> chlg_open() does not need to take a lock.  It only uses the
>>>>>> lock to stablize a list while looking for the matching
>>>>>> chlg_registered_dev, and this can be found directly by examining
>>>>>> file->private_data.
>>>>>>
>>>>>> So remove chlg_obd_get(), and use file->private_data to find the
>>>>>> obd_device.
>>>>>> Also ensure the device is fully initialized before calling
>>>>>> misc_register().  This means setting up some list linkage before the
>>>>>> call, and tearing it down if there is an error.
>>>>> I have been testing this but I'm no HSM expert. I pushed this patch
>>>>> to OpenSFS branch as well.
>>>>>
>>>>> https://jira.whamcloud.com/browse/LU-11617
>>>>> https://review.whamcloud.com/#/c/33572/
>>>>>
>>>>> Reviewed-by: James Simmons <jsimmons@infradead.org>
>>>> Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
>>>>
>>> Thanks to you both for the review.
>>>
>>> NeilBrown
>>>
>> Wait! I just realised there might be another issue!
>> I think there is now a race between chlg_open() and 
>> mdc_changelog_cdev_finish().
>
> Hmmm.. yes.  Also another deadlock possibility as the locks can be taken
> in the wrong order here too.

Sorry, I got that wrong - there is no other deadlock.
mdc_changelog_cdev_finish() can call misc_deregister() while holding
chlg_registered_dev_lock, and misc_deregister() will take misc_mtx, but
that is the right way around, not deadlock-causing.

NeilBrown

>
>>
>> Wait! I just realised there might be another bigger issue!
>> The whole "take the first obd you can find" is broken! I opened a ticket 
>> <https://jira.whamcloud.com/browse/LU-11626> on whamcloud's JIRA about it.
>
> Yep...
> My guess is that chlg_load(), chlg_clear() and (possibly)
> chlg_read_cat_process_cb() should take the mutex, choose an obd, used
> it, and release the mutex.
>
> I might post an RFC patch.
>
> NeilBrown
>
>
>>
>> Quentin Bouget
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181107/fe44f28c/attachment.sig>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH/RFC] lustre: changelog_cdev need to find obd on each access.
  2018-11-07  0:33             ` NeilBrown
  2018-11-07  0:43               ` NeilBrown
@ 2018-11-07  1:09               ` NeilBrown
  2018-11-07 10:05                 ` quentin.bouget at cea.fr
  1 sibling, 1 reply; 12+ messages in thread
From: NeilBrown @ 2018-11-07  1:09 UTC (permalink / raw)
  To: lustre-devel


A changelog cdev can be held open indefinitely, and obd structures can
come and go while it is open.  So it isn't safe to choose and obd
at open time, and continue to use it.

Instead, we need to choose and obd on each access that needs it, and
hold the chlg_registered_dev_lock mutex while using the obd.  This
prevents the obd from being freed.

To help with this, we store a link to the chlg_registered_dev
in the chlg_reader_state, and use the name of the dev (instead
of the name of the obd) in some error messages.

Reported-by: Quentin Bouget <quentin.bouget@cea.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
---

This is and RFC - I have only compile-tested it.
It seems sensible to me, but I know little about what is happening
here so I could easily be missing something important.

See also https://jira.whamcloud.com/browse/LU-11626

Thanks,
NeilBrown



 drivers/staging/lustre/lustre/mdc/mdc_changelog.c | 50 ++++++++++++++++-------
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
index becdee86f4d2..ee4b1b95408d 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_changelog.c
@@ -66,8 +66,8 @@ struct chlg_registered_dev {
 };
 
 struct chlg_reader_state {
-	/* Shortcut to the corresponding OBD device */
-	struct obd_device	*crs_obd;
+	/* Device this state is associated with */
+	struct chlg_registered_dev *crs_dev;
 	/* Producer thread (if any) */
 	struct task_struct	*crs_prod_task;
 	/* An error occurred that prevents from reading further */
@@ -132,7 +132,7 @@ static int chlg_read_cat_process_cb(const struct lu_env *env,
 	if (rec->cr_hdr.lrh_type != CHANGELOG_REC) {
 		rc = -EINVAL;
 		CERROR("%s: not a changelog rec %x/%d in llog : rc = %d\n",
-		       crs->crs_obd->obd_name, rec->cr_hdr.lrh_type,
+		       crs->crs_dev->ced_name, rec->cr_hdr.lrh_type,
 		       rec->cr.cr_type, rc);
 		return rc;
 	}
@@ -183,6 +183,17 @@ static void enq_record_delete(struct chlg_rec_entry *rec)
 	kfree(rec);
 }
 
+/*
+ * Find any OBD device associated with this reader
+ * chlg_registered_dev_lock is held.
+ */
+static inline struct obd_device *chlg_obd_get(struct chlg_registered_dev *dev)
+{
+	return list_first_entry_or_null(&dev->ced_obds,
+					struct obd_device,
+					u.cli.cl_chg_dev_linkage);
+}
+
 /**
  * Record prefetch thread entry point. Opens the changelog catalog and starts
  * reading records.
@@ -193,11 +204,17 @@ static void enq_record_delete(struct chlg_rec_entry *rec)
 static int chlg_load(void *args)
 {
 	struct chlg_reader_state *crs = args;
-	struct obd_device *obd = crs->crs_obd;
+	struct obd_device *obd;
 	struct llog_ctxt *ctx = NULL;
 	struct llog_handle *llh = NULL;
 	int rc;
 
+	mutex_lock(&chlg_registered_dev_lock);
+	obd = chlg_obd_get(crs->crs_dev);
+	if (!obd) {
+		rc = -ENOENT;
+		goto err_out;
+	}
 	ctx = llog_get_context(obd, LLOG_CHANGELOG_REPL_CTXT);
 	if (!ctx) {
 		rc = -ENOENT;
@@ -230,6 +247,7 @@ static int chlg_load(void *args)
 	crs->crs_eof = true;
 
 err_out:
+	mutex_unlock(&chlg_registered_dev_lock);
 	if (rc < 0)
 		crs->crs_err = true;
 
@@ -387,15 +405,23 @@ static loff_t chlg_llseek(struct file *file, loff_t off, int whence)
  */
 static int chlg_clear(struct chlg_reader_state *crs, u32 reader, u64 record)
 {
-	struct obd_device *obd = crs->crs_obd;
+	struct obd_device *obd;
 	struct changelog_setinfo cs  = {
 		.cs_recno = record,
 		.cs_id    = reader
 	};
+	int ret;
 
-	return obd_set_info_async(NULL, obd->obd_self_export,
-				  strlen(KEY_CHANGELOG_CLEAR),
-				  KEY_CHANGELOG_CLEAR, sizeof(cs), &cs, NULL);
+	mutex_lock(&chlg_registered_dev_lock);
+	obd = chlg_obd_get(crs->crs_dev);
+	if (!obd)
+		ret = -ENOENT;
+	else
+		ret = obd_set_info_async(NULL, obd->obd_self_export,
+					 strlen(KEY_CHANGELOG_CLEAR),
+					 KEY_CHANGELOG_CLEAR, sizeof(cs), &cs, NULL);
+	mutex_unlock(&chlg_registered_dev_lock);
+	return ret;
 }
 
 /** Maximum changelog control command size */
@@ -456,20 +482,16 @@ static int chlg_open(struct inode *inode, struct file *file)
 	struct chlg_reader_state *crs;
 	struct miscdevice *misc = file->private_data;
 	struct chlg_registered_dev *dev;
-	struct obd_device *obd;
 	struct task_struct *task;
 	int rc;
 
 	dev = container_of(misc, struct chlg_registered_dev, ced_misc);
-	obd = list_first_entry(&dev->ced_obds,
-			       struct obd_device,
-			       u.cli.cl_chg_dev_linkage);
 
 	crs = kzalloc(sizeof(*crs), GFP_KERNEL);
 	if (!crs)
 		return -ENOMEM;
 
-	crs->crs_obd = obd;
+	crs->crs_dev = dev;
 	crs->crs_err = false;
 	crs->crs_eof = false;
 
@@ -483,7 +505,7 @@ static int chlg_open(struct inode *inode, struct file *file)
 		if (IS_ERR(task)) {
 			rc = PTR_ERR(task);
 			CERROR("%s: cannot start changelog thread: rc = %d\n",
-			       obd->obd_name, rc);
+			       dev->ced_name, rc);
 			goto err_crs;
 		}
 		crs->crs_prod_task = task;
-- 
2.14.0.rc0.dirty

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20181107/c50ffabf/attachment.sig>

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [lustre-devel] [PATCH/RFC] lustre: changelog_cdev need to find obd on each access.
  2018-11-07  1:09               ` [lustre-devel] [PATCH/RFC] lustre: changelog_cdev need to find obd on each access NeilBrown
@ 2018-11-07 10:05                 ` quentin.bouget at cea.fr
  0 siblings, 0 replies; 12+ messages in thread
From: quentin.bouget at cea.fr @ 2018-11-07 10:05 UTC (permalink / raw)
  To: lustre-devel

Le 07/11/2018 ? 02:09, NeilBrown a ?crit?:
> A changelog cdev can be held open indefinitely, and obd structures can
> come and go while it is open.  So it isn't safe to choose and obd
> at open time, and continue to use it.
>
> Instead, we need to choose and obd on each access that needs it, and
> hold the chlg_registered_dev_lock mutex while using the obd.  This
> prevents the obd from being freed.
`struct llog_ctxt' has a reference on that obd (and `struct llog_handle' 
has a reference on a `struct llog_ctxt').
I don't think letting the obd go away is safe.

> @@ -193,11 +204,17 @@ static void enq_record_delete(struct chlg_rec_entry *rec)
>   static int chlg_load(void *args)
>   {
>   	struct chlg_reader_state *crs = args;
> -	struct obd_device *obd = crs->crs_obd;
> +	struct obd_device *obd;
>   	struct llog_ctxt *ctx = NULL;
>   	struct llog_handle *llh = NULL;
>   	int rc;
>   
> +	mutex_lock(&chlg_registered_dev_lock);
> +	obd = chlg_obd_get(crs->crs_dev);
> +	if (!obd) {
> +		rc = -ENOENT;
> +		goto err_out;
> +	}
>   	ctx = llog_get_context(obd, LLOG_CHANGELOG_REPL_CTXT);
 From there the obd is irrevocably referenced. I suspect it is used for 
more than just its name.
Maybe trying to unmount the (wrong) mountpoint will fail after that, but 
I am not confident.

Quentin

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-11-07 10:05 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-31  1:29 [lustre-devel] [PATCH] lustre: mdc: fix possible deadlock in chlg_open() NeilBrown
2018-10-31 12:24 ` quentin.bouget at cea.fr
2018-10-31 20:56   ` NeilBrown
2018-10-31 23:01   ` [lustre-devel] [PATCH v2] " NeilBrown
2018-11-04 21:34     ` James Simmons
2018-11-05 13:37       ` quentin.bouget at cea.fr
2018-11-06  3:11         ` NeilBrown
2018-11-06 10:41           ` quentin.bouget at cea.fr
2018-11-07  0:33             ` NeilBrown
2018-11-07  0:43               ` NeilBrown
2018-11-07  1:09               ` [lustre-devel] [PATCH/RFC] lustre: changelog_cdev need to find obd on each access NeilBrown
2018-11-07 10:05                 ` quentin.bouget at cea.fr

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.