All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: gregkh@linuxfoundation.org, ngupta@vflare.org,
	sergey.senozhatsky.work@gmail.com, axboe@kernel.dk,
	mbenes@suse.com, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] zram: fix crashes due to use of cpu hotplug multistate
Date: Mon, 22 Mar 2021 15:12:01 -0700	[thread overview]
Message-ID: <YFkWMZ0m9nKCT69T@google.com> (raw)
In-Reply-To: <20210322204156.GM4332@42.do-not-panic.com>

On Mon, Mar 22, 2021 at 08:41:56PM +0000, Luis Chamberlain wrote:
> On Mon, Mar 22, 2021 at 09:37:17AM -0700, Minchan Kim wrote:
> > On Fri, Mar 19, 2021 at 07:09:24PM +0000, Luis Chamberlain wrote:
> > > Indeed one issue is a consequence of the other but a bit better
> > > description can be put together for both separately.
> > > 
> > > The warning comes up when cpu hotplug detects that the callback
> > > is being removed, but yet "instances" for multistate are left behind,
> > > ie, not removed. CPU hotplug multistate allows us to dedicate a callback
> > > for zram so that it gets called every time a CPU hot plugs or unplugs.
> > > In the zram driver's case we want to trigger the callback per each
> > > struct zcomp, one is used per supported and used supported compression
> > 
> >      you meant "each one is used per supported compression"?
> 
> Yup.
> 
> > > algorithm. The zram driver allocates a struct zcomp for an compression
> > > algorithm on a need basis. The default compressor is CONFIG_ZRAM_DEF_COMP
> > > which today is CONFIG_ZRAM_DEF_COMP_LZORLE ("lzo-rle"). Although we may
> > > have different zram devices, zram devices which use the same compression
> > > algorithm share the same struct zcomp. The "multi" in CPU hotplug multstate
> > 
> > It allocates each own zcomp, not sharing even though zram devices
> > use the same compression algorithm.
> 
> Right.
> 
> > > refers to these different struct zcomp instances, each of these will
> > > have the callback routine registered run for it. The kernel's CPU
> > > hotplug multistate keeps a linked list of these different structures
> > > so that it will iterate over them on CPU transitions. By default at
> > > driver initialization we will create just one zram device (num_devices=1)
> > > and a zcomp structure then set for the lzo-rle comrpession algorithm.
> > > At driver removal we first remove each zram device, and so we destroy
> > > the struct zcomp. But since we expose sysfs attributes to create new
> > > devices or reset / initialize existing ones, we can easily end up
> > > re-initializing a struct zcomp before the exit routine of the module
> > > removes the cpu hotplug callback. When this happens the kernel's
> > > CPU hotplug will detect that at least one instance (struct zcomp for
> > > us) exists.
> > 
> > It's very helpful to understand. Thanks a lot!S
> > 
> > So IIUC, it's fundamentally race between destroy_devices(i.e., module
> > unload) and every sysfs knobs in zram. Isn't it?
> 
> I would not call it *every* syfs knob as not all deal with things which
> are related to CPU hotplug multistate, right? Note that using just
> try_module_get() alone (that is the second patch only, does not fix the
> race I am describing above).

It wouldn't be CPU hotplug multistate issue but I'd like to call it
as more "zram instance race" bug.
What happens in this case?

        CPU 1                            CPU 2

destroy_devices
..
                                 compact_store()
                                 zram = dev_to_zram(dev);
idr_for_each(zram_remove_cb
  zram_remove
  ..
  kfree(zram)
                                 down_read(&zram->init_lock);


        CPU 1                            CPU 2

hot_remove_store
                                 compact_store()
                                 zram = dev_to_zram(dev);
  zram_remove
    kfree(zram)
                                 down_read(&zram->init_lock);
    				
So, for me we need to close the zram instance create/removal
with sysfs rather than focusing on CPU hotplug issue.
Maybe, we could reuse zram_index_mutex with modifying it with
rw_semaphore. What do you think?

> 
> There are two issues.
> 
> > Below specific example is one of them and every sysfs code in zram
> > could access freed object(e.g., zram->something).
> > And you are claiming there isn't good way to fix it in kernfs(please
> > add it in the description, too) even though it's general problem.
> 
> Correct, the easier route would have been through the block layer as
> its the one adding our syfs attributes for us but even it canno deal
> with this race on behalf of drivers given the currently exposed
> semantics on kernfs.
> 
> > (If we had, we may detect the race and make zram_remove_cb fails
> > so unloading modulies fails, finally)
> > 
> > So, shouldn't we introcude a global rw_semaphore?
> > 
> > destroy_devices
> >   class_unregister
> >   down_write(&zram_unload);
> >   idr_for_each(zram_remove_cb);
> >   up_write(&zram_unload);
> > 
> > And every sysfs code hold the lock with down_read in the first place
> > and release the lock right before return. So, introduce a zram sysfs
> > wrapper to centralize all of locking logic.
> 
> Actually that does work but only if we use write lock attempts so to
> be able to knock two birds with one stone, so to address the deadlock
> with sysfs attribute removal. We're not asking politely to read some
> value off of a zram devices with these locks, we're ensuring to not
> run if our driver is going to be removed, so replacing the
> try_module_get() with something similar.
> 
> > Does it make sense?
> > 
> > > 
> > > And so we can have:
> > > 
> > > static void destroy_devices(void)
> > > {
> > > 	class_unregister(&zram_control_class);
> > > 	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
> > > 	zram_debugfs_destroy();
> > > 	idr_destroy(&zram_index_idr);
> > > 	unregister_blkdev(zram_major, "zram");
> > > 	cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
> > > }
> > > 
> > > While destroy_devices() runs we can have this race:
> > > 
> > > 
> > > CPU 1                            CPU 2
> > > 
> > > class_unregister(...);
> > > idr_for_each(...);
> > 				  I think sysfs was already remove here.
> > > zram_debugfs_destroy();
> > >                                 disksize_store(...)
> > > idr_destroy(...);
> > > unregister_blkdev(...);
> > > cpuhp_remove_multi_state(...);
> > 
> > So,
> > 
> > 	CPU 1                            CPU 2
> >  
> > destroy_devices
> > ..
> > 				 disksize_store()
> > 				   zcomp_create
> > 				   zcomp_init
> > idr_for_each(zram_remove_cb
> >   zram_remove
> >   zram_reset
> >   zcomp_destroy
> >   cpuhp_state_remove_instance
> > 				   cpuhp_state_add_instance
> > ..
> > 
> > cpuhp_remove_multi_state(...)
> >   Detect!
> >     kernel: Error: Removing state 63 which has instances left.
> 
> Yup true.
> 
> > > The warning comes up on cpuhp_remove_multi_state() when it sees
> > > that the state for CPUHP_ZCOMP_PREPARE does not have an empty
> > > instance linked list.
> > > 
> > > After the idr_destory() its also easy to run into a crash. The
> > > above predicament also leads to memory leaks.
> > > 
> > > And so we need to have a state machine for when we're up and when
> > > we're going down generically.
> > > 
> > > Let me know if it makes sense now, if so I can amend the commit log
> > > accordingly.
> > 
> > Yub, It's much better. Let's have it in the commit log.
> 
> Sure OK, well the following is what I end up with. With this approach
> only one patch is needed.
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index a711a2e2a794..3b86ee94309e 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -41,6 +41,15 @@ static DEFINE_IDR(zram_index_idr);
>  /* idr index must be protected */
>  static DEFINE_MUTEX(zram_index_mutex);
>  
> +/*
> + * Protects against:
> + * a) Hotplug cpu multistate races against compression algorithm removals
> + *    and additions prior to its removal of the zram CPU hotplug callback
> + * b) Deadlock which is possible when sysfs attributes are used and we
> + *    share a lock on removal.
> + */
> +DECLARE_RWSEM(zram_unload);
> +
>  static int zram_major;
>  static const char *default_compressor = CONFIG_ZRAM_DEF_COMP;
>  
> @@ -1723,6 +1732,9 @@ static ssize_t disksize_store(struct device *dev,
>  	if (!disksize)
>  		return -EINVAL;
>  
> +	if (!down_write_trylock(&zram_unload))
> +		return -ENODEV;
> +
>  	down_write(&zram->init_lock);
>  	if (init_done(zram)) {
>  		pr_info("Cannot change disksize for initialized device\n");
> @@ -1748,6 +1760,7 @@ static ssize_t disksize_store(struct device *dev,
>  	zram->disksize = disksize;
>  	set_capacity_and_notify(zram->disk, zram->disksize >> SECTOR_SHIFT);
>  	up_write(&zram->init_lock);
> +	up_write(&zram_unload);
>  
>  	return len;
>  
> @@ -1755,6 +1768,7 @@ static ssize_t disksize_store(struct device *dev,
>  	zram_meta_free(zram, disksize);
>  out_unlock:
>  	up_write(&zram->init_lock);
> +	up_write(&zram_unload);
>  	return err;
>  }
>  
> @@ -1773,14 +1787,17 @@ static ssize_t reset_store(struct device *dev,
>  	if (!do_reset)
>  		return -EINVAL;
>  
> +	if (!down_write_trylock(&zram_unload))
> +		return -ENODEV;
> +
>  	zram = dev_to_zram(dev);
>  	bdev = zram->disk->part0;
>  
>  	mutex_lock(&bdev->bd_mutex);
>  	/* Do not reset an active device or claimed device */
>  	if (bdev->bd_openers || zram->claim) {
> -		mutex_unlock(&bdev->bd_mutex);
> -		return -EBUSY;
> +		len = -EBUSY;
> +		goto out;
>  	}
>  
>  	/* From now on, anyone can't open /dev/zram[0-9] */
> @@ -1793,7 +1810,10 @@ static ssize_t reset_store(struct device *dev,
>  
>  	mutex_lock(&bdev->bd_mutex);
>  	zram->claim = false;
> +
> +out:
>  	mutex_unlock(&bdev->bd_mutex);
> +	up_write(&zram_unload);
>  
>  	return len;
>  }
> @@ -2015,10 +2035,15 @@ static ssize_t hot_add_show(struct class *class,
>  {
>  	int ret;
>  
> +	if (!down_write_trylock(&zram_unload))
> +		return -ENODEV;
> +
>  	mutex_lock(&zram_index_mutex);
>  	ret = zram_add();
>  	mutex_unlock(&zram_index_mutex);
>  
> +	up_write(&zram_unload);
> +
>  	if (ret < 0)
>  		return ret;
>  	return scnprintf(buf, PAGE_SIZE, "%d\n", ret);
> @@ -2041,6 +2066,9 @@ static ssize_t hot_remove_store(struct class *class,
>  	if (dev_id < 0)
>  		return -EINVAL;
>  
> +	if (!down_write_trylock(&zram_unload))
> +		return -ENODEV;
> +
>  	mutex_lock(&zram_index_mutex);
>  
>  	zram = idr_find(&zram_index_idr, dev_id);
> @@ -2053,6 +2081,7 @@ static ssize_t hot_remove_store(struct class *class,
>  	}
>  
>  	mutex_unlock(&zram_index_mutex);
> +	up_write(&zram_unload);
>  	return ret ? ret : count;
>  }
>  static CLASS_ATTR_WO(hot_remove);
> @@ -2078,12 +2107,14 @@ static int zram_remove_cb(int id, void *ptr, void *data)
>  
>  static void destroy_devices(void)
>  {
> +	down_write(&zram_unload);
>  	class_unregister(&zram_control_class);
>  	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
>  	zram_debugfs_destroy();
>  	idr_destroy(&zram_index_idr);
>  	unregister_blkdev(zram_major, "zram");
>  	cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
> +	up_write(&zram_unload);
>  }
>  
>  static int __init zram_init(void)
> @@ -2095,10 +2126,13 @@ static int __init zram_init(void)
>  	if (ret < 0)
>  		return ret;
>  
> +	down_write(&zram_unload);
> +
>  	ret = class_register(&zram_control_class);
>  	if (ret) {
>  		pr_err("Unable to register zram-control class\n");
>  		cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
> +		up_write(&zram_unload);
>  		return ret;
>  	}
>  
> @@ -2108,6 +2142,7 @@ static int __init zram_init(void)
>  		pr_err("Unable to get major number\n");
>  		class_unregister(&zram_control_class);
>  		cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
> +		up_write(&zram_unload);
>  		return -EBUSY;
>  	}
>  
> @@ -2119,10 +2154,12 @@ static int __init zram_init(void)
>  			goto out_error;
>  		num_devices--;
>  	}
> +	up_write(&zram_unload);
>  
>  	return 0;
>  
>  out_error:
> +	up_write(&zram_unload);
>  	destroy_devices();
>  	return ret;
>  }

As I mentioned above, it didn't close the race I gave as examples
unless I miss something. Let's discuss further.

Thank you!

  reply	other threads:[~2021-03-22 22:12 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-06  2:20 [PATCH 0/2] zram: fix few ltp zram02.sh crashes Luis Chamberlain
2021-03-06  2:20 ` [PATCH 1/2] zram: fix crashes due to use of cpu hotplug multistate Luis Chamberlain
2021-03-09  2:55   ` Minchan Kim
2021-03-10 13:11     ` Luis Chamberlain
2021-03-10 21:25       ` Luis Chamberlain
2021-03-12  2:08       ` Minchan Kim
2021-03-10 21:21     ` Luis Chamberlain
2021-03-12  2:14       ` Minchan Kim
2021-03-12 18:32         ` Luis Chamberlain
2021-03-12 19:28           ` Minchan Kim
2021-03-19 19:09             ` Luis Chamberlain
2021-03-22 16:37               ` Minchan Kim
2021-03-22 20:41                 ` Luis Chamberlain
2021-03-22 22:12                   ` Minchan Kim [this message]
2021-04-01 23:59                     ` Luis Chamberlain
2021-04-02  7:54                       ` Greg KH
2021-04-02 18:30                         ` Luis Chamberlain
2021-04-03  6:13                           ` Greg KH
     [not found]                             ` <20210406003152.GZ4332@42.do-not-panic.com>
2021-04-06 12:00                               ` Miroslav Benes
2021-04-06 15:54                                 ` Josh Poimboeuf
2021-04-07 14:09                                   ` Peter Zijlstra
2021-04-07 15:30                                     ` Josh Poimboeuf
2021-04-07 16:48                                       ` Peter Zijlstra
2021-04-07 20:17                         ` Josh Poimboeuf
2021-04-08  6:18                           ` Greg KH
2021-04-08 13:16                             ` Steven Rostedt
2021-04-08 13:37                             ` Josh Poimboeuf
2021-04-08  1:37                         ` Thomas Gleixner
2021-04-08  6:16                           ` Greg KH
2021-04-08  8:01                             ` Jiri Kosina
2021-04-08  8:09                               ` Greg KH
2021-04-08  8:35                                 ` Jiri Kosina
2021-04-08  8:55                                   ` Greg KH
2021-04-08 18:40                                     ` Luis Chamberlain
2021-04-09  3:01                                     ` Kees Cook
2021-04-05 17:07                       ` Minchan Kim
2021-04-05 19:00                         ` Luis Chamberlain
2021-04-05 19:58                           ` Minchan Kim
2021-04-06  0:29                             ` Luis Chamberlain
2021-04-07  1:23                               ` Minchan Kim
2021-04-07  1:38                                 ` Minchan Kim
2021-04-07 14:52                                   ` Luis Chamberlain
2021-04-07 14:50                                 ` Luis Chamberlain
2021-03-06  2:20 ` [PATCH 2/2] zram: fix races of sysfs attribute removal and usage Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YFkWMZ0m9nKCT69T@google.com \
    --to=minchan@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbenes@suse.com \
    --cc=mcgrof@kernel.org \
    --cc=ngupta@vflare.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.