Date: Wed, 29 Apr 2015 15:48:58 +0900
From: Minchan Kim
To: Sergey Senozhatsky
Cc: Andrew Morton, Nitin Gupta, linux-kernel@vger.kernel.org,
    Sergey Senozhatsky
Subject: Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
Message-ID: <20150429064858.GA5125@blaptop>
References: <1430140911-7818-1-git-send-email-sergey.senozhatsky@gmail.com>
 <1430140911-7818-10-git-send-email-sergey.senozhatsky@gmail.com>
 <20150429001624.GA3917@swordfish>
In-Reply-To: <20150429001624.GA3917@swordfish>

Hello Sergey,

On Wed, Apr 29, 2015 at 09:16:24AM +0900, Sergey Senozhatsky wrote:
> Hello,
>
> Minchan, a quick question. just to avoid resends and to make sure that I'm
> not missing any better solution.
>
> lockdep is unhappy here:
>
> > -static void zram_remove(struct zram *zram)
> > +static int zram_remove(struct zram *zram)
> >  {
> > -	pr_info("Removed device: %s\n", zram->disk->disk_name);
> > +	struct block_device *bdev;
> > +
> > +	bdev = bdget_disk(zram->disk, 0);
> > +	if (!bdev)
> > +		return -ENOMEM;
> > +
> > +	mutex_lock(&bdev->bd_mutex);
> > +	if (bdev->bd_openers) {
> > +		mutex_unlock(&bdev->bd_mutex);
> > +		return -EBUSY;
> > +	}
> > +
> >  	/*
> >  	 * Remove sysfs first, so no one will perform a disksize
> > -	 * store while we destroy the devices
> > +	 * store while we destroy the devices. This also helps during
> > +	 * zram_remove() -- device_reset() is the last holder of
> > +	 * ->init_lock.
> >  	 */
> >  	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
> >  			&zram_disk_attr_group);
> >
> > +	/* Make sure all pending I/O is finished */
> > +	fsync_bdev(bdev);
> >  	zram_reset_device(zram);
> > +	mutex_unlock(&bdev->bd_mutex);
> > +
> > +	pr_info("Removed device: %s\n", zram->disk->disk_name);
> > +
> >  	idr_remove(&zram_index_idr, zram->disk->first_minor);
> >  	blk_cleanup_queue(zram->disk->queue);
> >  	del_gendisk(zram->disk);
> >  	put_disk(zram->disk);
> >  	kfree(zram);
> > +
> > +	return 0;
> >  }
>
> lock ->bd_mutex
> if ->bd_openers {
>     unlock ->bd_mutex
>     return -EBUSY;
> }
>
> sysfs_remove_group()
>     ^^^^^^^^^ lockdep splat
> zram_reset_device()
>     lock ->init_lock
>     reset device
>     unlock ->init_lock
> unlock ->bd_mutex
> ...
> kfree zram
>
> why did I do this:
>
> sysfs_remove_group() turns zram_reset_device() into the last possible
> ->init_lock owner: there are no sysfs nodes left by the time of the final
> zram_reset_device(), so no concurrent or later store()/show() sysfs
> handler can be called. it closes a number of race conditions, like:
>
> CPU0                         CPU1
> umount
> zram_remove()
> zram_reset_device()          disksize_store()
>                              mount
> kfree zram
>
> or
>
> CPU0                         CPU1
> umount
> zram_remove()
> zram_reset_device()
>                              cat /sys/block/zram0/_any_sysfs_node_
> sysfs_remove_group()
> kfree zram                   _any_sysfs_node_read()
>
> and so on. so removing the sysfs group before zram_reset_device() makes
> sense.
>
> at the same time we need to prevent an `umount-zram_remove vs. mount'
> race and forbid zram_remove() on an active device. so we check
> ->bd_openers and perform the device reset under ->bd_mutex.

Could you explain the `umount-zram_remove vs. mount' race in more detail?
I would guess it should be handled in an upper layer (e.g., the VFS).
Anyway, I want to be more clear about that.

> a quick solution I can think of is to do something like this:
>
> sysfs_remove_group();
> lock ->bd_mutex
> if ->bd_openers {
>     unlock ->bd_mutex
>     create_sysfs_group()
>     ^^^^^^^^ return attrs back
>     return -EBUSY
> }
>
> zram_reset_device()
>     lock ->init_lock
>     reset device
>     unlock ->init_lock
> unlock ->bd_mutex
> ...
> kfree zram
>
> iow, move sysfs_remove_group() out of the ->bd_mutex lock, but keep it
> before the ->bd_openers check & zram_reset_device().
>
> I don't think that we can handle it with only ->init_lock: we need to
> unlock it at some point and then kfree the zram struct that contains the
> lock, and we have no idea whether there are any lock owners or waiters
> left when we kfree it. removing the sysfs group and acquiring ->init_lock
> for write in zram_reset_device() guarantee that there will be no further
> ->init_lock owners, so we can safely kfree after `unlock ->init_lock'.
> hence, sysfs_remove_group() must happen before zram_reset_device().
>
> create_sysfs_group() potentially can fail, which is a bit ugly, but the
> user will still be able to umount the device and remove it anyway.
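Just to check that I correctly understand the ordering you propose, the
remove path would end up looking something like the sketch below?
Completely untested, and the restore_sysfs label and the pr_warn() on a
failed restore are only my guesses at the error path; I also added bdput()
calls to drop the reference taken by bdget_disk(), which the diff above
does not do.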
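static int zram_remove(struct zram *zram)
{
	struct block_device *bdev;
	int ret = 0;

	/*
	 * Take the attrs down first, so zram_reset_device() below is
	 * guaranteed to be the last ->init_lock holder.
	 */
	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
			&zram_disk_attr_group);

	bdev = bdget_disk(zram->disk, 0);
	if (!bdev) {
		ret = -ENOMEM;
		goto restore_sysfs;
	}

	mutex_lock(&bdev->bd_mutex);
	if (bdev->bd_openers) {
		/* somebody still holds the device open: back out */
		mutex_unlock(&bdev->bd_mutex);
		bdput(bdev);
		ret = -EBUSY;
		goto restore_sysfs;
	}

	/* make sure all pending I/O is finished */
	fsync_bdev(bdev);
	zram_reset_device(zram);
	mutex_unlock(&bdev->bd_mutex);
	bdput(bdev);

	pr_info("Removed device: %s\n", zram->disk->disk_name);

	idr_remove(&zram_index_idr, zram->disk->first_minor);
	blk_cleanup_queue(zram->disk->queue);
	del_gendisk(zram->disk);
	put_disk(zram->disk);
	kfree(zram);

	return 0;

restore_sysfs:
	/* return the attrs back; this is the step that can fail */
	if (sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
			&zram_disk_attr_group))
		pr_warn("Error restoring sysfs group for %s\n",
			zram->disk->disk_name);
	return ret;
}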
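If so, the sysfs_create_group() failure in the error path does not look
fatal: as you say, the user can still umount the device and retry the
removal.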
> what do you think?
>
> -ss

--
Kind regards,
Minchan Kim