linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bob Liu <bob.liu@oracle.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>,
	"dm-devel@redhat.com" <dm-devel@redhat.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"snitzer@redhat.com" <snitzer@redhat.com>,
	Dmitry Fomichev <Dmitry.Fomichev@wdc.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"shirley.ma@oracle.com" <shirley.ma@oracle.com>
Subject: Re: [RFC PATCH] dm-zoned: extend the way of exposing zoned block device
Date: Wed, 8 Jan 2020 21:46:26 +0800	[thread overview]
Message-ID: <9e7d2f84-6efa-7c44-2313-860d5e8ed067@oracle.com> (raw)
In-Reply-To: <BYAPR04MB5816BA749332D2FC6CE3993AE73E0@BYAPR04MB5816.namprd04.prod.outlook.com>

On 1/8/20 3:40 PM, Damien Le Moal wrote:
> On 2020/01/08 16:13, Nobody wrote:
>> From: Bob Liu <bob.liu@oracle.com>
>>
>> Motivation:
>> Now the dm-zoned device mapper target exposes a zoned block device(ZBC) as a
>> regular block device by storing metadata and buffering random writes in
>> conventional zones.
>> This way is not very flexible, there must be enough conventional zones and the
>> performance may be constrained.
>> By putting metadata(also buffering random writes) in separated device we can get
>> more flexibility and potential performance improvement e.g by storing metadata
>> in faster device like persistent memory.
>>
>> This patch try to split the metadata of dm-zoned to an extra block
>> device instead of zoned block device itself.
>> (Buffering random writes also in the todo list.)
>>
>> Patch is at the very early stage, just want to receive some feedback about
>> this extension.
>> Another option is to create an new md-zoned device with separated metadata
>> device based on md framework.
> 
> For metadata only, it should not be hard at all to move to another
> conventional zone device. It will however be a little more tricky for
> conventional zones used for data since dm-zoned assumes that this random
> write space is also zoned. Moving this space to a conventional device
> requires implementing a zone emulation (fake zones) for the regular
> drive, using a zone size that matches the size of sequential zones.
> 

Indeed.
I'll try to implement zone emulation next version.

> Beyond this, dm-zoned also needs to be changed to accept partial drives
> and the dm core code to accept mixing of regular and zoned disks (that
> is forbidden now).
> 

Do you mean the check in device_area_is_invalid()? 

 320         /*
 321          * If the target is mapped to zoned block device(s), check
 322          * that the zones are not partially mapped.
 323          */
 324         if (bdev_zoned_model(bdev) != BLK_ZONED_NONE) {

> Another approach worth exploring is stacking dm-zoned as is on top of a
> modified dm-linear with the ability to emulate conventional zones on top
> of a regular block device (you only need report zones method
> implemented). That is much easier to do. We actually hacked something
> like that last month to see the performance change and saw a jump from
> 56 MB/s to 250 MB/s for write intensive workloads (file creation on
> ext4). I am not so warm in this approach though as it modifies dm-linear
> and we want to keep it very lean and simple. Modifying dm-zoned to allow
> support for a device pair is a better approach I think.
> 
>>
>> Signed-off-by: Bob Liu <bob.liu@oracle.com>
>> ---
>>  drivers/md/dm-zoned-metadata.c |  6 +++---
>>  drivers/md/dm-zoned-target.c   | 15 +++++++++++++--
>>  drivers/md/dm-zoned.h          |  1 +
>>  3 files changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-metadata.c
>> index 22b3cb0..89cd554 100644
>> --- a/drivers/md/dm-zoned-metadata.c
>> +++ b/drivers/md/dm-zoned-metadata.c
>> @@ -439,7 +439,7 @@ static struct dmz_mblock *dmz_get_mblock_slow(struct dmz_metadata *zmd,
>>  
>>  	/* Submit read BIO */
>>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
>> -	bio_set_dev(bio, zmd->dev->bdev);
>> +	bio_set_dev(bio, zmd->dev->meta_bdev);
>>  	bio->bi_private = mblk;
>>  	bio->bi_end_io = dmz_mblock_bio_end_io;
>>  	bio_set_op_attrs(bio, REQ_OP_READ, REQ_META | REQ_PRIO);
>> @@ -593,7 +593,7 @@ static int dmz_write_mblock(struct dmz_metadata *zmd, struct dmz_mblock *mblk,
>>  	set_bit(DMZ_META_WRITING, &mblk->state);
>>  
>>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
>> -	bio_set_dev(bio, zmd->dev->bdev);
>> +	bio_set_dev(bio, zmd->dev->meta_bdev);
>>  	bio->bi_private = mblk;
>>  	bio->bi_end_io = dmz_mblock_bio_end_io;
>>  	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
>> @@ -620,7 +620,7 @@ static int dmz_rdwr_block(struct dmz_metadata *zmd, int op, sector_t block,
>>  		return -ENOMEM;
>>  
>>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
>> -	bio_set_dev(bio, zmd->dev->bdev);
>> +	bio_set_dev(bio, zmd->dev->meta_bdev);
>>  	bio_set_op_attrs(bio, op, REQ_SYNC | REQ_META | REQ_PRIO);
>>  	bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0);
>>  	ret = submit_bio_wait(bio);
>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
>> index 70a1063..55c64fe 100644
>> --- a/drivers/md/dm-zoned-target.c
>> +++ b/drivers/md/dm-zoned-target.c
>> @@ -39,6 +39,7 @@ struct dm_chunk_work {
>>   */
>>  struct dmz_target {
>>  	struct dm_dev		*ddev;
>> +	struct dm_dev           *metadata_dev;
> 
> This naming would be confusing as it implies metadata only. If you also
> want to move the random write zones to a regular device, then I would
> suggest names like:
> 
> ddev -> zoned_bdev (the zoned device, e.g. SMR disk)
> metadata_dev -> reg_bdev (regular block device, e.g. an SSD)
> 

Will update.

> The metadata+random write (fake) zones space can use the regular block
> device, and all sequential zones are assumed to be on the zoned device.
> Care must be taken to handle the case of a zoned device that has
> conventional zones: these could be used as is, not needing any reclaim,

Agree, that won't make things too complicate.
Thank you for all the suggestions.

Regards,
Bob

> so potentially contributing to further optimization.
> 
>>  
>>  	unsigned long		flags;
>>  
>> @@ -701,6 +702,7 @@ static int dmz_get_zoned_device(struct dm_target *ti, char *path)
>>  		goto err;
>>  	}
>>  
>> +	dev->meta_bdev = dmz->metadata_dev->bdev;
>>  	dev->bdev = dmz->ddev->bdev;
>>  	(void)bdevname(dev->bdev, dev->name);
>>  
>> @@ -761,7 +763,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>>  	int ret;
>>  
>>  	/* Check arguments */
>> -	if (argc != 1) {
>> +	if (argc != 2) {
>>  		ti->error = "Invalid argument count";
>>  		return -EINVAL;
>>  	}
>> @@ -774,8 +776,16 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>>  	}
>>  	ti->private = dmz;
>>  
>> +	/* Get the metadata block device */
>> +	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
>> +			&dmz->metadata_dev);
>> +	if (ret) {
>> +		dmz->metadata_dev = NULL;
>> +		goto err;
>> +	}
>> +
>>  	/* Get the target zoned block device */
>> -	ret = dmz_get_zoned_device(ti, argv[0]);
>> +	ret = dmz_get_zoned_device(ti, argv[1]);
>>  	if (ret) {
>>  		dmz->ddev = NULL;
>>  		goto err;
>> @@ -856,6 +866,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>>  err_dev:
>>  	dmz_put_zoned_device(ti);
>>  err:
>> +	dm_put_device(ti, dmz->metadata_dev);
>>  	kfree(dmz);
>>  
>>  	return ret;
>> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
>> index 5b5e493..e789ff5 100644
>> --- a/drivers/md/dm-zoned.h
>> +++ b/drivers/md/dm-zoned.h
>> @@ -50,6 +50,7 @@
>>   */
>>  struct dmz_dev {
>>  	struct block_device	*bdev;
>> +	struct block_device     *meta_bdev;
>>  
>>  	char			name[BDEVNAME_SIZE];
>>  
>>


  reply	other threads:[~2020-01-08 13:49 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-08  7:12 [RFC PATCH] dm-zoned: extend the way of exposing zoned block device Nobody
2020-01-08  7:40 ` Damien Le Moal
2020-01-08 13:46   ` Bob Liu [this message]
2020-01-08 22:46     ` Dmitry Fomichev
2020-01-09  0:57       ` Bob Liu
2020-01-09  2:05     ` Damien Le Moal
2020-02-03 12:45   ` Bob Liu
2020-02-03 15:06     ` Damien Le Moal
2020-02-04  3:57       ` Bob Liu
2020-02-04  8:31         ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e7d2f84-6efa-7c44-2313-860d5e8ed067@oracle.com \
    --to=bob.liu@oracle.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=Dmitry.Fomichev@wdc.com \
    --cc=dm-devel@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=shirley.ma@oracle.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).