From: Theodore Ts'o <tytso@mit.edu>
To: linux-fsdevel@vger.kernel.org
Subject: [RFC] Draft Linux kernel interfaces for ZBC drives
Date: Fri, 31 Jan 2014 00:38:22 -0500
Message-ID: <nsxtxckhfsh.fsf@closure.thunk.org>


I've been reading the draft ZBC specifications, especially 14-010r1[1],
and I've created the following draft kernel interfaces, which I present
as a strawman proposal for comments.

[1] http://www.t10.org/cgi-bin/ac.pl?t=d&f=14-010r1.pdf

As noted in the comments below, supporting variable length SMR zones
does result in more complexity at the file system / userspace interface
layer.  Life would certainly get simpler if these zones were fixed
length.

                                                        - Ted


/*
 * Note: this structure is 24 bytes.  Using 256 MB zones, an 8TB drive
 * will have 32,768 zones.   That means if we tried to use a contiguous
 * array we would need to allocate 768k of contiguous, non-swappable
 * kernel memory.  (Boo, hiss.) 
 *
 * This is large enough that it would be painful to hang an array off the
 * block_device structure.  So we will define a function
 * blkdev_query_zones() to selectively return information for some
 * number of zones.
 */
struct zone_status {
       sector_t	z_start;
       __u32	z_length;
       __u32	z_write_ptr_offset;  /* offset */
       __u32	z_checkpoint_offset; /* offset */
       __u32	z_flags;	     /* full, ro, offline, reset_requested */
};
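
The arithmetic in the comment above can be checked with a small
userspace sketch.  It assumes a 64-bit sector_t and uses uint32_t in
place of __u32; the helper names zones_on_drive() and
zone_array_bytes() are made up for illustration and are not part of
the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel type; assumes a 64-bit sector_t. */
typedef uint64_t sector_t;

/* Mirror of the proposed structure, with uint32_t standing in for __u32. */
struct zone_status {
	sector_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Hypothetical helper: zone count, assuming every zone is the same size. */
uint64_t zones_on_drive(uint64_t drive_bytes, uint64_t zone_bytes)
{
	assert(zone_bytes != 0);
	return drive_bytes / zone_bytes;
}

/* Hypothetical helper: bytes needed for one contiguous array of all zones. */
uint64_t zone_array_bytes(uint64_t nr_zones)
{
	return nr_zones * sizeof(struct zone_status);
}
```

With an 8TB drive (8 << 40 bytes) and 256 MB zones (256 << 20 bytes),
this gives 32,768 zones and a 768k array, matching the numbers above.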

#define Z_FLAGS_FULL		0x0001
#define Z_FLAGS_OFFLINE		0x0002
#define Z_FLAGS_RO		0x0004
#define Z_FLAG_RESET_REQUESTED	0x0008

#define Z_FLAG_TYPE_MASK	0x0F00
#define Z_FLAG_TYPE_CONVENTIONAL 0x0000
#define Z_FLAG_TYPE_SEQUENTIAL	0x0100


/*
 * Query the block_device bdev for information about the zones
 * starting at start_sector that match the criteria specified by
 * free_sectors_criteria.  Zone status information for at most
 * max_zones will be placed into the memory array ret_zones.  The
 * return value contains the number of zones actually returned.
 *
 * If free_sectors_criteria is positive, then return zones that have
 * at least that many sectors available to be written.  If it is zero,
 * then match all zones.  If free_sectors_criteria is negative, then
 * return the zones that match the following criteria:
 *
 *      -1     Return all read-only zones
 *      -2     Return all offline zones
 *      -3     Return all zones where the write ptr != the checkpoint ptr
 */
extern int blkdev_query_zones(struct block_device *bdev,
			      sector_t start_sector,
			      int free_sectors_criteria,
			      struct zone_status *ret_zones,
			      int max_zones);
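
To make the matching rules concrete, here is a userspace sketch that
applies them to an in-memory table of zones.  It is not the proposed
kernel implementation.  The names zone_matches() and
query_zones_from_table() are hypothetical, and it assumes that the
offsets and z_length are in sectors, that "starting at start_sector"
means zones whose z_start is at or after start_sector, and that free
space is simply z_length minus z_write_ptr_offset regardless of flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* assumes a 64-bit sector_t */

struct zone_status {
	sector_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

#define Z_FLAGS_OFFLINE	0x0002
#define Z_FLAGS_RO	0x0004

/* Hypothetical helper: does one zone satisfy free_sectors_criteria? */
bool zone_matches(const struct zone_status *z, int criteria)
{
	if (criteria == 0)
		return true;			/* match all zones */
	if (criteria > 0) {
		/* Assumes z_write_ptr_offset never exceeds z_length. */
		uint32_t free_sectors = z->z_length - z->z_write_ptr_offset;

		return free_sectors >= (uint32_t)criteria;
	}
	switch (criteria) {
	case -1:
		return (z->z_flags & Z_FLAGS_RO) != 0;
	case -2:
		return (z->z_flags & Z_FLAGS_OFFLINE) != 0;
	case -3:
		return z->z_write_ptr_offset != z->z_checkpoint_offset;
	default:
		return false;			/* unknown criteria match nothing */
	}
}

/*
 * Hypothetical stand-in for blkdev_query_zones() that walks a table
 * sorted by z_start.  Copies at most max_zones matching zones into
 * ret_zones and returns how many were copied.
 */
int query_zones_from_table(const struct zone_status *table, int nr_zones,
			   sector_t start_sector, int criteria,
			   struct zone_status *ret_zones, int max_zones)
{
	int found = 0;

	for (int i = 0; i < nr_zones && found < max_zones; i++) {
		if (table[i].z_start < start_sector)
			continue;
		if (zone_matches(&table[i], criteria))
			ret_zones[found++] = table[i];
	}
	return found;
}
```

A caller that wants every matching zone would call this repeatedly,
passing the sector just past the last zone returned as the next
start_sector, until the return value is less than max_zones.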

/*
 * Reset the write pointer for a sequential write zone.
 *
 * Returns -EINVAL if the start_sector is not the beginning of a
 * sequential write zone.
 */
extern int blkdev_reset_zone_ptr(struct block_device *bdev,
				 sector_t start_sector);
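
The -EINVAL rule can be sketched against an in-memory zone table as
follows.  The function name reset_zone_ptr_in_table() is hypothetical.
Resetting z_checkpoint_offset along with the write pointer, and
clearing the full and reset_requested flags, are assumptions of this
sketch; the proposal above does not spell out those details.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* assumes a 64-bit sector_t */

struct zone_status {
	sector_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

#define Z_FLAGS_FULL		0x0001
#define Z_FLAG_RESET_REQUESTED	0x0008
#define Z_FLAG_TYPE_MASK	0x0F00
#define Z_FLAG_TYPE_SEQUENTIAL	0x0100

/*
 * Hypothetical userspace stand-in for blkdev_reset_zone_ptr().
 * Returns 0 on success, or -EINVAL if start_sector is not the first
 * sector of a sequential write zone.
 */
int reset_zone_ptr_in_table(struct zone_status *table, int nr_zones,
			    sector_t start_sector)
{
	for (int i = 0; i < nr_zones; i++) {
		struct zone_status *z = &table[i];

		if (z->z_start != start_sector)
			continue;
		if ((z->z_flags & Z_FLAG_TYPE_MASK) != Z_FLAG_TYPE_SEQUENTIAL)
			return -EINVAL;	/* conventional zone: nothing to reset */

		/* Assumption: both pointers rewind to the start of the zone. */
		z->z_write_ptr_offset = 0;
		z->z_checkpoint_offset = 0;
		z->z_flags &= ~(uint32_t)(Z_FLAGS_FULL | Z_FLAG_RESET_REQUESTED);
		return 0;
	}
	return -EINVAL;			/* no zone starts at start_sector */
}
```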


/* 
 * ----------------------------
 */

/* 
 * The zone_status structure could be a lot smaller if zones were a
 * constant fixed size.  We could then address zones using a 16-bit
 * integer instead of a 64-bit starting LBA, and this structure could
 * be half the size (12 bytes).
 *
 * We can also further shrink the structure by removing the
 * z_checkpoint_offset element, since most of the time
 * z_write_ptr_offset and z_checkpoint_offset will be the same.  The
 * only time they will be different is after a write is interrupted
 * via an unexpected power removal.
 *
 * With the smaller structure, we could fit all of the zones in an 8TB
 * SMR drive in 256k, which maybe we could afford to vmalloc().
 */
struct simplified_zone_status {
       __u32	z_write_ptr_offset;  /* offset */
       __u32	z_flags;
};

/* add a new flag */
#define Z_FLAG_POWER_FAIL_WRITE 0x0010 /* write_ptr != checkpoint ptr */
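
With fixed-size zones, the zone's start no longer needs to be stored
because it follows from the zone index.  A userspace sketch of that
addressing, assuming 512-byte sectors and 256 MB zones; ZONE_SECTORS
and the helper names are hypothetical, and uint32_t stands in for
__u32:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* assumes a 64-bit sector_t */

struct simplified_zone_status {
	uint32_t z_write_ptr_offset;	/* offset from the start of the zone */
	uint32_t z_flags;
};

/* Assumption: 256 MB zones of 512-byte sectors, i.e. 2^19 sectors each. */
#define ZONE_SECTORS	((sector_t)1 << 19)

/* Hypothetical helper: index of the zone that contains a sector. */
uint16_t zone_index(sector_t sector)
{
	assert(sector / ZONE_SECTORS <= UINT16_MAX);
	return (uint16_t)(sector / ZONE_SECTORS);
}

/* Hypothetical helper: first sector of a zone, derived from its index. */
sector_t zone_start(uint16_t idx)
{
	return (sector_t)idx * ZONE_SECTORS;
}

/* Hypothetical helper: absolute sector of a zone's write pointer. */
sector_t zone_write_ptr(const struct simplified_zone_status *table,
			uint16_t idx)
{
	return zone_start(idx) + table[idx].z_write_ptr_offset;
}
```

At 8 bytes per entry, the 32,768 zones of an 8TB drive take 256k,
which is the figure quoted in the comment above.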
