From: Theodore Ts'o <tytso@mit.edu>
To: linux-fsdevel@vger.kernel.org
Subject: Re: [RFC] Draft Linux kernel interfaces for SMR/ZBC drives
Date: Tue, 11 Feb 2014 13:43:43 -0500
Message-ID: <20140211184343.GA11971@thunk.org>
In-Reply-To: <nsxtxckhfsh.fsf@closure.thunk.org>

Based on the comments raised on the list, here is a revised version of
the proposed ZBC kernel interface.

Changes from the last version:

1)  Aligned the ZBC_FLAG values with the ZBC specification to
	simplify implementations
2)  Mostly aligned the free_sectors_criteria values with the ZBC
	specification
3)  Clarified the behaviour of blkdev_query_zones()
4)  Added an ioctl interface to expose this functionality to userspace
5)  Removed the proposed simplified data variant

Please let me know what you think!

						- Ted


/*
 * Note: this structure is 24 bytes.  Using 256 MB zones, an 8TB drive
 * will have 32,768 zones.   That means if we tried to use a contiguous
 * array we would need to allocate 768k of contiguous, non-swappable
 * kernel memory.  (Boo, hiss.) 
 *
 * This is large enough that it would be painful to hang an array off the
 * block_device structure.  So we will define a function
 * blkdev_query_zones() to selectively return information for some
 * number of zones.
 *
 * It is anticipated that the block device driver will store this
 * information in a compressed form, and that z_checkpoint_offset will
 * not be dynamically tracked.  That is, a non-zero checkpoint offset
 * indicates that the drive suffered a power-fail event, and
 * the file system or userspace process may need to implement recovery
 * procedures.  Once the file system or userspace process writes to an
 * SMR band, the checkpoint offset will be cleared and future queries
 * for the SMR band will return the checkpoint offset == write_ptr.
 */
struct zone_status {
       sector_t	z_start;
       __u32	z_length;
       __u32	z_write_ptr_offset;  /* offset */
       __u32	z_checkpoint_offset; /* offset */
       __u32	z_flags;	     /* full, ro, offline, reset_requested */
};
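To double-check the sizing in the comment, the following minimal userspace sketch mirrors struct zone_status with fixed-width types, assuming a 64-bit sector_t and binary units (an 8 TiB drive and 256 MiB zones). The names zone_status_mirror, zone_count() and zone_array_bytes() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct zone_status,
 * assuming sector_t is 64 bits wide. */
struct zone_status_mirror {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* 8 + 4 * 4 = 24 bytes, with no padding required. */
_Static_assert(sizeof(struct zone_status_mirror) == 24, "unexpected padding");

/* Number of zones, assuming every zone has the same size. */
static uint64_t zone_count(uint64_t drive_bytes, uint64_t zone_bytes)
{
	assert(zone_bytes != 0);
	return drive_bytes / zone_bytes;
}

/* Bytes needed for one zone_status entry per zone in a flat array. */
static uint64_t zone_array_bytes(uint64_t drive_bytes, uint64_t zone_bytes)
{
	return zone_count(drive_bytes, zone_bytes) *
	       sizeof(struct zone_status_mirror);
}
```

With those inputs, zone_count() gives 32,768 zones and zone_array_bytes() gives 786,432 bytes (768 KiB), matching the figures in the comment.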

#define Z_FLAG_RESET_REQUESTED	0x0001
#define Z_FLAG_OFFLINE		0x0002
#define Z_FLAG_RO		0x0004
#define Z_FLAG_FULL		0x0008

#define Z_FLAG_TYPE_MASK	0x0F00
#define Z_FLAG_TYPE_CONVENTIONAL 0x0100
#define Z_FLAG_TYPE_SEQUENTIAL	0x0200
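A minimal sketch of how a caller might decode z_flags, assuming the zone type is extracted through Z_FLAG_TYPE_MASK while the status bits in the low nibble are tested individually. The helpers zone_type(), zone_is_sequential() and zone_reset_requested() are hypothetical and not part of the draft; the constant values are copied from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the draft interface. */
#define Z_FLAG_RESET_REQUESTED	0x0001
#define Z_FLAG_TYPE_MASK	0x0F00
#define Z_FLAG_TYPE_CONVENTIONAL 0x0100
#define Z_FLAG_TYPE_SEQUENTIAL	0x0200

/* Hypothetical helper: the zone type occupies bits 8-11 of z_flags. */
static uint32_t zone_type(uint32_t z_flags)
{
	return z_flags & Z_FLAG_TYPE_MASK;
}

/* Only sequential write zones have a write pointer that can be reset. */
static bool zone_is_sequential(uint32_t z_flags)
{
	return zone_type(z_flags) == Z_FLAG_TYPE_SEQUENTIAL;
}

/* Status bits are independent of the type field. */
static bool zone_reset_requested(uint32_t z_flags)
{
	return (z_flags & Z_FLAG_RESET_REQUESTED) != 0;
}
```

Keeping the type in its own masked field means new zone types can be added without disturbing the status bits.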


/*
 * Query the block_device bdev for information about the zones
 * starting at start_sector that match the criteria specified by
 * free_sectors_criteria.  Zone status information for at most
 * max_zones will be placed into the memory array ret_zones (which is
 * allocated by the caller, not by the blkdev_query_zones function),
 * in ascending LBA order.  The return value will be a kernel error
 * code if negative, or the number of zones actually returned if
 * non-negative.
 *
 * If free_sectors_criteria is positive, then return zones that have
 * at least that many sectors available to be written.  If it is zero,
 * then match all zones.  If free_sectors_criteria is negative, then
 * return the zones that match the following criteria:
 *
 *	-1     Match all full zones
 *	-2     Match all open zones
 *		  (the zone has at least one written sector and is not full)
 *	-3     Match all free zones
 *		  (the zone has no written sectors)
 *	-4     Match all read-only zones
 *	-5     Match all offline zones
 *	-6     Match all zones where the write ptr != the checkpoint ptr
 *
 * The negative values are taken from Table 4 of 14-010r1, with the
 * exception of -6, which is not in the draft spec --- but IMHO should
 * be :-) It is anticipated, though, that the kernel will keep this
 * info in memory and so will handle matching zones which meet
 * these criteria itself, without needing to issue a ZBC command for
 * each call to blkdev_query_zones().
 */
extern int blkdev_query_zones(struct block_device *bdev,
			      sector_t start_sector,
			      int free_sectors_criteria,
			      int max_zones,
			      struct zone_status *ret_zones);
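The matching rules for free_sectors_criteria, and the way a caller would page through zones max_zones at a time, can be sketched over a hypothetical in-memory zone table. Several details are assumptions of this sketch rather than statements from the draft: offsets are taken to be in sectors relative to z_start, a zone counts as full when its write pointer reaches z_length, the zone containing start_sector is included in the results, and query_zones(), zone_matches(), count_matching(), ZS_RO and ZS_OFFLINE are illustrative names only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* stand-in for the kernel type */

struct zone_status {
	sector_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Hypothetical local names; the values are the draft's offline/RO bits. */
#define ZS_OFFLINE	0x0002
#define ZS_RO		0x0004

/* Returns nonzero if zone z satisfies criteria (already validated). */
static int zone_matches(const struct zone_status *z, int criteria)
{
	/* Assumption: offsets are in sectors, relative to z_start. */
	uint32_t used = z->z_write_ptr_offset;
	uint32_t avail = z->z_length - used;

	if (criteria > 0)
		return avail >= (uint32_t)criteria;
	switch (criteria) {
	case 0:  return 1;				/* all zones */
	case -1: return avail == 0;			/* full */
	case -2: return used > 0 && avail > 0;		/* open */
	case -3: return used == 0;			/* free */
	case -4: return (z->z_flags & ZS_RO) != 0;	/* read-only */
	case -5: return (z->z_flags & ZS_OFFLINE) != 0;	/* offline */
	default: /* -6: write pointer differs from checkpoint */
		return z->z_write_ptr_offset != z->z_checkpoint_offset;
	}
}

/* Hypothetical stand-in for blkdev_query_zones() over zones[0..nr-1],
 * sorted by z_start.  Results are written to ret_zones in LBA order. */
static int query_zones(const struct zone_status *zones, int nr,
		       sector_t start_sector, int criteria,
		       int max_zones, struct zone_status *ret_zones)
{
	int found = 0;

	if (criteria < -6 || max_zones < 0)
		return -EINVAL;
	for (int i = 0; i < nr && found < max_zones; i++) {
		const struct zone_status *z = &zones[i];

		/* Skip zones that end at or before start_sector. */
		if (z->z_start + z->z_length <= start_sector)
			continue;
		if (zone_matches(z, criteria))
			ret_zones[found++] = *z;
	}
	return found;
}

/* Caller-side loop: count matches, fetching at most `batch` per call. */
static int count_matching(const struct zone_status *zones, int nr,
			  int criteria, int batch)
{
	struct zone_status buf[8];
	sector_t next = 0;
	int total = 0;

	if (batch < 1 || batch > 8)
		return -EINVAL;
	for (;;) {
		int n = query_zones(zones, nr, next, criteria, batch, buf);

		if (n < 0)
			return n;
		total += n;
		if (n < batch)
			break;
		/* Resume just past the last zone returned. */
		next = buf[n - 1].z_start + buf[n - 1].z_length;
	}
	return total;
}
```

Because results come back in ascending LBA order, the caller only needs the end of the last returned zone to resume, and a short batch signals that the scan is complete.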

/*
 * Reset the write pointer for a sequential write zone.
 *
 * Returns -EINVAL if the start_sector is not the beginning of a
 * sequential write zone.
 */
extern int blkdev_reset_zone_ptr(struct block_device *bdev,
				 sector_t start_sector);
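One way a driver could enforce the -EINVAL rule is sketched here over a hypothetical in-memory zone table: locate the zone containing start_sector by binary search, then reject the request unless start_sector is that zone's first sector and the zone is sequential. find_zone() and reset_zone_ptr() are hypothetical names, and resetting the checkpoint offset together with the write pointer is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* stand-in for the kernel type */

struct zone_status {
	sector_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

#define Z_FLAG_TYPE_MASK	0x0F00
#define Z_FLAG_TYPE_CONVENTIONAL 0x0100
#define Z_FLAG_TYPE_SEQUENTIAL	0x0200

/* Binary search for the zone containing sector s.  zones[] is sorted by
 * z_start and non-overlapping.  Returns the index, or -1 if not found. */
static int find_zone(const struct zone_status *zones, int nr, sector_t s)
{
	int lo = 0, hi = nr - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		const struct zone_status *z = &zones[mid];

		if (s < z->z_start)
			hi = mid - 1;
		else if (s >= z->z_start + z->z_length)
			lo = mid + 1;
		else
			return mid;
	}
	return -1;
}

/* Hypothetical in-memory stand-in for blkdev_reset_zone_ptr(). */
static int reset_zone_ptr(struct zone_status *zones, int nr,
			  sector_t start_sector)
{
	int i = find_zone(zones, nr, start_sector);

	/* Must name the first sector of a sequential write zone. */
	if (i < 0 || zones[i].z_start != start_sector ||
	    (zones[i].z_flags & Z_FLAG_TYPE_MASK) != Z_FLAG_TYPE_SEQUENTIAL)
		return -EINVAL;
	/* Assumption: a reset also moves the checkpoint back to zero. */
	zones[i].z_write_ptr_offset = 0;
	zones[i].z_checkpoint_offset = 0;
	return 0;
}
```

Requests for a sector in the middle of a zone, for a conventional zone, or beyond the end of the device all fail with -EINVAL, and only the named zone is modified.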


/* ioctl interface */

ZBCQUERY
	u64 starting_lba	/* IN */
	s32 criteria		/* IN */
	u32 *num_zones		/* IN/OUT */
	struct zone_status *ptr	/* OUT */

ZBCRESETZONE
	u64 starting_lba
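If ZBCQUERY took a single argument structure, one possible layout is sketched here. It follows common kernel advice for ioctl arguments: carry the user pointer in a 64-bit integer so that 32-bit and 64-bit processes see an identical layout, and make criteria signed so it can hold the negative match codes. The struct name zbc_query_arg, the helper zbc_query_arg_init(), and folding num_zones into the structure as an in/out count (instead of a separate pointer) are all assumptions of this sketch; no ioctl number is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct zone_status;	/* as defined in the draft; only pointed to here */

/* Hypothetical ZBCQUERY argument block with a fixed 24-byte layout. */
struct zbc_query_arg {
	uint64_t starting_lba;	/* IN */
	int32_t  criteria;	/* IN: same encoding as free_sectors_criteria */
	uint32_t num_zones;	/* IN: array capacity; OUT: entries filled */
	uint64_t zones_ptr;	/* OUT: user address of struct zone_status[] */
};

_Static_assert(sizeof(struct zbc_query_arg) == 24, "layout must not vary");
_Static_assert(offsetof(struct zbc_query_arg, zones_ptr) == 16, "no padding");

/* Fill in the argument block; the pointer goes through uintptr_t. */
static void zbc_query_arg_init(struct zbc_query_arg *arg, uint64_t lba,
			       int32_t criteria, struct zone_status *buf,
			       uint32_t nr)
{
	arg->starting_lba = lba;
	arg->criteria = criteria;
	arg->num_zones = nr;
	arg->zones_ptr = (uint64_t)(uintptr_t)buf;
}
```

Storing the pointer as a 64-bit integer avoids needing a separate compat path for 32-bit callers, at the cost of an explicit cast on each side.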


