On Feb 11, 2014, at 11:43 AM, Theodore Ts'o wrote:
> Based on the comments raised on the list, here is a revised version of
> the proposed ZBC kernel interface.
>
> Changes from the last version:
>
> 1) Aligned ZBC_FLAG values with the ZBC specification to
>    simplify implementations
> 2) Aligned the free_sectors_criteria values to be mostly aligned with the ZBC
>    specification
> 3) Clarified the behaviour of blkdev_query_zones()
> 4) Added an ioctl interface to expose this functionality to userspace
> 5) Removed the proposed simplified data variant
>
> Please let me know what you think!

Should ZBCRESETZONE take a length or number of zones to reset?

Cheers, Andreas

> /*
>  * Note: this structure is 24 bytes.  Using 256 MB zones, an 8TB drive
>  * will have 32,768 zones.  That means if we tried to use a contiguous
>  * array we would need to allocate 768k of contiguous, non-swappable
>  * kernel memory.  (Boo, hiss.)
>  *
>  * This is large enough that it would be painful to hang an array off the
>  * block_device structure.  So we will define a function
>  * blkdev_query_zones() to selectively return information for some
>  * number of zones.
>  *
>  * It is anticipated that the block device driver will store this
>  * information in a compressed form, and that z_checkpoint_offset will
>  * not be dynamically tracked.  That is, the checkpoint offset will,
>  * if non-zero, indicate that the drive suffered a power fail event, and
>  * the file system or userspace process may need to implement recovery
>  * procedures.  Once the file system or userspace process writes to an
>  * SMR band, the checkpoint offset will be cleared and future queries
>  * for the SMR band will return the checkpoint offset == write_ptr.
>  */
> struct zone_status {
>         sector_t        z_start;
>         __u32           z_length;
>         __u32           z_write_ptr_offset;   /* offset */
>         __u32           z_checkpoint_offset;  /* offset */
>         __u32           z_flags;  /* full, ro, offline, reset_requested */
> };
>
> #define Z_FLAG_RESET_REQUESTED          0x0001
> #define Z_FLAGS_OFFLINE                 0x0002
> #define Z_FLAGS_RO                      0x0004
> #define Z_FLAGS_FULL                    0x0008
>
> #define Z_FLAG_TYPE_MASK                0x0F00
> #define Z_FLAG_TYPE_CONVENTIONAL        0x0100
> #define Z_FLAG_TYPE_SEQUENTIAL          0x0200
>
>
> /*
>  * Query the block_device bdev for information about the zones
>  * starting at start_sector that match the criteria specified by
>  * free_sectors_criteria.  Zone status information for at most
>  * max_zones will be placed into the memory array ret_zones (which is
>  * allocated by the caller, not by the blkdev_query_zones function),
>  * in ascending LBA order.  The return value will be a kernel error
>  * code if negative, or the number of zones actually returned if
>  * non-negative.
>  *
>  * If free_sectors_criteria is positive, then return zones that have
>  * at least that many sectors available to be written.  If it is zero,
>  * then match all zones.  If free_sectors_criteria is negative, then
>  * return the zones that match the following criteria:
>  *
>  *      -1     Match all full zones
>  *      -2     Match all open zones
>  *               (the zone has at least one written sector and is not full)
>  *      -3     Match all free zones
>  *               (the zone has no written sectors)
>  *      -4     Match all read-only zones
>  *      -5     Match all offline zones
>  *      -6     Match all zones where the write ptr != the checkpoint ptr
>  *
>  * The negative values are taken from Table 4 of 14-010r1, with the
>  * exception of -6, which is not in the draft spec --- but IMHO should
>  * be :-)  It is anticipated, though, that the kernel will keep this
>  * info in memory and so will handle matching zones which meet
>  * these criteria itself, without needing to issue a ZBC command for
>  * each call to blkdev_query_zones().
>  */
> extern int blkdev_query_zones(struct block_device *bdev,
>                               sector_t start_sector,
>                               int free_sectors_criteria,
>                               int max_zones,
>                               struct zone_status *ret_zones);
>
> /*
>  * Reset the write pointer for a sequential write zone.
>  *
>  * Returns -EINVAL if the start_sector is not the beginning of a
>  * sequential write zone.
>  */
> extern int blkdev_reset_zone_ptr(struct block_device *bdev,
>                                  sector_t start_sector);
>
>
> /* ioctl interface */
>
> ZBCQUERY
>         u64 starting_lba              /* IN */
>         u32 criteria                  /* IN */
>         u32 *num_zones                /* IN/OUT */
>         struct zone_status *ptr       /* OUT */
>
> ZBCRESETZONE
>         u64 starting_lba
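The sizing arithmetic in the struct zone_status comment can be checked with a small userspace sketch. It assumes sector_t is a 64-bit type and substitutes uint64_t/uint32_t for the kernel's sector_t/__u32; it also reads "8TB" as 2^43 bytes, since that is what the 32,768-zone figure implies. The helpers nr_zones() and zone_array_bytes() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the proposed struct zone_status.
 * ASSUMPTION: sector_t is 64 bits wide; uint64_t/uint32_t replace the
 * kernel's sector_t/__u32 so this builds outside the kernel.
 */
struct zone_status {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* 8 + 4 * 4 bytes, with no padding on common ABIs. */
static_assert(sizeof(struct zone_status) == 24,
	      "zone_status should be 24 bytes");

/* Hypothetical helper: zone count, assuming all zones are the same size. */
uint64_t nr_zones(uint64_t drive_bytes, uint64_t zone_bytes)
{
	return drive_bytes / zone_bytes;
}

/* Hypothetical helper: bytes for one contiguous array of every zone's status. */
uint64_t zone_array_bytes(uint64_t drive_bytes, uint64_t zone_bytes)
{
	return nr_zones(drive_bytes, zone_bytes) * sizeof(struct zone_status);
}
```

Under these assumptions, nr_zones(8ULL << 40, 256ULL << 20) is 32,768, and zone_array_bytes() for the same inputs is 786,432 bytes (768 KiB), which matches the figures in the comment.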
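The free_sectors_criteria semantics described for blkdev_query_zones() can be modelled in userspace as a rough sketch. This is not the proposed kernel implementation: query_zones() and zone_matches() are hypothetical helpers that walk a plain array of zones sorted by z_start instead of a struct block_device. The sketch also makes two assumptions. It treats "starting at start_sector" as meaning zones with z_start >= start_sector, and it treats a zone as full only when Z_FLAGS_FULL is set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for struct zone_status (sector_t assumed 64-bit). */
struct zone_status {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

#define Z_FLAGS_OFFLINE 0x0002
#define Z_FLAGS_RO      0x0004
#define Z_FLAGS_FULL    0x0008

/* Hypothetical: return nonzero if zone z satisfies free_sectors_criteria. */
int zone_matches(const struct zone_status *z, int criteria)
{
	/* Sectors still writable between the write pointer and the zone end. */
	uint32_t free_sectors = z->z_length - z->z_write_ptr_offset;
	int full = (z->z_flags & Z_FLAGS_FULL) != 0;

	if (criteria > 0)
		return free_sectors >= (uint32_t)criteria;

	switch (criteria) {
	case 0:  return 1;                                   /* all zones */
	case -1: return full;                                /* full */
	case -2: return z->z_write_ptr_offset > 0 && !full;  /* open */
	case -3: return z->z_write_ptr_offset == 0;          /* free */
	case -4: return (z->z_flags & Z_FLAGS_RO) != 0;      /* read-only */
	case -5: return (z->z_flags & Z_FLAGS_OFFLINE) != 0; /* offline */
	case -6: return z->z_write_ptr_offset != z->z_checkpoint_offset;
	default: return 0;                                   /* unknown: no match */
	}
}

/*
 * HYPOTHETICAL model of blkdev_query_zones(): copies up to max_zones
 * matching zones with z_start >= start_sector into the caller-allocated
 * ret_zones, in the table's ascending-LBA order.  Returns the number of
 * zones copied, or -EINVAL for an unknown criteria value.
 */
int query_zones(const struct zone_status *table, size_t nr_zones,
		uint64_t start_sector, int criteria,
		int max_zones, struct zone_status *ret_zones)
{
	int n = 0;

	if (criteria < -6 || max_zones < 0)
		return -EINVAL;
	assert(ret_zones != NULL || max_zones == 0);

	for (size_t i = 0; i < nr_zones && n < max_zones; i++) {
		if (table[i].z_start < start_sector)
			continue;
		if (zone_matches(&table[i], criteria))
			ret_zones[n++] = table[i];
	}
	return n;
}
```

Because results come back in ascending LBA order and at most max_zones at a time, a caller of this model can walk the whole zone table in batches by passing z_start + z_length of the last returned zone as the next start_sector.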