From: Andreas Dilger
Subject: Re: [RFC] Draft Linux kernel interfaces for SMR/ZBC drives
Date: Tue, 11 Feb 2014 12:04:12 -0700
Message-ID: <1E60E599-9CF1-4DD8-AE19-B1073858629D@dilger.ca>
References: <20140211184343.GA11971@thunk.org>
In-Reply-To: <20140211184343.GA11971@thunk.org>
To: Theodore Ts'o
Cc: linux-fsdevel

On Feb 11, 2014, at 11:43 AM, Theodore Ts'o wrote:
> Based on the comments raised on the list, here is a revised version of
> the proposed ZBC kernel interface.
>
> Changes from the last version:
>
> 1) Aligned ZBC_FLAG values with the ZBC specification to
>    simplify implementations
> 2) Aligned the free_sectors_criteria values (mostly) with the ZBC
>    specification
> 3) Clarified the behaviour of blkdev_query_zones()
> 4) Added an ioctl interface to expose this functionality to userspace
> 5) Removed the proposed simplified data variant
>
> Please let me know what you think!

Should ZBCRESETZONE take a length or number of zones to reset?

Cheers, Andreas

> /*
>  * Note: this structure is 24 bytes.  Using 256 MB zones, an 8TB drive
>  * will have 32,768 zones.
>  * That means if we tried to use a contiguous
>  * array we would need to allocate 768k of contiguous, non-swappable
>  * kernel memory.  (Boo, hiss.)
>  *
>  * This is large enough that it would be painful to hang an array off the
>  * block_device structure.  So we will define a function
>  * blkdev_query_zones() to selectively return information for some
>  * number of zones.
>  *
>  * It is anticipated that the block device driver will store this
>  * information in a compressed form, and that z_checkpoint_offset will
>  * not be dynamically tracked.  That is, the checkpoint offset will,
>  * if non-zero, indicate that the drive suffered a power fail event, and
>  * the file system or userspace process may need to implement recovery
>  * procedures.  Once the file system or userspace process writes to an
>  * SMR band, the checkpoint offset will be cleared and future queries
>  * for the SMR band will return the checkpoint offset == write_ptr.
>  */
> struct zone_status {
>        sector_t z_start;
>        __u32    z_length;
>        __u32    z_write_ptr_offset;  /* offset */
>        __u32    z_checkpoint_offset; /* offset */
>        __u32    z_flags;             /* full, ro, offline, reset_requested */
> };
>
> #define Z_FLAG_RESET_REQUESTED   0x0001
> #define Z_FLAGS_OFFLINE          0x0002
> #define Z_FLAGS_RO               0x0004
> #define Z_FLAGS_FULL             0x0008
>
> #define Z_FLAG_TYPE_MASK         0x0F00
> #define Z_FLAG_TYPE_CONVENTIONAL 0x0100
> #define Z_FLAG_TYPE_SEQUENTIAL   0x0200
>
>
> /*
>  * Query the block_device bdev for information about the zones
>  * starting at start_sector that match the criteria specified by
>  * free_sectors_criteria.  Zone status information for at most
>  * max_zones will be placed into the memory array ret_zones (which is
>  * allocated by the caller, not by the blkdev_query_zones function),
>  * in ascending LBA order.  The return value will be a kernel error
>  * code if negative, or the number of zones actually returned if
>  * non-negative.
>  *
>  * If free_sectors_criteria is positive, then return zones that have
>  * at least that many sectors available to be written.  If it is zero,
>  * then match all zones.  If free_sectors_criteria is negative, then
>  * return the zones that match the following criteria:
>  *
>  *    -1    Match all full zones
>  *    -2    Match all open zones
>  *          (the zone has at least one written sector and is not full)
>  *    -3    Match all free zones
>  *          (the zone has no written sectors)
>  *    -4    Match all read-only zones
>  *    -5    Match all offline zones
>  *    -6    Match all zones where the write ptr != the checkpoint ptr
>  *
>  * The negative values are taken from Table 4 of 14-010r1, with the
>  * exception of -6, which is not in the draft spec --- but IMHO should
>  * be :-)  It is anticipated, though, that the kernel will keep this
>  * info in memory and so will handle matching zones which meet
>  * these criteria itself, without needing to issue a ZBC command for
>  * each call to blkdev_query_zones().
>  */
> extern int blkdev_query_zones(struct block_device *bdev,
>                               sector_t start_sector,
>                               int free_sectors_criteria,
>                               int max_zones,
>                               struct zone_status *ret_zones);
>
> /*
>  * Reset the write pointer for a sequential write zone.
>  *
>  * Returns -EINVAL if the start_sector is not the beginning of a
>  * sequential write zone.
>  */
> extern int blkdev_reset_zone_ptr(struct block_device *bdev,
>                                  sector_t start_sector);
>
>
> /* ioctl interface */
>
> ZBCQUERY
>       u64 starting_lba        /* IN */
>       u32 criteria            /* IN */
>       u32 *num_zones          /* IN/OUT */
>       struct zone_status *ptr /* OUT */
>
> ZBCRESETZONE
>       u64 starting_lba
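
For anyone who wants to check the memory estimate in the quoted comment, here is a small userspace sketch of the arithmetic. It is not kernel code. It assumes sector_t is 64 bits wide and reads "8TB" and "256 MB" as binary units (2^43 and 2^28 bytes). The helper names zone_count() and zone_array_bytes() are made up for illustration and are not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the proposed struct zone_status.
 * Assumption: sector_t is 64 bits wide, so uint64_t is used here.
 */
struct zone_status {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Hypothetical helper: number of zones on a drive, rounding up. */
static uint64_t zone_count(uint64_t capacity_bytes, uint64_t zone_bytes)
{
	assert(zone_bytes != 0);
	return (capacity_bytes + zone_bytes - 1) / zone_bytes;
}

/* Hypothetical helper: size of one contiguous array covering every zone. */
static uint64_t zone_array_bytes(uint64_t capacity_bytes, uint64_t zone_bytes)
{
	return zone_count(capacity_bytes, zone_bytes) *
	       sizeof(struct zone_status);
}
```

Calling zone_array_bytes(8ULL << 40, 256ULL << 20) reproduces the 768k figure (32,768 zones at 24 bytes each). With decimal units the zone count would come out somewhat lower.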
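
The matching rules described for free_sectors_criteria can be sketched as a userspace model over an in-memory array. This is only an illustration of the semantics as I read them, not the proposed kernel implementation. The names zone_matches() and query_zones() are hypothetical. The sketch also makes several assumptions that the draft does not spell out: "full" is taken from the Z_FLAGS_FULL bit, a full zone has zero free sectors, the write pointer never exceeds z_length, read-only and offline zones are not excluded from the positive-criteria match, and "starting at start_sector" means zones whose z_start is at or beyond start_sector.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define Z_FLAGS_OFFLINE 0x0002
#define Z_FLAGS_RO      0x0004
#define Z_FLAGS_FULL    0x0008

/* Userspace stand-in; sector_t is assumed to be 64 bits. */
struct zone_status {
	uint64_t z_start;
	uint32_t z_length;
	uint32_t z_write_ptr_offset;
	uint32_t z_checkpoint_offset;
	uint32_t z_flags;
};

/* Hypothetical helper: nonzero if zone z satisfies the criteria value. */
static int zone_matches(const struct zone_status *z, int criteria)
{
	int full = (z->z_flags & Z_FLAGS_FULL) != 0;
	/* Assumes z_write_ptr_offset <= z_length. */
	uint32_t free_sectors = full ? 0 : z->z_length - z->z_write_ptr_offset;

	if (criteria > 0)
		return free_sectors >= (uint32_t)criteria;

	switch (criteria) {
	case 0:  return 1;                                   /* all zones */
	case -1: return full;                                /* full */
	case -2: return z->z_write_ptr_offset > 0 && !full;  /* open */
	case -3: return z->z_write_ptr_offset == 0;          /* free */
	case -4: return (z->z_flags & Z_FLAGS_RO) != 0;      /* read-only */
	case -5: return (z->z_flags & Z_FLAGS_OFFLINE) != 0; /* offline */
	case -6: return z->z_write_ptr_offset != z->z_checkpoint_offset;
	default: return 0;                                   /* unknown value */
	}
}

/*
 * Hypothetical model of the query: zones[] is sorted by z_start, and
 * the caller provides ret_zones[] with room for max_zones entries.
 * Returns the number of zones copied, or -EINVAL for a negative max_zones.
 */
static int query_zones(const struct zone_status *zones, int nr_zones,
		       uint64_t start_sector, int criteria,
		       int max_zones, struct zone_status *ret_zones)
{
	int found = 0;

	if (max_zones < 0)
		return -EINVAL;

	for (int i = 0; i < nr_zones && found < max_zones; i++) {
		if (zones[i].z_start < start_sector)
			continue; /* zone begins before the requested start */
		if (zone_matches(&zones[i], criteria))
			ret_zones[found++] = zones[i];
	}
	return found;
}
```

Because the input array is walked in order, the results come back in ascending LBA order, and a caller can continue a scan by passing the sector just past the last returned zone as the next start_sector.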