Sorry for the resend; I revised it a little.

Hi All,

I have started by working on extended LBA (ELBA) support in bdevs. Next, I will implement T10 PI on top of extended LBA. I expect that some applications or users will need separate metadata, so I will support that too.

Before submitting patches for extended LBA in bdevs, I would like to hear your feedback. Any feedback is greatly appreciated.

Q1. Which length should spdk_bdev_get_block_size(bdev) describe?
Option 1: the length of the extended block (data + metadata).
Option 2: the length of the data block only. The user will get the length of the metadata by spdk_bdev_get_md_size(bdev).
Or any other idea?

The current implementation is Option 1, but the NVMe-oF target currently cuts off the metadata size even when metadata is enabled. Keeping the current implementation (Option 1) sounds reasonable to me. Even if we take Option 1, we will need spdk_bdev_get_md_size(bdev) anyway.

Q2. Which behavior should the bdev I/O APIs have by default?
Option 1: READ_PASS and WRITE_PASS (the upper layer must be aware of extended LBA by default).
Option 2: READ_STRIP and WRITE_INSERT (extended LBA is transparent to the upper layer by default).
Or any other idea?

- READ_STRIP reads data and metadata from the target, discards the metadata, and transfers only the data to the upper layer.
- WRITE_INSERT transfers only data from the upper layer, adds metadata, and writes both data and metadata to the target.
- READ_PASS reads data and metadata from the target and transfers both to the upper layer.
- WRITE_PASS transfers data and metadata from the upper layer and writes both to the target.

Option 1 looks reasonable to me. I can provide a new bdev module that makes extended LBA bdevs usable transparently; it will do READ_STRIP and WRITE_INSERT internally. If we take Option 2, we will have to provide a set of ELBA-aware APIs. ELBA awareness means that the upper layer must extract data from interleaved extended blocks.

Q3.
To support applications that require separate metadata, is it reasonable to provide the following APIs?
- add md_buf and md_len to the parameters
- add the suffix "_md" to the function name

I will send questions about T10 PI in a separate mail later.

For reference, bdev currently provides the following APIs:
- int spdk_bdev_read(desc, ch, buf, offset, nbytes, cb, cb_arg)
- int spdk_bdev_readv(desc, ch, iov, iovcnt, offset, nbytes, cb, cb_arg)
- int spdk_bdev_write(desc, ch, buf, offset, nbytes, cb, cb_arg)
- int spdk_bdev_writev(desc, ch, iov, iovcnt, offset, nbytes, cb, cb_arg)
- int spdk_bdev_write_zeroes(desc, ch, offset, len, cb, cb_arg)
- int spdk_bdev_unmap(desc, ch, offset, nbytes, cb, cb_arg)
- int spdk_bdev_reset(desc, ch, cb, cb_arg)
- int spdk_bdev_flush(desc, ch, offset, length, cb, cb_arg)
- int spdk_bdev_nvme_admin_passthru(desc, ch, cmd, buf, nbytes, cb, cb_arg)
- int spdk_bdev_nvme_io_passthru(bdev_desc, ch, cmd, buf, nbytes, cb, cb_arg)
- int spdk_bdev_nvme_io_passthru_md(bdev_desc, ch, cmd, buf, nbytes, md_buf, md_len, cb, cb_arg)
- uint32_t spdk_bdev_get_block_size(bdev)
- uint64_t spdk_bdev_get_num_blocks(bdev)
- int spdk_bdev_read_blocks(desc, ch, buf, offset_blocks, num_blocks, cb, cb_arg)
- int spdk_bdev_readv_blocks(desc, ch, iov, iovcnt, offset_blocks, num_blocks, cb, cb_arg)
- int spdk_bdev_write_blocks(desc, ch, buf, offset_blocks, num_blocks, cb, cb_arg)
- int spdk_bdev_writev_blocks(desc, ch, iov, iovcnt, offset_blocks, num_blocks, cb, cb_arg)
- int spdk_bdev_write_zeroes_blocks(desc, ch, offset_blocks, num_blocks, cb, cb_arg)
- int spdk_bdev_unmap_blocks(desc, ch, offset_blocks, num_blocks, cb, cb_arg)
- int spdk_bdev_flush_blocks(desc, ch, offset_blocks, num_blocks, cb, cb_arg)

Thanks,
Shuhei

________________________________
From: 松本周平 / MATSUMOTO,SHUUHEI
Sent: September 25, 2018 9:14
To: spdk(a)lists.01.org
Subject: Design policy to support extended LBA and T10 PI in bdevs.
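To make Q1 to Q3 above more concrete, here is a minimal, self-contained C sketch of the buffer layouts involved. It does not use SPDK: struct mock_bdev, mock_get_block_size(), mock_get_md_size(), elba_strip(), elba_insert() and elba_split_md() are hypothetical names for illustration only. It assumes Option 1 for Q1 (the reported block size includes metadata) and assumes the metadata sits at the end of each extended block; the metadata written by WRITE_INSERT is just zero-filled, since T10 PI is out of scope here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a bdev with interleaved metadata (extended LBA).
 * These names are illustrative only; they are NOT SPDK APIs. */
struct mock_bdev {
	uint32_t data_block_size; /* e.g. 512 */
	uint32_t md_size;         /* e.g. 8, stored after each data block */
};

/* Q1, Option 1: the reported block size covers data + metadata. */
static uint32_t mock_get_block_size(const struct mock_bdev *bdev)
{
	return bdev->data_block_size + bdev->md_size;
}

static uint32_t mock_get_md_size(const struct mock_bdev *bdev)
{
	return bdev->md_size;
}

/* READ_STRIP: copy only the data part of each extended block in ext_buf
 * (num_blocks * block_size bytes) into data_buf, discarding metadata. */
static void elba_strip(const struct mock_bdev *bdev, const uint8_t *ext_buf,
		       uint8_t *data_buf, uint64_t num_blocks)
{
	uint32_t ext = mock_get_block_size(bdev);
	uint32_t data = ext - mock_get_md_size(bdev);

	assert(ext_buf != NULL && data_buf != NULL);
	for (uint64_t i = 0; i < num_blocks; i++) {
		memcpy(data_buf + i * data, ext_buf + i * ext, data);
	}
}

/* WRITE_INSERT: interleave data blocks with metadata. The metadata is
 * zero-filled here as a placeholder (assumption, not real PI). */
static void elba_insert(const struct mock_bdev *bdev, const uint8_t *data_buf,
			uint8_t *ext_buf, uint64_t num_blocks)
{
	uint32_t ext = mock_get_block_size(bdev);
	uint32_t md = mock_get_md_size(bdev);
	uint32_t data = ext - md;

	assert(ext_buf != NULL && data_buf != NULL);
	for (uint64_t i = 0; i < num_blocks; i++) {
		memcpy(ext_buf + i * ext, data_buf + i * data, data);
		memset(ext_buf + i * ext + data, 0, md);
	}
}

/* Q3: split interleaved blocks into separate data and metadata buffers, as a
 * hypothetical "_md" read variant taking md_buf/md_len might return them. */
static int elba_split_md(const struct mock_bdev *bdev, const uint8_t *ext_buf,
			 uint8_t *data_buf, uint8_t *md_buf, size_t md_len,
			 uint64_t num_blocks)
{
	uint32_t ext = mock_get_block_size(bdev);
	uint32_t md = mock_get_md_size(bdev);
	uint32_t data = ext - md;

	/* md_len must cover exactly one metadata field per block. */
	if (md_len != num_blocks * md) {
		return -EINVAL;
	}
	for (uint64_t i = 0; i < num_blocks; i++) {
		memcpy(data_buf + i * data, ext_buf + i * ext, data);
		memcpy(md_buf + i * md, ext_buf + i * ext + data, md);
	}
	return 0;
}
```

For example, with a 512-byte data block and 8-byte metadata, the block size reported under Option 1 is 520 bytes, a WRITE_INSERT of two blocks produces a 1040-byte interleaved buffer, and READ_STRIP recovers the original 1024 data bytes. Under Option 1 for Q2, an ELBA-aware upper layer would work with the 1040-byte layout directly, while the proposed transparent bdev module would do the strip/insert step on its behalf.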