On Fri, Jun 26, 2020 at 03:11:55AM +0000, Damien Le Moal wrote:
>On 2020/06/26 2:18, Kanchan Joshi wrote:
>> Semantics --->
>> Zone-append, by its nature, may perform write on a different location than what
>> was specified. It does not fit into POSIX, and trying to fit may just undermine
>> its benefit. It may be better to keep semantics as close to zone-append as
>> possible i.e. specify zone-start location, and obtain the actual-write location
>> post completion. Towards that goal, existing async APIs seem to fit fine.
>> Async APIs (uring, linux aio) do not work on implicit write-pointer and demand
>> explicit write offset (which is what we need for append). Neither write-pointer
>
>What do you mean by "implicit write pointer" ? Are you referring to the behavior
>of AIO write with a block device file open with O_APPEND ? The yes, it does not
>work. But that is perfectly fine for regular files, that is for zonefs.

Sorry, I meant file pointer.
Yes, a block device opened with O_APPEND does not advance the file pointer
to the end of the device. That said, for uring and aio the file-pointer
position plays no role; it is the application's responsibility to pass the
right write location.

>I would prefer that this paragraph simply state the semantic that is implemented
>first. Then explain why the choice. But first, clarify how the API works, what
>is allowed, what's not etc. That will also simplify reviewing the code as one
>can then check the code against the goal.

In this path (block IO) there is hardly any scope for, or attempt at,
abstracting anything away, so the raw zoned-storage rules/semantics apply.
I expect zone-aware applications, which already know those rules, to be
the consumers of this.

>> is taken as input, nor it is updated on completion. And there is a clear way to
>> get zone-append result. Zone-aware applications while using these async APIs
>> can be fine with, for the lack of better word, zone-append semantics itself.
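To make the semantics described above concrete, here is a small user-space model. It is only an illustration under stated assumptions: it is not the kernel code and not the RWF_ZONE_APPEND interface from this series. `struct zone`, `zone_append()` and the -EINVAL/-ENOSPC return codes are hypothetical. The caller names the zone by its start offset, the data lands at the current write pointer, and the actual location is reported back only on completion.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical in-memory model of one sequential-write-required zone.
 * This is NOT the kernel interface; it only illustrates the semantics.
 */
struct zone {
	uint64_t start;    /* zone start offset, in bytes */
	uint64_t capacity; /* number of writable bytes in the zone */
	uint64_t wp;       /* current write pointer, absolute offset */
};

/*
 * Model of a zone-append request: the caller addresses the zone by its
 * start offset only, the data is placed at the current write pointer,
 * and the actual location is returned through *written_at on completion.
 */
static int zone_append(struct zone *z, uint64_t zone_start, uint64_t len,
		       uint64_t *written_at)
{
	/* Sanity check: the write pointer never leaves the zone. */
	assert(z->wp >= z->start && z->wp <= z->start + z->capacity);

	if (zone_start != z->start)
		return -EINVAL; /* assumed: append must name the zone start */
	if (len > z->start + z->capacity - z->wp)
		return -ENOSPC; /* assumed error code: not enough room left */

	*written_at = z->wp; /* actual location, known only at completion */
	z->wp += len;
	return 0;
}
```

For example, with a zone starting at 1048576 with 8192 bytes of capacity, two 4096-byte appends that both pass 1048576 report 1048576 and 1052672, and a third append fails with -ENOSPC. The offset passed in never changes; only the completion tells the application where its data went. This model is single-threaded and does no locking of its own.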
>>
>> Sync APIs work with implicit write-pointer (at least few of those), and there is
>> no way to obtain zone-append result, making it hard for user-space zone-append.
>
>Sync API are executed under inode lock, at least for regular files. So there is
>absolutely no problem to use zone append. zonefs does it already. The problem is
>the lack of locking for block device file.

Yes. I was referring to the problem of returning the actual write location
through sync APIs such as write, pwrite and pwritev/pwritev2.

>>
>> Tests --->
>> Using new interface in fio (uring and libaio engine) by extending zbd tests
>> for zone-append: https://github.com/axboe/fio/pull/1026
>>
>> Changes since v1:
>> - No new opcodes in uring or aio. Use RWF_ZONE_APPEND flag instead.
>> - linux-aio changes vanish because of no new opcode
>> - Fixed the overflow and other issues mentioned by Damien
>> - Simplified uring support code, fixed the issues mentioned by Pavel
>> - Added error checks
>>
>> Kanchan Joshi (1):
>>   fs,block: Introduce RWF_ZONE_APPEND and handling in direct IO path
>>
>> Selvakumar S (1):
>>   io_uring: add support for zone-append
>>
>>  fs/block_dev.c          | 28 ++++++++++++++++++++++++----
>>  fs/io_uring.c           | 32 ++++++++++++++++++++++++++++++--
>>  include/linux/fs.h      |  9 +++++++++
>>  include/uapi/linux/fs.h |  5 ++++-
>>  4 files changed, 67 insertions(+), 7 deletions(-)
>>
>
>
>--
>Damien Le Moal
>Western Digital Research
>