linux-raid.vger.kernel.org archive mirror
* Re: [dm-devel] [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace
       [not found] <CABmKtjfdDS-iO+jLkwt7x-oDHt9V1p-cpYHjL5EV2NKwHxqN1Q@mail.gmail.com>
@ 2020-12-16 20:35 ` Palmer Dabbelt
  0 siblings, 0 replies; 8+ messages in thread
From: Palmer Dabbelt @ 2020-12-16 20:35 UTC
  To: ruby.wktk
  Cc: josef, linux-raid, bvanassche, snitzer, linux-doc, shuah, corbet,
	linux-kernel, Christoph Hellwig, song, dm-devel,
	michael.christie, linux-kselftest, kernel-team, agk

On Tue, 15 Dec 2020 22:17:06 PST (-0800), ruby.wktk@gmail.com wrote:
> Hi, my name is Akira Hayakawa. I maintain an out-of-tree DM target
> named dm-writeboost.
>
> Sorry to step in, but this is a very interesting topic, at least to me.
>
> I have been looking for something like dm-user because I believe we should
> be able to implement virtual block devices in Rust.
>
> I know proxying IO requests to userland always causes some overhead, but
> for devices where performance doesn't matter, for research prototyping, or
> for pseudo devices used in testing, this approach should be developed. Of
> course, an implementation in Rust will give us opportunities to develop
> more complicated software with high quality.
>
> I noticed this thread a few days ago and started to prototype this library:
> https://github.com/akiradeveloper/userland-io
>
> It is what I want, but the transport is still NBD, which I don't like so
> much. If dm-user becomes available, I will implement a transport using
> dm-user.

Great, I'm glad to hear that.  Obviously this is still in the early days and
we're talking about high-level ABI design here, so things are almost certainly
going to change, but it's always good to have people pushing on stuff.

Just be warned: we've only had two people write userspaces for this (one of
whom was me, and all of that is test code), so I'd be shocked if you manage
to avoid running into bugs.

>
> - Akira
>
> On Tue, Dec 15, 2020 at 7:00 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
>> On Thu, 10 Dec 2020 09:03:21 PST (-0800), josef@toxicpanda.com wrote:
>> > On 12/9/20 10:38 PM, Bart Van Assche wrote:
>> >> On 12/7/20 10:55 AM, Palmer Dabbelt wrote:
>> >>> All in all, I've found it a bit hard to figure out what sort of
>> >>> interest people have in dm-user: when I bring this up I seem to run
>> >>> into people who've done similar things before and are vaguely
>> >>> interested, but certainly nobody is chomping at the bit.  I'm sending
>> >>> it out in this early state to try and figure out if it's interesting
>> >>> enough to keep going.
>> >>
>> >> Cc-ing Josef and Mike since their nbd contributions make me wonder
>> >> whether this new driver could be useful to their use cases?
>> >>
>> >
>> > Sorry gmail+imap sucks and I can't get my email client to get at the
>> > original thread.  However here is my take.
>>
>> and I guess I then have to apologize for missing your email ;).  Hopefully
>> that was the problem, but who knows.
>>
>> > 1) The advantages of using dm-user over NBD that you listed aren't
>> > actually problems for NBD.  We have NBD working in production where you
>> > can hand off the sockets for the server without ending in timeouts, it
>> > was actually the main reason we wrote our own server so we could use the
>> > FD transfer stuff to restart the server without impacting any clients
>> > that had the device in use.
>>
>> OK.  So you just send the FD around using one of the standard mechanisms to
>> orchestrate the handoff?  I guess that might work for our use case,
>> assuming whatever the security side of things was doing was OK with the old
>> FD.  TBH I'm not sure how all that works and while we thought about doing
>> that sort of transfer scheme we decided to just open it again -- not sure
>> how far we were down the dm-user rabbit hole at that point, though, as this
>> sort of arose out of some other ideas.
>>
>> > 2) The extra copy is a big deal, in fact we already have too many copies
>> > in our existing NBD setup and are actively looking for ways to avoid
>> > those.
>> >
>> > Don't take this as I don't think dm-user is a good idea, but I think at
>> > the very least it should start with the very best we have to offer,
>> > starting with as few copies as possible.
>>
>> I was really expecting someone to say that.  It does seem kind of silly to
>> build out the new interface, but not go all the way to a ring buffer.  We
>> just didn't really have any way to justify the extra complexity as our use
>> cases aren't that high performance.  I kind of like to have benchmarks for
>> this sort of thing, though, and I didn't have anyone who had bothered
>> avoiding the last copy to compare against.
>>
>> > If you are using it currently in production then cool, there's clearly a
>> > usecase for it.  Personally as I get older and grouchier I want less
>> > things in the kernel, so if this enables us to eventually do everything
>> > NBD related in userspace with no performance drop then I'd be down.  I
>> > don't think you need to make that your primary goal, but at least
>> > polishing this up so it could potentially be abused in the future would
>> > make it more compelling for merging.
>> > Thanks,
>>
>> Ya, it's in Android already and we'll be shipping it as part of the new OTA
>> flow for the next release.  The rules on deprecation are a bit different
>> over there, though, so it's not like we're wed to it.  The whole point of
>> bringing this up here was to try and get something usable by everyone, and
>> while I'd eventually like to get whatever's in Android into the kernel
>> proper we'd really planned on supporting an extra Android-only ABI for a
>> cycle at least.
>>
>> I'm kind of inclined to take a crack at the extra copy, to at least see if
>> building something that eliminates it is viable.  I'm not really sure if it
>> is (or at least, if it'll net us a meaningful amount of performance), but
>> it'd at least be interesting to try.
>>
>> It'd be nice to have some benchmark target, though, as otherwise this stuff
>> hangs on forever.  My workloads are in selftests later on in the patch set,
>> but I'm essentially using tmpfs as a baseline to compare against
>> ext4+dm-user with some FIO examples as workloads.  Our early benchmark
>> numbers indicated this was way faster than we needed, so I didn't even
>> bother putting together a proper system to run on so I don't really have
>> any meaningful numbers there.  Is there an NBD server that's fast that I
>> should be comparing against?
>>
>> I haven't gotten a whole lot of feedback, so I'm inclined to at least have
>> some reasonable performance numbers before bothering with a v2.

* Re: [dm-devel] [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace
  2020-12-22 13:32       ` Christoph Hellwig
@ 2020-12-22 20:31         ` Palmer Dabbelt
  0 siblings, 0 replies; 8+ messages in thread
From: Palmer Dabbelt @ 2020-12-22 20:31 UTC
  To: Christoph Hellwig
  Cc: josef, bvanassche, Christoph Hellwig, snitzer, corbet,
	kernel-team, linux-doc, linux-kernel, linux-raid, song, dm-devel,
	linux-kselftest, shuah, agk, michael.christie

On Tue, 22 Dec 2020 05:32:46 PST (-0800), Christoph Hellwig wrote:
> On Mon, Dec 14, 2020 at 07:00:57PM -0800, Palmer Dabbelt wrote:
>> I haven't gotten a whole lot of feedback, so I'm inclined to at least have some
>> reasonable performance numbers before bothering with a v2.
>
> FYI, my other main worry besides duplicating nbd is that device mapper
> really is a stacked interface that sits on top of other block devices.
> Turning this into something else that just pipes data to userspace
> seems very strange.

Agreed.  It certainly doesn't fit the DM model.  We'd considered doing a non-DM
version of this (maybe "ubd"), but decided to stick with dm-user because we
didn't want to duplicate all the device creation stuff that DM provides.  A
simple version of that wouldn't be that hard to do, but the DM version has a
lot of features and we get that all for free.  We essentially decided to run
with DM until it gets in the way, and the only sticking point we ended up with
was that REQUEUE stuff (though not sure how that would show up with a bare
block device) and that scheduler question.

I'm going to stick with DM for now, unless it gets in the way, to avoid coming
up with a device creation scheme myself.  In the long term it's probably best
to have this be a standalone thing, but I don't want to dump a bunch of time
into putting that stuff together only to find that this isn't interesting
enough from a performance perspective to stick around.

* Re: [dm-devel] [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace
  2020-12-15  3:00     ` Palmer Dabbelt
  2020-12-16 18:24       ` Vitaly Mayatskih
@ 2020-12-22 13:32       ` Christoph Hellwig
  2020-12-22 20:31         ` Palmer Dabbelt
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2020-12-22 13:32 UTC
  To: Palmer Dabbelt
  Cc: josef, bvanassche, Christoph Hellwig, snitzer, corbet,
	kernel-team, linux-doc, linux-kernel, linux-raid, song, dm-devel,
	linux-kselftest, shuah, agk, michael.christie

On Mon, Dec 14, 2020 at 07:00:57PM -0800, Palmer Dabbelt wrote:
> I haven't gotten a whole lot of feedback, so I'm inclined to at least have some
> reasonable performance numbers before bothering with a v2.

FYI, my other main worry besides duplicating nbd is that device mapper
really is a stacked interface that sits on top of other block devices.
Turning this into something else that just pipes data to userspace
seems very strange.


* Re: [dm-devel] [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace
  2020-12-16 18:24       ` Vitaly Mayatskih
@ 2020-12-17  6:55         ` Palmer Dabbelt
  0 siblings, 0 replies; 8+ messages in thread
From: Palmer Dabbelt @ 2020-12-17  6:55 UTC
  To: v.mayatskih
  Cc: josef, bvanassche, Christoph Hellwig, snitzer, corbet,
	kernel-team, linux-doc, linux-kernel, linux-raid, song, dm-devel,
	linux-kselftest, shuah, agk, michael.christie

On Wed, 16 Dec 2020 10:24:59 PST (-0800), v.mayatskih@gmail.com wrote:
> On Mon, Dec 14, 2020 at 10:03 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
>> I was really expecting someone to say that.  It does seem kind of silly to build
>> out the new interface, but not go all the way to a ring buffer.  We just didn't
>> really have any way to justify the extra complexity as our use cases aren't
>> that high performance.   I kind of like to have benchmarks for this sort of
>> thing, though, and I didn't have anyone who had bothered avoiding the last copy
>> to compare against.
>
> I worked on something very similar, though performance was one of the
> goals. The implementation revolved around lockless ring buffers,
> shared memory for zerocopy, multiqueue and error handling. It could be
> that every disk storage vendor has to implement something like that in
> order to bridge the Linux kernel to their own proprietary datapath running
> in userspace.

OK, good to know.  That's kind of the feeling I'd gotten from having chatted to
a handful of people about this, but I don't remember people having actually
gotten all the way to zero-copy.  That's how we managed to end up at this
middle-ground ABI style: I thought people were, in practice, punting on
zero copy because the complexity just wasn't worth the performance benefit.
Maybe I'd just been colored by how my projects ended up going, but I've ended
up designing complicated interfaces in the past that allow for zero-copy only
to never get around to actually making that work.  I don't know if that's just
because I've had the good fortune to avoid working on anything that ended up
with users, though :).

For our use case I think we actually get better performance out of the
copy-based (and probably more importantly kmalloc-based, but that's an
implementation thing not an ABI thing) approach: essentially we're very
sensitive to memory pressure and expect this first dm-user daemon to mostly be
idle, so we're really worried about avoiding excess memory usage while idle and
less worried about throughput when active.  This stream-based interface means
that userspace doesn't need much memory allocated to service a request, which
helps with sleep/wake latencies and/or idle memory usage.  That's also why we
have the simple locking scheme: no sense splitting locks if there's no
contention, and we only need a single thread to saturate the storage bandwidth
on these phones.
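
To make the memory argument concrete, here is a minimal sketch of how a
stream-based daemon can service requests of any size with one small,
preallocated buffer.  Everything in it is an assumption made for
illustration: the header layout (a 64-bit sector followed by a 32-bit payload
length), the names HEADER, encode_request and service_stream, and the use of
a CRC as a stand-in for real work.  It is not the dm-user ABI.

```python
import io
import struct
import zlib

# Hypothetical wire format (an assumption, not the dm-user ABI):
# little-endian u64 sector followed by u32 payload length.
HEADER = struct.Struct("<QI")


def encode_request(sector: int, payload: bytes) -> bytes:
    """Build one request in the hypothetical stream format."""
    return HEADER.pack(sector, len(payload)) + payload


def service_stream(stream, scratch: bytearray):
    """Yield (sector, length, crc32) per request, reusing one scratch buffer."""
    view = memoryview(scratch)
    while True:
        raw = stream.read(HEADER.size)
        if len(raw) < HEADER.size:
            return  # end of stream
        sector, length = HEADER.unpack(raw)
        crc = 0
        remaining = length
        while remaining:
            # Read at most one scratch buffer's worth of payload at a time.
            n = stream.readinto(view[:min(remaining, len(view))])
            if not n:
                raise EOFError("payload truncated")
            crc = zlib.crc32(view[:n], crc)  # stand-in for servicing the data
            remaining -= n
        yield sector, length, crc


if __name__ == "__main__":
    data = encode_request(8, b"a" * 10000) + encode_request(16, b"xyz")
    for result in service_stream(io.BytesIO(data), bytearray(4096)):
        print(result)
```

The 10000-byte request above is handled with a 4096-byte scratch buffer, so
the daemon's resident memory does not grow with request size.  A shared-memory
ring would instead need its buffers mapped up front, even while idle.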

That said, it does sound like people really do care about the sort of
performance levels where zero copy is relevant in this space.  I'll take a shot
at something along those lines, and while it will add a degree of userspace
complexity I'm not sure it'll add much in the way of kernel complexity -- at
least compared to a fast version of this, where we'd need most of that stuff
anyway (obviously the malloc+single lock design is simple, but probably
wouldn't stick around for long).  At a bare minimum it'll be interesting to
play around with, but if people are doing it in practice then I'm more
confident that I can put something together that at least serves as a starting
point for further discussion.

I haven't gotten around to writing any code yet, but I had spent a bit of time
thinking about how to put this zero-copy version together and am leaning
towards it being a standalone block device (as opposed to a DM target).  I'd
avoided that before as I didn't want to mess around with my own device control
scheme so I'll still try to do the DM thing, but I'm not sure it'll be viable.
That's all speculation now, but it does bring up one interesting question:

IIUC, this version of dm-user handles BIOs before they reach the block
scheduler while a standalone driver would likely handle them after blk-mq.  I
don't have direct experience with this, but the last time I ran into people who
had these sorts of performance requirements for userspace drivers they weren't
actually trying to write userspace drivers but were instead trying to write a
userspace scheduler, with the userspace drivers just being the mechanism to
implement that scheduler.  This was a decade ago and I'm not sure that's what
people are trying to do in the new blk-mq world, but if it is then it's going
to be a major design consideration.  I'm also not entirely sure that we're
really solving the same problem at that point.

* Re: [dm-devel] [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace
  2020-12-15  3:00     ` Palmer Dabbelt
@ 2020-12-16 18:24       ` Vitaly Mayatskih
  2020-12-17  6:55         ` Palmer Dabbelt
  2020-12-22 13:32       ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: Vitaly Mayatskih @ 2020-12-16 18:24 UTC
  To: Palmer Dabbelt
  Cc: josef, bvanassche, Christoph Hellwig, Mike Snitzer, corbet,
	kernel-team, linux-doc, linux-kernel, linux-raid, Song Liu,
	dm-devel, linux-kselftest, shuah, agk, michael.christie

On Mon, Dec 14, 2020 at 10:03 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:

> I was really expecting someone to say that.  It does seem kind of silly to build
> out the new interface, but not go all the way to a ring buffer.  We just didn't
> really have any way to justify the extra complexity as our use cases aren't
> that high performance.   I kind of like to have benchmarks for this sort of
> thing, though, and I didn't have anyone who had bothered avoiding the last copy
> to compare against.

I worked on something very similar, though performance was one of the
goals. The implementation revolved around lockless ring buffers,
shared memory for zerocopy, multiqueue and error handling. It could be
that every disk storage vendor has to implement something like that in
order to bridge the Linux kernel to their own proprietary datapath running
in userspace.
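
As a rough illustration of the ring-buffer idea, the sketch below shows the
index protocol of a single-producer/single-consumer ring.  It is an
assumption-laden toy, not the implementation described above: the class name
SpscRing is made up, the slots live in an ordinary Python list rather than in
memory shared between kernel and userspace, and a real lockless version would
need atomic index updates with the right memory barriers.

```python
class SpscRing:
    """Single-producer/single-consumer ring with free-running indices."""

    def __init__(self, capacity: int) -> None:
        # A power-of-two capacity lets us wrap indices with a bit mask.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def push(self, item) -> bool:
        """Producer side: returns False when the ring is full."""
        if self._tail - self._head == len(self._slots):
            return False
        self._slots[self._tail & self._mask] = item
        self._tail += 1  # publish only after the slot has been filled
        return True

    def pop(self):
        """Consumer side: returns None when the ring is empty."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head & self._mask]
        self._head += 1  # release the slot back to the producer
        return item


if __name__ == "__main__":
    ring = SpscRing(4)
    for request in ("read 0", "write 8", "flush"):
        ring.push(request)
    print(ring.pop(), ring.pop(), ring.pop(), ring.pop())
```

Because each index has exactly one writer, neither side needs a lock; a
multiqueue design would typically give each queue its own ring of this kind.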

-- 
wbr, Vitaly

* Re: [dm-devel] [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace
  2020-12-10 17:03   ` Josef Bacik
@ 2020-12-15  3:00     ` Palmer Dabbelt
  2020-12-16 18:24       ` Vitaly Mayatskih
  2020-12-22 13:32       ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Palmer Dabbelt @ 2020-12-15  3:00 UTC
  To: josef
  Cc: bvanassche, Christoph Hellwig, snitzer, corbet, kernel-team,
	linux-doc, linux-kernel, linux-raid, song, dm-devel,
	linux-kselftest, shuah, agk, michael.christie

On Thu, 10 Dec 2020 09:03:21 PST (-0800), josef@toxicpanda.com wrote:
> On 12/9/20 10:38 PM, Bart Van Assche wrote:
>> On 12/7/20 10:55 AM, Palmer Dabbelt wrote:
>>> All in all, I've found it a bit hard to figure out what sort of interest
>>> people
>>> have in dm-user: when I bring this up I seem to run into people who've done
>>> similar things before and are vaguely interested, but certainly nobody is
>>> chomping at the bit.  I'm sending it out in this early state to try and
>>> figure
>>> out if it's interesting enough to keep going.
>>
>> Cc-ing Josef and Mike since their nbd contributions make me wonder
>> whether this new driver could be useful to their use cases?
>>
>
> Sorry gmail+imap sucks and I can't get my email client to get at the original
> thread.  However here is my take.

and I guess I then have to apologize for missing your email ;).  Hopefully that
was the problem, but who knows.

> 1) The advantages of using dm-user over NBD that you listed aren't actually
> problems for NBD.  We have NBD working in production where you can hand off the
> sockets for the server without ending in timeouts, it was actually the main
> reason we wrote our own server so we could use the FD transfer stuff to restart
> the server without impacting any clients that had the device in use.

OK.  So you just send the FD around using one of the standard mechanisms to
orchestrate the handoff?  I guess that might work for our use case, assuming
whatever the security side of things was doing was OK with the old FD.  TBH I'm
not sure how all that works and while we thought about doing that sort of
transfer scheme we decided to just open it again -- not sure how far we were
down the dm-user rabbit hole at that point, though, as this sort of arose out
of some other ideas.
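
For reference, one standard mechanism for this kind of handoff on Linux is
passing the descriptor as SCM_RIGHTS ancillary data over a Unix domain
socket.  The sketch below is only an illustration under stated assumptions:
it is not taken from the NBD server mentioned above or from dm-user, the
names hand_off_fd, receive_fd and demo are made up, and a pipe stands in for
the real device or socket FD.  It needs Python 3.9 or later on a Unix system
for socket.send_fds() and socket.recv_fds().

```python
import os
import socket


def hand_off_fd(sock: socket.socket, fd: int) -> None:
    """Send an open file descriptor to the peer as SCM_RIGHTS data."""
    # At least one byte of ordinary data must accompany the descriptor.
    socket.send_fds(sock, [b"F"], [fd])


def receive_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent by hand_off_fd()."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    return fds[0]


def demo() -> bytes:
    # A pipe stands in for the FD being handed over (an assumption).
    read_end, write_end = os.pipe()
    old_server, new_server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        hand_off_fd(old_server, write_end)
        os.close(write_end)  # the old server drops its copy and goes away
        received = receive_fd(new_server)
        os.write(received, b"still open")  # the new server keeps using it
        os.close(received)
        return os.read(read_end, 64)
    finally:
        os.close(read_end)
        old_server.close()
        new_server.close()


if __name__ == "__main__":
    print(demo())
```

The underlying open file stays alive across the handoff even though the
sender closes its own descriptor, which is what lets a server restart without
the other end noticing.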

> 2) The extra copy is a big deal, in fact we already have too many copies in our
> existing NBD setup and are actively looking for ways to avoid those.
>
> Don't take this as I don't think dm-user is a good idea, but I think at the very
> least it should start with the very best we have to offer, starting with as few
> copies as possible.

I was really expecting someone to say that.  It does seem kind of silly to build
out the new interface, but not go all the way to a ring buffer.  We just didn't
really have any way to justify the extra complexity as our use cases aren't
that high performance.   I kind of like to have benchmarks for this sort of
thing, though, and I didn't have anyone who had bothered avoiding the last copy
to compare against.

> If you are using it currently in production then cool, there's clearly a usecase
> for it.  Personally as I get older and grouchier I want less things in the
> kernel, so if this enables us to eventually do everything NBD related in
> userspace with no performance drop then I'd be down.  I don't think you need to
> make that your primary goal, but at least polishing this up so it could
> potentially be abused in the future would make it more compelling for merging.
> Thanks,

Ya, it's in Android already and we'll be shipping it as part of the new OTA
flow for the next release.  The rules on deprecation are a bit different over
there, though, so it's not like we're wed to it.  The whole point of bringing
this up here was to try and get something usable by everyone, and while I'd
eventually like to get whatever's in Android into the kernel proper we'd really
planned on supporting an extra Android-only ABI for a cycle at least.  

I'm kind of inclined to take a crack at the extra copy, to at least see if
building something that eliminates it is viable.  I'm not really sure if it is
(or at least, if it'll net us a meaningful amount of performance), but it'd at
least be interesting to try.

It'd be nice to have some benchmark target, though, as otherwise this stuff
hangs on forever.  My workloads are in selftests later on in the patch set, but
I'm essentially using tmpfs as a baseline to compare against ext4+dm-user with
some FIO examples as workloads.  Our early benchmark numbers indicated this was
way faster than we needed, so I didn't even bother putting together a proper
system to run on so I don't really have any meaningful numbers there.  Is there
an NBD server that's fast that I should be comparing against?

I haven't gotten a whole lot of feedback, so I'm inclined to at least have some
reasonable performance numbers before bothering with a v2.

* Re: [dm-devel] [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace
  2020-12-10  3:38 ` [dm-devel] " Bart Van Assche
@ 2020-12-10 17:03   ` Josef Bacik
  2020-12-15  3:00     ` Palmer Dabbelt
  0 siblings, 1 reply; 8+ messages in thread
From: Josef Bacik @ 2020-12-10 17:03 UTC
  To: Bart Van Assche, Palmer Dabbelt, Christoph Hellwig
  Cc: snitzer, corbet, kernel-team, linux-doc, linux-kernel,
	linux-raid, song, dm-devel, linux-kselftest, shuah, agk,
	Mike Christie

On 12/9/20 10:38 PM, Bart Van Assche wrote:
> On 12/7/20 10:55 AM, Palmer Dabbelt wrote:
>> All in all, I've found it a bit hard to figure out what sort of interest
>> people
>> have in dm-user: when I bring this up I seem to run into people who've done
>> similar things before and are vaguely interested, but certainly nobody is
>> chomping at the bit.  I'm sending it out in this early state to try and
>> figure
>> out if it's interesting enough to keep going.
> 
> Cc-ing Josef and Mike since their nbd contributions make me wonder
> whether this new driver could be useful to their use cases?
> 

Sorry gmail+imap sucks and I can't get my email client to get at the original 
thread.  However here is my take.

1) The advantages of using dm-user over NBD that you listed aren't actually
problems for NBD.  We have NBD working in production where you can hand off the 
sockets for the server without ending in timeouts, it was actually the main 
reason we wrote our own server so we could use the FD transfer stuff to restart 
the server without impacting any clients that had the device in use.

2) The extra copy is a big deal, in fact we already have too many copies in our 
existing NBD setup and are actively looking for ways to avoid those.

Don't take this as I don't think dm-user is a good idea, but I think at the very 
least it should start with the very best we have to offer, starting with as few 
copies as possible.

If you are using it currently in production then cool, there's clearly a usecase 
for it.  Personally as I get older and grouchier I want less things in the 
kernel, so if this enables us to eventually do everything NBD related in 
userspace with no performance drop then I'd be down.  I don't think you need to 
make that your primary goal, but at least polishing this up so it could 
potentially be abused in the future would make it more compelling for merging. 
Thanks,

Josef

* Re: [dm-devel] [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace
  2020-12-07 18:55 Palmer Dabbelt
@ 2020-12-10  3:38 ` Bart Van Assche
  2020-12-10 17:03   ` Josef Bacik
  0 siblings, 1 reply; 8+ messages in thread
From: Bart Van Assche @ 2020-12-10  3:38 UTC
  To: Palmer Dabbelt, Christoph Hellwig
  Cc: snitzer, corbet, kernel-team, linux-doc, linux-kernel,
	linux-raid, song, dm-devel, linux-kselftest, shuah, agk,
	Josef Bacik, Mike Christie

On 12/7/20 10:55 AM, Palmer Dabbelt wrote:
> All in all, I've found it a bit hard to figure out what sort of interest
> people
> have in dm-user: when I bring this up I seem to run into people who've done
> similar things before and are vaguely interested, but certainly nobody is
> chomping at the bit.  I'm sending it out in this early state to try and
> figure
> out if it's interesting enough to keep going.

Cc-ing Josef and Mike since their nbd contributions make me wonder
whether this new driver could be useful to their use cases?

Thanks,

Bart.

Thread overview: 8+ messages
     [not found] <CABmKtjfdDS-iO+jLkwt7x-oDHt9V1p-cpYHjL5EV2NKwHxqN1Q@mail.gmail.com>
2020-12-16 20:35 ` [dm-devel] [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace Palmer Dabbelt
2020-12-07 18:55 Palmer Dabbelt
2020-12-10  3:38 ` [dm-devel] " Bart Van Assche
2020-12-10 17:03   ` Josef Bacik
2020-12-15  3:00     ` Palmer Dabbelt
2020-12-16 18:24       ` Vitaly Mayatskih
2020-12-17  6:55         ` Palmer Dabbelt
2020-12-22 13:32       ` Christoph Hellwig
2020-12-22 20:31         ` Palmer Dabbelt
