* dm-thin: Several Questions on dm-thin performance.
@ 2019-11-22 3:14 JeffleXu
2019-11-22 18:55 ` Joe Thornber
0 siblings, 1 reply; 13+ messages in thread
From: JeffleXu @ 2019-11-22 3:14 UTC (permalink / raw)
To: dm-devel
Hi guys,
I have several questions that came up while testing and evaluating the IO
performance of dm-thin. I would be grateful if someone could spend a
little time on them.
The first question is: what is the purpose of the data cell? In
thin_bio_map(), a normal bio is packed into both a virtual cell and a data
cell. I can understand that the virtual cell is used to prevent a discard bio
and a non-discard bio targeting the same block from being processed at the
same time. I found it was added in commit
e8088073c9610af017fd47fddd104a2c3afb32e8 (dm thin: fix race between
simultaneous io and discards to same block), but I'm still confused
about the use of the data cell.
The second question is the impact of the virtual cell and data cell on IO
performance. If $data_block_size is large, for example 1G, then in a
multithreaded fio test most bios will be buffered in the cell->bios list and
then processed by the worker thread asynchronously, even when there is no
discard bio. Thus IO that was originally parallel is now processed serially
by the worker thread. As the number of fio test threads increases, the single
worker thread can easily hit 100% CPU and thus become the performance
bottleneck, since the dm-thin workqueue is an ordered unbound workqueue.
Using an nvme SSD and fio (direct=1, ioengine=libaio, iodepth=128,
numjobs=4, rw=read, bs=4k), the bandwidth on the bare nvme device is
1589MiB/s, while the bandwidth on the thin device is only 1274MiB/s; the
four fio threads run at 200% CPU while the single worker thread is pinned
at 100% CPU. A perf profile of the worker thread shows that process_bio()
consumes 86% of its time.
Regards
Jeffle Xu
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
* Re: dm-thin: Several Questions on dm-thin performance.
2019-11-22 3:14 dm-thin: Several Questions on dm-thin performance JeffleXu
@ 2019-11-22 18:55 ` Joe Thornber
2019-12-02 7:50 ` JeffleXu
2019-12-06 14:15 ` Nikos Tsironis
0 siblings, 2 replies; 13+ messages in thread
From: Joe Thornber @ 2019-11-22 18:55 UTC (permalink / raw)
To: JeffleXu; +Cc: dm-devel
On Fri, Nov 22, 2019 at 11:14:15AM +0800, JeffleXu wrote:
> The first question is what's the purpose of data cell? In thin_bio_map(),
> normal bio will be packed as a virtual cell and data cell. I can understand
> that virtual cell is used to prevent discard bio and non-discard bio
> targeting the same block from being processed at the same time. I find it
> was added in commit e8088073c9610af017fd47fddd104a2c3afb32e8 (dm thin:
> fix race between simultaneous io and discards to same block), but I'm still
> confused about the use of data cell.
As you are aware there are two address spaces for the locks. The 'virtual' one
refers to cells in the logical address space of the thin devices, and the 'data' one
refers to the underlying data device. There are certain conditions where we
unfortunately need to hold both of these (eg, to prevent a data block being reprovisioned
before an io to it has completed).
> The second question is the impact of virtual cell and data cell on IO
> performance. If $data_block_size is large for example 1G, in multithread fio
> test, most bio will be buffered in cell->bios list and then be processed by
> worker thread asynchronously, even when there's no discard bio. Thus the
> original parallel IO is processed by worker thread serially now. As the
> number of fio test threads increase, the single worker thread can easily get
> CPU 100%, and thus become the bottleneck of the performance since dm-thin
> workqueue is ordered unbound.
Yep, this is a big issue. Take a look at dm-bio-prison-v2.h; this is the
new interface that we need to move dm-thin across to (dm-cache already uses it).
It allows concurrent holders of a cell (ie, read locks), so we'll be able to remap
much more io without handing it off to a worker thread. Once this is done I want
to add an extra field to cells that will cache the mapping; this way, if you acquire a
cell that is already held, you can avoid the expensive btree lookup. Together
these changes should make a huge difference to the performance.
If you've got some spare coding cycles I'd love some help with this ;)
- Joe
* Re: dm-thin: Several Questions on dm-thin performance.
2019-11-22 18:55 ` Joe Thornber
@ 2019-12-02 7:50 ` JeffleXu
2019-12-02 22:26 ` Eric Wheeler
2019-12-06 14:15 ` Nikos Tsironis
1 sibling, 1 reply; 13+ messages in thread
From: JeffleXu @ 2019-12-02 7:50 UTC (permalink / raw)
To: dm-devel
Thanks for the reply and the explanation.
Anyway, converting dm-thin to bio-prison v2 looks like a significant amount of work.
Regards
Jeffle
On 2019/11/23 2:55 AM, Joe Thornber wrote:
> On Fri, Nov 22, 2019 at 11:14:15AM +0800, JeffleXu wrote:
>
>> The first question is what's the purpose of data cell? In thin_bio_map(),
>> normal bio will be packed as a virtual cell and data cell. I can understand
>> that virtual cell is used to prevent discard bio and non-discard bio
>> targeting the same block from being processed at the same time. I find it
>> was added in commit e8088073c9610af017fd47fddd104a2c3afb32e8 (dm thin:
>> fix race between simultaneous io and discards to same block), but I'm still
>> confused about the use of data cell.
> As you are aware there are two address spaces for the locks. The 'virtual' one
> refers to cells in the logical address space of the thin devices, and the 'data' one
> refers to the underlying data device. There are certain conditions where we
> unfortunately need to hold both of these (eg, to prevent a data block being reprovisioned
> before an io to it has completed).
>
>> The second question is the impact of virtual cell and data cell on IO
>> performance. If $data_block_size is large for example 1G, in multithread fio
>> test, most bio will be buffered in cell->bios list and then be processed by
>> worker thread asynchronously, even when there's no discard bio. Thus the
>> original parallel IO is processed by worker thread serially now. As the
>> number of fio test threads increase, the single worker thread can easily get
>> CPU 100%, and thus become the bottleneck of the performance since dm-thin
>> workqueue is ordered unbound.
> Yep, this is a big issue. Take a look at dm-bio-prison-v2.h, this is the
> new interface that we need to move dm-thin across to use (dm-cache already uses it).
> It allows concurrent holders of a cell (ie, read locks), so we'll be able to remap
> much more io without handing it off to a worker thread. Once this is done I want
> to add an extra field to cells that will cache the mapping, this way if you acquire a
> cell that is already held then you can avoid the expensive btree lookup. Together
> these changes should make a huge difference to the performance.
>
> If you've got some spare coding cycles I'd love some help with this ;)
>
> - Joe
>
* Re: dm-thin: Several Questions on dm-thin performance.
2019-12-02 7:50 ` JeffleXu
@ 2019-12-02 22:26 ` Eric Wheeler
2019-12-03 12:51 ` Joe Thornber
0 siblings, 1 reply; 13+ messages in thread
From: Eric Wheeler @ 2019-12-02 22:26 UTC (permalink / raw)
To: Joe Thornber, JeffleXu; +Cc: dm-devel
On Mon, 2 Dec 2019, JeffleXu wrote:
> On 2019/11/23 2:55 AM, Joe Thornber wrote:
>
> > On Fri, Nov 22, 2019 at 11:14:15AM +0800, JeffleXu wrote:
> >
> > > The first question is what's the purpose of data cell? In thin_bio_map(),
> > > normal bio will be packed as a virtual cell and data cell. I can
> > > understand
> > > that virtual cell is used to prevent discard bio and non-discard bio
> > > targeting the same block from being processed at the same time. I find it
> > > was added in commit e8088073c9610af017fd47fddd104a2c3afb32e8 (dm thin:
> > > fix race between simultaneous io and discards to same block), but I'm
> > > still
> > > confused about the use of data cell.
> > As you are aware there are two address spaces for the locks. The 'virtual'
> > one
> > refers to cells in the logical address space of the thin devices, and the
> > 'data' one
> > refers to the underlying data device. There are certain conditions where we
> > unfortunately need to hold both of these (eg, to prevent a data block being
> > reprovisioned
> > before an io to it has completed).
> >
> > > The second question is the impact of virtual cell and data cell on IO
> > > performance. If $data_block_size is large for example 1G, in multithread
> > > fio
> > > test, most bio will be buffered in cell->bios list and then be processed
> > > by
> > > worker thread asynchronously, even when there's no discard bio. Thus the
> > > original parallel IO is processed by worker thread serially now. As the
> > > number of fio test threads increase, the single worker thread can easily
> > > get
> > > CPU 100%, and thus become the bottleneck of the performance since dm-thin
> > > workqueue is ordered unbound.
> > Yep, this is a big issue. Take a look at dm-bio-prison-v2.h, this is the
> > new interface that we need to move dm-thin across to use (dm-cache already
> > uses it).
> > It allows concurrent holders of a cell (ie, read locks), so we'll be able to
> > remap
> > much more io without handing it off to a worker thread. Once this is done I
> > want
> > to add an extra field to cells that will cache the mapping, this way if you
> > acquire a
> > cell that is already held then you can avoid the expensive btree lookup.
> > Together
> > these changes should make a huge difference to the performance.
> >
> > If you've got some spare coding cycles I'd love some help with this ;)
Hi Joe,
I'm not sure if I will have the time, but I thought I would start the
research and ask a few questions. I looked at the v1/v2 .h files; some
of the functions just gain a _v2 suffix, with perhaps some
calling-convention or structure-field changes.
However, there appear to be some design changes, too:
* dm_deferred_set - These appear to be used a bit in dm-thin.c.
The dm_deferred_set calls don't seem to reference anything prison-related,
but they are defined in dm-bio-prison-v1.h. Can you provide direction on
how these would be refactored, or if they can just remain as-is?
Call counts in dm-thin.c:
2 dm_deferred_entry_dec
2 dm_deferred_set_create
3 dm_deferred_entry_inc
3 dm_deferred_set_add_work
4 dm_deferred_set_destroy
* dm_bio_detain - is this replaced by dm_cell_get_v2?
- It looks like dm_bio_detain() returns 1 if already held, but
dm_cell_get_v2() returns true if the lock is granted. How might
this be handled?
- What are the lock_levels?
- What in dm-thin.c would then call dm_cell_put_v2?
* dm_cell_release(_no_holder) - is this replaced by dm_cell_unlock_v2?
- How would the _no_holder version be refactored?
* dm_cell_visit_release - This function uses a callback, but none of the
v2 functions have such a callback. Do we need to write a helper function
with get/unlock(?) around the cell?
* dm_cell_error - no equivalent v2 implementation. Suggestions?
What other considerations might there be in the v2 port?
Thanks!
--
Eric Wheeler
> >
> > - Joe
> >
* Re: dm-thin: Several Questions on dm-thin performance.
2019-12-02 22:26 ` Eric Wheeler
@ 2019-12-03 12:51 ` Joe Thornber
2019-12-04 12:11 ` Joe Thornber
0 siblings, 1 reply; 13+ messages in thread
From: Joe Thornber @ 2019-12-03 12:51 UTC (permalink / raw)
To: Eric Wheeler; +Cc: JeffleXu, dm-devel
On Mon, Dec 02, 2019 at 10:26:00PM +0000, Eric Wheeler wrote:
> Hi Joe,
>
> I'm not sure if I will have the time but thought I would start the
> research and ask a few questions. I looked at the v1/v2 .h files and some
> of the functions just change suffix to _v2 and maybe calling
> convention/structure field changes.
>
> However, there appear to be some design changes, too:
Yes, the interface is different, and it's really not trivial to switch dm-thin
over to use it (otherwise I'd have done it already). dm-cache already uses
the new interface which could be used as a guide, especially if you look at the patches
that made the switch.
I'm going to write up some notes over the next couple of days, which I'll post on this thread.
- Joe
* Re: dm-thin: Several Questions on dm-thin performance.
2019-12-03 12:51 ` Joe Thornber
@ 2019-12-04 12:11 ` Joe Thornber
2019-12-05 23:14 ` Eric Wheeler
0 siblings, 1 reply; 13+ messages in thread
From: Joe Thornber @ 2019-12-04 12:11 UTC (permalink / raw)
To: Eric Wheeler, JeffleXu, dm-devel
(These notes are for my own benefit as much as anything, I haven't
worked on this for a couple of years and will forget it all completely
if I don't write it down somewhere).
Let's start by writing some pseudocode for what the remap function for
thin provisioning actually does.
----------------------------------------------------------
-- Metadata
newtype ThinId = Int
data Bio = Bio
newtype VBlock = Integer -- virtual block
newtype DBlock = Integer -- data block
data LookupResult =
    Unprovisioned |
    Provisioned { lrDBlock :: DBlock,
                  lrShared :: Bool
                }
metadataLookup :: ThinId -> VBlock -> Task LookupResult
metadataLookup = undefined
metadataInsert :: ThinId -> VBlock -> DBlock -> Task ()
metadataInsert = undefined
metadataRemove :: ThinId -> VBlock -> Task ()
metadataRemove = undefined
-- Blocks all other metadata operations while running
metadataCommit :: Task ()
metadataCommit = undefined
----------------------------------------------------------
-- Tasks
-- Many steps of servicing a bio can block. eg, taking a lock,
-- reading metadata, updating metadata, zeroing data, copying
-- data ...
-- So we completely side step the issue in this pseudocode by
-- running everything in a magic light weight thread.
spawn :: Task () -> IO ()
spawn = undefined
----------------------------------------------------------
-- Locking
-- These 'with' primitives acquire a lock (can block of course), perform
-- an action, and then automatically release the lock.
-- Shared lock can be upgraded, so we need to pass the lock into
-- the action body.
withSharedLock :: ThinId -> VBlock -> (Lock -> Task ()) -> Task ()
withSharedLock thinId vblock actionFn = undefined
withExclusiveLock :: ThinId -> VBlock -> Task () -> Task ()
withExclusiveLock thinId vblock action = undefined
-- This promotes a shared lock to exclusive.
withUpgradedLock :: Lock -> Task () -> Task ()
withUpgradedLock lock action = undefined
-- Data locks are always exclusive
withDataLock :: DBlock -> Task () -> Task ()
withDataLock dblock action = undefined
----------------------------------------------------------
-- Top level remap function. Kicks off a green thread for each bio.
-- How we handle a bio depends on whether it's a read, write, discard
-- or flush bio. Whether the block is already provisioned, and if so
-- whether it is shared between snapshots.
remap :: ThinId -> Bio -> IO ()
remap thinId bio = spawn $ remapFn thinId bio vblock
    where
        vblock = virtBlock bio
        remapFn = case classifyBio bio of
            ReadBio -> remapRead
            WriteBio -> remapWrite
            DiscardBio -> remapDiscard
            FlushBio -> remapFlush
----------------------------------------------------------
remapRead :: ThinId -> Bio -> VBlock -> Task ()
remapRead thinId bio vblock = do
    withSharedLock thinId vblock $ \_ -> do
        lr <- metadataLookup thinId vblock
        case lr of
            -- Read, Unprovisioned, Shared/!Shared
            Unprovisioned -> do
                fillWithZeroes bio
                complete bio Success

            -- Read, Provisioned, Shared/!Shared
            (Provisioned dblock _) ->
                remapAndWait bio dblock
----------------------------------------------------------
remapWrite :: ThinId -> Bio -> VBlock -> Task ()
remapWrite thinId bio vblock = do
    withSharedLock thinId vblock $ \lock -> do
        lr <- metadataLookup thinId vblock
        case lr of
            -- Write, Unprovisioned
            Unprovisioned ->
                withUpgradedLock lock $
                    provision thinId bio vblock

            -- Write, Provisioned, !Shared
            (Provisioned dblock False) ->
                remapAndWait bio dblock

            -- Write, Provisioned, Shared
            (Provisioned dblock True) ->
                withUpgradedLock lock $
                    breakSharing thinId bio vblock dblock
breakSharing :: ThinId -> Bio -> VBlock -> DBlock -> Task ()
breakSharing thinId bio vblock dblockOld = do
    ab <- allocateBlock
    case ab of
        NoDataSpace ->
            complete bio Failure

        (Allocated dblockNew) ->
            withDataLock dblockOld $    -- we grab data locks to avoid races with discard
                withDataLock dblockNew $ do
                    copy dblockOld dblockNew
                    metadataInsert thinId vblock dblockNew
                    remapAndWait bio dblockNew
provision :: ThinId -> Bio -> VBlock -> Task ()
provision thinId bio vblock = do
    ab <- allocateBlock
    case ab of
        NoDataSpace ->
            complete bio Failure

        (Allocated dblock) ->
            withDataLock dblock $ do
                metadataInsert thinId vblock dblock
                remapAndWait bio dblock
----------------------------------------------------------
discard :: ThinId -> Bio -> VBlock -> Task ()
discard thinId bio vblock = do
    withExclusiveLock thinId vblock $ do
        lr <- metadataLookup thinId vblock
        case lr of
            -- Discard, Unprovisioned
            Unprovisioned ->
                complete bio Success

            -- Discard, Provisioned, !Shared
            (Provisioned dblock False) ->
                withDataLock dblock $ do
                    remapAndWait bio dblock    -- passdown
                    metadataRemove thinId vblock

            -- Discard, Provisioned, Shared
            (Provisioned dblock True) ->
                withDataLock dblock $ do
                    metadataRemove thinId vblock
                    complete bio Success
----------------------------------------------------------
flush :: Task ()
flush = metadataCommit
----------------------------------------------------------
remapAndWait :: Bio -> DBlock -> Task ()
remapAndWait bio dblock = do
    remap bio dblock
    issue bio
    wait bio
The above is a simplification (eg, discards can cover more than a single
block, the pool has multiple modes like OUT_OF_DATA_SPACE). But it gives
a good idea of what the dm target needs to do, and in a succinct manner.
Now dm-thin.c is anything but succinct, for a couple of reasons:
- Because of where we are in the IO stack we cannot allocate memory.
This means we either use memory preallocated via mempools, or allocate
a fixed size block before a bio is processed.
- We don't have a magic green threads library that hides the numerous
blocking operations that we need. Instead we have low level facilities
like workqueues etc. This tends to have the effect of breaking up the logic
and scattering it across lots of little completion functions.
How we handle blocking, locking, and quiescing IO are all intertwined.
Which is why switching over to the new bio_prison interface is going to
involve an awful lot of churn.
In the upstream code
====================
- Locking
The locks provided by bio_prison (v1), are all exclusive locks. As such
we take pains to hold them for as short a period as possible. This means
holding them for the duration of an IO is completely out of the question.
Nonetheless, as pointed out in the original post for this thread, this
can cause bad lock contention, especially if the data block size is large.
- Quiescing
Because we do not hold locks for the lifetime of the bios, we need
another way of tracking IO and quiescing regions. This is what the
deferred_set component does. Effectively it divides time up into
bins, and keeps a reference count of how many IOs are still in flight
for each bin. To quiesce we grab a lock, and then wait for all bins
before this lock was acquired to drain. Advantages of this approach
is it uses very little memory (I think we're currently running with
64 bins), and consumes v. little cpu. But we're never specific about
which region we're waiting to quiesce, instead always waiting for all
IO older than a certain point to drain. So we are certainly introducing
more latency here.
- Blocking
A single thread services any operations that could block.
When there is work for this thread to perform a work queue item
is queued (do_worker()). This then examines linked lists of work
(prepared_mappings, discard_pt1, discard_pt2, prefetches etc), and
processes each list as a batch. Batching like this is a mixed blessing;
it allows us to sort incoming bios so we can process bios to the same
region at the same time, but it is also v. bad for max latency, as we
have no idea which piece of work was there the longest.
Next iteration of the code
==========================
- Locking
bio_prison (v2) provides shared locks, and custom lock levels. So,
at the expense of memory, we can hold shared locks for long periods
that cover the lifetime of the bio. Acquiring a lock now blocks.
- Quiescing
Because we hold the locks for long periods we can now do away with the
deferred set completely. If you want to quiesce a region, just grab
the exclusive lock associated with it, when it's finally granted you
know it's also quiesced.
- Blocking
I want to move away from the idea of a single worker function that
has different categories of work stored for it in different lists.
Instead, storing specific work structs on the work queue for each bio.
Partly this is to reduce latency by increasing 'fairness'. But also
the fact that acquiring a lock now blocks means there are a lot more
block operations to handle, and we'd just end up with a lot of these
lists of work. It would also allow us to have multiple kernel threads
servicing the workqueue.
If you look at dm-cache-target.c you'll see this has already been
done for that target. We have continuation structs that represent
the work to be performed after the current blocking op has completed.
dm-cache uses this for migrations, which have a much simpler state model
than dm-thin. Even so there are a lot of these little continuation
functions (eg, mg_start, mg_lock_writes, mg_copy, mg_full_copy,
mg_upgrade_lock, mg_update_metadata_after_copy, mg_update_metadata,
mg_success, mg_complete).
Where are we now?
=================
I did a lot of work on this a couple of years ago. First I just dove
in, trying to code things up by hand. But it quickly devolved into a
maze of badly named continuation functions, all alike. It's very hard
to give these functions meaningful names; go through the pseudocode at
the top of this email and for each place where we could block, try to
describe where we are. The biggest problem is that as we introduce more of
these continuations, the big-picture logic recedes and it becomes v. hard to
reason about the code.
I then experimented with automatically generating all the code from a
simpler specification (I used a lispy version of the pseudocode above).
This showed promise and I got it generating kernel code that would
compile. I was debugging this when I got dragged onto other work,
and this has stagnated since.
So that's where we are.
* Re: dm-thin: Several Questions on dm-thin performance.
2019-12-04 12:11 ` Joe Thornber
@ 2019-12-05 23:14 ` Eric Wheeler
0 siblings, 0 replies; 13+ messages in thread
From: Eric Wheeler @ 2019-12-05 23:14 UTC (permalink / raw)
To: Joe Thornber; +Cc: JeffleXu, dm-devel
Thanks Joe, great writeup. Maybe this should go in thin-provisioning.txt.
More below:
On Wed, 4 Dec 2019, Joe Thornber wrote:
> (These notes are for my own benefit as much as anything, I haven't
> worked on this for a couple of years and will forget it all completely
> if I don't write it down somewhere).
>
> Let's start by writing some pseudocode for what the remap function for
> thin provisioning actually does.
>
> ----------------------------------------------------------
> -- Metadata
>
> newtype ThinId = Int
> data Bio = Bio
> newtype VBlock = Integer -- virtual block
> newtype DBlock = Integer -- data block
Can you define virtual block vs data block? Is this just the thin volume
offset vs the tdata offset?
>
> data LookupResult =
> Unprovisioned |
> Provisioned { lrDBlock :: DataBlock,
> lrShared :: Bool
> }
What is "lr"? as in lrDBlock?
> metadataLookup :: ThinId -> VBlock -> Task LookupResult
> metadataLookup = undefined
>
> metadataInsert :: ThinId -> VBlock -> DBlock -> Task ()
> metadataInsert = undefined
>
> metadataRemove :: ThinId -> VBlock -> Task ()
> metadataRemove = undefined
>
> -- Blocks all other metadata operations while running
> metadataCommit :: Task ()
> metadataCommit = undefined
>
> ----------------------------------------------------------
> -- Tasks
>
> -- Many steps of servicing a bio can block. eg, taking a lock,
> -- reading metadata, updating metadata, zeroing data, copying
> -- data ...
> -- So we completely side step the issue in this pseudocode by
> -- running everything in a magic light weight thread.
> spawn :: Task () -> IO ()
> spawn = undefined
>
> ----------------------------------------------------------
> -- Locking
>
> -- These 'with' primitives acquire a lock (can block of course), perform
> -- an action, and then automatically release the lock.
>
> -- Shared lock can be upgraded, so we need to pass the lock into
> -- the action body.
> withSharedLock :: ThinId -> VBlock -> (Lock -> Task ()) -> Task ()
> withSharedLock thinId vblock actionFn = undefined
>
> withExclusiveLock :: ThinId -> VBlock -> Task () -> Task ()
> withExclusiveLock thinId vblock action = undefined
>
> -- This promotes a shared lock to exclusive.
> withUpgradedLock :: Lock -> Task () -> Task ()
> withUpgradedLock lock action = undefined
>
> -- Data locks are always exclusive
> withDataLock :: DBlock -> Task () -> Task ()
> withDataLock dblock action = undefined
>
> ----------------------------------------------------------
>
> -- Top level remap function. Kicks off a green thread for each bio.
> -- How we handle a bio depends on whether it's a read, write, discard
> -- or flush bio. Whether the block is already provisioned, and if so
> -- whether it is shared between snapshots.
> remap :: ThinId -> Bio -> IO ()
> remap thinId bio = spawn $ remapFn thinId bio vblock
> where
> vblock = virtBlock bio
> remapFn = case classifyBio bio of
> ReadBio -> remapRead
> WriteBio -> remapWrite
> DiscardBio -> remapDiscard
> FlushBio -> remapFlush
>
> ----------------------------------------------------------
>
> remapRead :: ThinId -> Bio -> VBlock -> Task ()
> remapRead thinId bio vblock = do
> withSharedLock thinId vblock $ \_ -> do
> lr <- metadataLookup thinId vblock
> case lr of
> -- Read, Unprovisioned, Shared/!Shared
> Unprovisioned -> do
> fillWithZeroes bio
> complete bio Success
>
> -- Read, Provisioned, Shared/!Shared
> (Provisioned dblock _) ->
> remapAndWait bio dblock
>
> ----------------------------------------------------------
>
> remapWrite :: ThinId -> Bio -> VBlock -> Task ()
> remapWrite thinId bio vblock = do
> withSharedLock thinId vblock $ \lock -> do
> lr <- metadataLookup thinId vblock
> case lr of
> -- Write, Unprovisioned
> Unprovisioned ->
> withUpgradedLock lock $
> provision thinId bio vblock
>
> -- Write, Provisioned, !Shared
> (Provisioned dblock False) ->
> remapAndWait bio dblock
>
> -- Write, Provisioned, Shared
> (Provisioned dblock True) ->
> withUpgradedLock lock $
> breakSharing thinId bio vblock dblock
>
> breakSharing :: ThinId -> Bio -> VBlock -> DataBlock -> Task ()
> breakSharing thinId bio vblock dblockOld = do
> ab <- allocateBlock
> case ab of
> NoDataSpace ->
> complete bio Failure
>
> (Allocated dblockNew) ->
> withDataLock dblockOld $ -- we grab data locks to avoid races with discard
> withDataLock dblockNew $ do
> copy dblockOld dblockNew
> metadataInsert thinId vblock dblockNew
> remapAndWait thinId bio dblockNew
>
> provision :: ThinId -> Bio -> VBlock -> Task ()
> provision thinId bio vblock = do
> case allocateBlock of
> NoDataSpace ->
> complete bio Failure
>
> (Allocated dblock) ->
> withDataLock dblock $ do
> metadataInsert thinId vblock dblock
> remapAndWait thinId bio dblock
Does the allocator block? If so, it would be neat to pre-allocate some
number of blocks during metadata idle times. There could be a hidden thin
volume (ie, devid #16777215) that blocks are allocated into and then
stolen from for use elsewhere. The blocks could be pre-zeroed, too!
>
> ----------------------------------------------------------
>
> discard :: ThinId -> Bio -> VBlock -> Task ()
> discard thinId bio vblock = do
> withExclusiveLock thinId vblock $ do
> lr <- metadataLookup thinId vblock
> case lr of
> -- Discard, Unprovisioned
> Unprovisioned ->
> complete bio Success
>
> -- Discard, Provisioned, !Shared
> (Provisioned dblock False) ->
> withDataLock dblock $ do
> remapAndWait bio dblock -- passdown
> metadataRemove thinId dblock
>
> -- Discard, Provisioned, Shared
> (Provisioned dblock True) ->
> withDataLock dblock $ do
> metadataRemove thinId dblock
> complete bio Success
>
> ----------------------------------------------------------
>
> flush :: Task ()
> flush = metadataCommit
>
> ----------------------------------------------------------
>
> remapAndWait :: Bio -> DataBlock -> Task ()
> remapAndWait bio dblock = do
> remap bio dblock
> issue bio
> wait bio
>
> The above is a simplification (eg, discards can cover more than a single
> block, the pool has multiple modes like OUT_OF_DATA_SPACE). But it gives
> a good idea of what the dm target needs to do, and in a succinct manner.
>
> Now dm-thin.c is anything but succinct, for a couple of reasons:
>
> - Because of where we are in the IO stack we cannot allocate memory.
> This means we either use memory preallocated via mempools, or allocate
> a fixed size block before a bio is processed.
>
> - We don't have a magic green threads library that hides the numerous
> blocking operations that we need. Instead we have low level facilities
> like workqueues etc. This tends to have the effect of breaking up the logic
> and scattering it across lots of little completion functions.
>
>
> How we handle blocking, locking, and quiescing IO are all intertwined.
> Which is why switching over to the new bio_prison interface is going to
> involve an awful lot of churn.
>
> In the upstream code
> ====================
>
> - Locking
>
> The locks provided by bio_prison (v1), are all exclusive locks. As such
> we take pains to hold them for as short a period as possible. This means
> holding them for the duration of an IO is completely out of the question.
> Nonetheless, as pointed out in the original post for this thread, this
> can cause bad lock contention, especially if the data block size is large.
>
> - Quiescing
>
> Because we do not hold locks for the lifetime of the bios, we need
> another way of tracking IO and quiescing regions. This is what the
> deferred_set component does. Effectively it divides time up into
> bins, and keeps a reference count of how many IOs are still in flight
> for each bin. To quiesce we grab a lock, and then wait for all bins
> before this lock was acquired to drain. Advantages of this approach
> is it uses very little memory (I think we're currently running with
> 64 bins), and consumes v. little cpu. But we're never specific about
curious, is "v." short for "very" (not "versus")?
> which region we're waiting to quiesce, instead always waiting for all
> IO older than a certain point to drain. So we are certainly introducing
> more latency here.
>
> - Blocking
>
> A single thread services any operations that could block.
> When there is work for this thread to perform a work queue item
> is queued (do_worker()). This then examines linked lists of work
> (prepared_mappings, discard_pt1, discard_pt2, prefetches etc), and
> processes each list as a batch. Batching like this is a mixed blessing;
> it allows us to sort incoming bios so we can process bios to the same
> region at the same time, but it is also v. bad for max latency, as we
> have no idea which piece of work was there the longest.
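A miniature model of that worker loop (all names invented) shows why batching helps throughput but hurts worst-case latency; a batch is drained per category, not in arrival order:

```c
#include <assert.h>

#define MAX_WORK 16

/* One list per category of deferred work, cf. prepared_mappings,
 * discard_pt1, discard_pt2, prefetches in the real target. */
struct work_list {
	int items[MAX_WORK];
	int n;
};

static int processed[MAX_WORK * 2];
static int n_processed;

/* IO paths defer work onto a list instead of doing it inline. */
static void defer(struct work_list *wl, int item)
{
	wl->items[wl->n++] = item;
}

/* Drain one category as a batch. */
static void process_list(struct work_list *wl)
{
	for (int i = 0; i < wl->n; i++)
		processed[n_processed++] = wl->items[i];
	wl->n = 0;
}

/* The single worker walks the category lists in a fixed order, so an
 * old item in a late category waits behind newer items in earlier
 * categories - there is no global arrival-order fairness. */
static void do_worker(struct work_list *mappings, struct work_list *discards)
{
	process_list(mappings);
	process_list(discards);
}
```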
>
> Next iteration of the code
> ==========================
>
> - Locking
>
> bio_prison (v2) provides shared locks, and custom lock levels. So,
> at the expense of memory, we can hold shared locks for long periods
> that cover the lifetime of the bio. Acquiring a lock now blocks.
>
> - Quiescing
>
> Because we hold the locks for long periods we can now do away with the
> deferred set completely. If you want to quiesce a region, just grab
> the exclusive lock associated with it, when it's finally granted you
> know it's also quiesced.
>
good to know.
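A rough userspace sketch of that locking model (invented names, no real wait queues or atomics) shows why quiescing falls out for free once locks span the bio's lifetime:

```c
#include <assert.h>

/* bio_prison (v2) style cell: shared holds may be kept for the whole
 * lifetime of a bio; an exclusive lock is only granted once all shared
 * holders have dropped, so "granted" implies "quiesced". */
struct cell {
	unsigned shared;	/* bios currently holding the cell */
	int exclusive;		/* e.g. a discard wanting sole access */
};

/* Returns 1 if the shared hold was granted, 0 if the caller must wait. */
static int cell_get_shared(struct cell *c)
{
	if (c->exclusive)
		return 0;
	c->shared++;
	return 1;
}

static void cell_put_shared(struct cell *c)
{
	c->shared--;
}

/* Returns 1 when the exclusive lock is granted; at that point the
 * region is quiesced by construction - no deferred_set needed. */
static int cell_try_exclusive(struct cell *c)
{
	if (c->shared)
		return 0;
	c->exclusive = 1;
	return 1;
}
```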
> - Blocking
>
> I want to move away from the idea of a single worker function that
> has different categories of work stored for it in different lists.
> Instead, a specific work struct would be queued for each bio.
> Partly this is to reduce latency by increasing 'fairness'. But also
> the fact that acquiring a lock now blocks means there are a lot more
> block operations to handle, and we'd just end up with a lot of these
> lists of work. It would also allow us to have multiple kernel threads
> servicing the workqueue.
>
> If you look at dm-cache-target.c you'll see this has already been
> done for that target. We have continuation structs that represent
> the work to be performed after the current blocking op has completed.
> dm-cache uses this for migrations, which have a much simpler state model
> than dm-thin. Even so there are a lot of these little continuation
> functions (eg, mg_start, mg_lock_writes, mg_copy, mg_full_copy,
> mg_upgrade_lock, mg_update_metadata_after_copy, mg_update_metadata,
> mg_success, mg_complete).
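In miniature, and with invented k_* names, the continuation style looks something like this; each function records where we are and chains the next step once the current blocking op completes:

```c
#include <assert.h>

/* A continuation carries the function to run when the current blocking
 * operation completes, plus (here, trivially) the per-operation state. */
struct continuation {
	void (*fn)(struct continuation *);
	int step;	/* stands in for real per-migration state */
};

/* In the kernel this would queue onto a workqueue; here we just call. */
static void queue_continuation(struct continuation *k)
{
	k->fn(k);
}

static void k_update_metadata(struct continuation *k)
{
	k->step = 3;	/* final step: commit the new mapping */
}

static void k_copy_complete(struct continuation *k)
{
	k->step = 2;	/* copy finished; chain to the metadata update */
	k->fn = k_update_metadata;
	queue_continuation(k);
}

static void k_lock_granted(struct continuation *k)
{
	k->step = 1;	/* lock acquired; kick off the data copy */
	k->fn = k_copy_complete;
	queue_continuation(k);	/* in reality: issue copy, complete later */
}
```

Even in this toy version, one logical operation is split across three little functions, which is exactly the readability problem being described.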
>
>
> Where are we now?
> =================
>
> I did a lot of work on this a couple of years ago. First I just dove
> in, trying to code things up by hand. But it quickly devolved into a
> maze of badly named continuation functions, all alike. It's very hard
> to give these functions meaningful names; go through the pseudocode at
> the top of this email and for each place where we could block, try to
> describe where we are. The biggest problem is that as we introduce more
> of these continuations the big-picture logic recedes and it becomes
> v. hard to reason about the code.
Event-driven continuation functions seem to pop up frequently in the Linux
kernel. It would be neat if there were a framework for writing these
procedurally. Macros might help, but the result could still be pretty
ugly. It almost needs GCC support.
> I then experimented with automatically generating all the code from a
> simpler specification (I used a lispy version of the pseudocode above).
> This showed promise and I got it generating kernel code that would
> compile. I was debugging this when I got dragged onto other work,
> and this has stagnated since.
Do you think this is the best way to proceed? Someone with a Lisp
background would need to help. It might generate a good first pass, but it
would be difficult to maintain as future kernel changes have to patch the
auto-generated code.
-Eric
>
>
> So that's where we are.
>
>
>
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm-thin: Several Questions on dm-thin performance.
2019-11-22 18:55 ` Joe Thornber
2019-12-02 7:50 ` JeffleXu
@ 2019-12-06 14:15 ` Nikos Tsironis
2019-12-13 1:32 ` Eric Wheeler
1 sibling, 1 reply; 13+ messages in thread
From: Nikos Tsironis @ 2019-12-06 14:15 UTC (permalink / raw)
To: thornber; +Cc: JeffleXu, dm-devel
On 11/22/19 8:55 PM, Joe Thornber wrote:
> On Fri, Nov 22, 2019 at 11:14:15AM +0800, JeffleXu wrote:
>
>> The first question is what's the purpose of data cell? In thin_bio_map(),
>> normal bio will be packed as a virtual cell and data cell. I can understand
>> that virtual cell is used to prevent discard bio and non-discard bio
>> targeting the same block from being processed at the same time. I find it
>> was added in commit e8088073c9610af017fd47fddd104a2c3afb32e8 (dm thin:
>> fix race between simultaneous io and discards to same block), but I'm still
>> confused about the use of data cell.
>
> As you are aware there are two address spaces for the locks. The 'virtual' one
> refers to cells in the logical address space of the thin devices, and the 'data' one
> refers to the underlying data device. There are certain conditions where we
> unfortunately need to hold both of these (eg, to prevent a data block being reprovisioned
> before an io to it has completed).
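To illustrate the two address spaces, a keyed lock might be modelled roughly like this (field names are invented for the sketch and do not match the real bio_prison key layout):

```c
#include <assert.h>

/* Two lock address spaces: a 'virtual' key names a block in a thin
 * device's logical space, a 'data' key names a block on the shared
 * data device. */
enum key_space { KEY_VIRTUAL, KEY_DATA };

struct cell_key {
	enum key_space space;
	unsigned dev;			/* thin device id; unused for data keys */
	unsigned long long block;
};

/* Keys only collide within the same address space, so holding a
 * virtual lock on block N says nothing about data block N; protecting
 * a data block from reprovisioning needs a data-space lock as well. */
static int keys_equal(const struct cell_key *a, const struct cell_key *b)
{
	if (a->space != b->space)
		return 0;
	if (a->space == KEY_VIRTUAL && a->dev != b->dev)
		return 0;
	return a->block == b->block;
}
```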
>
>> The second question is the impact of virtual cell and data cell on IO
>> performance. If $data_block_size is large for example 1G, in multithread fio
>> test, most bio will be buffered in cell->bios list and then be processed by
>> worker thread asynchronously, even when there's no discard bio. Thus the
>> original parallel IO is processed by worker thread serially now. As the
>> number of fio test threads increase, the single worker thread can easily get
>> CPU 100%, and thus become the bottleneck of the performance since dm-thin
>> workqueue is ordered unbound.
>
> Yep, this is a big issue. Take a look at dm-bio-prison-v2.h, this is the
> new interface that we need to move dm-thin across to use (dm-cache already uses it).
> It allows concurrent holders of a cell (ie, read locks), so we'll be able to remap
> much more io without handing it off to a worker thread. Once this is done I want
> to add an extra field to cells that will cache the mapping, this way if you acquire a
> cell that is already held then you can avoid the expensive btree lookup. Together
> these changes should make a huge difference to the performance.
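The mapping-cache idea mentioned above could be sketched like this (all names invented; the fake btree_lookup() stands in for the real btree walk):

```c
#include <assert.h>

/* Counts how often we hit the (expensive) btree; in the real target
 * this would be the metadata lookup done per bio. */
static int btree_lookups;

static unsigned long long btree_lookup(unsigned long long virt_block)
{
	btree_lookups++;
	return virt_block + 1000;	/* fake mapping */
}

/* A cell that caches the mapping for later holders. */
struct cell {
	int holders;
	int mapping_valid;
	unsigned long long data_block;
};

/* Acquire a shared hold on the cell; only the first holder pays for
 * the btree lookup, later holders reuse the cached result. */
static unsigned long long cell_get_mapping(struct cell *c,
					   unsigned long long virt_block)
{
	c->holders++;
	if (!c->mapping_valid) {
		c->data_block = btree_lookup(virt_block);
		c->mapping_valid = 1;
	}
	return c->data_block;
}
```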
>
> If you've got some spare coding cycles I'd love some help with this ;)
>
Hi Joe,
I would be interested in helping you with this task. I can't make any
promises, but I believe I could probably spare some time to work on it.
If you think you could use the extra help, let me know.
Nikos
> - Joe
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm-thin: Several Questions on dm-thin performance.
2019-12-06 14:15 ` Nikos Tsironis
@ 2019-12-13 1:32 ` Eric Wheeler
2019-12-15 21:44 ` Eric Wheeler
0 siblings, 1 reply; 13+ messages in thread
From: Eric Wheeler @ 2019-12-13 1:32 UTC (permalink / raw)
To: Nikos Tsironis; +Cc: JeffleXu, dm-devel, thornber
On Fri, 6 Dec 2019, Nikos Tsironis wrote:
> [snip]
>
> Hi Joe,
>
> I would be interested in helping you with this task. I can't make any
> promises, but I believe I could probably spare some time to work on it.
Hi Nikos, it would be great if you are able to help with the
dm-thin port to dm-bio-prison-v2. I'm glad to see you are interested in
dm-thin performance too.
These are the commits that implemented dm-bio-prison-v2 in dm-cache back
in ~4.12, maybe it can give you a good start on what the conversion might
look like:
b29d4986d dm cache: significant rework to leverage dm-bio-prison-v2
Here's a related bugfix:
d1260e2a3 dm cache: fix race condition in the writeback mode overwrite_bio optimisation
--
Eric Wheeler
>
> Nikos
>
> > - Joe
> >
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm-thin: Several Questions on dm-thin performance.
2019-12-13 1:32 ` Eric Wheeler
@ 2019-12-15 21:44 ` Eric Wheeler
2019-12-18 13:13 ` Joe Thornber
0 siblings, 1 reply; 13+ messages in thread
From: Eric Wheeler @ 2019-12-15 21:44 UTC (permalink / raw)
To: thornber; +Cc: JeffleXu, dm-devel, Nikos Tsironis
On Fri, 13 Dec 2019, Eric Wheeler wrote:
> [snip]
>
> Hi Nikos, it would be great if you are able help with the
> dm-thin port to dm-bio-prison-v2. I'm glad to see you are interested in
> dm-thin performance too.
>
> These are the commits that implemented dm-bio-prison-v2 in dm-cache back
> in ~4.12, maybe it can give you a good start on what the conversion might
> look like:
>
> b29d4986d dm cache: significant rework to leverage dm-bio-prison-v2
>
> Here's a related bugfix:
>
> d1260e2a3 dm cache: fix race condition in the writeback mode overwrite_bio optimisation
Hi Joe,
I was looking through the dm-bio-prison-v2 commit for dm-cache (b29d4986d)
and it is huge, ~5k lines. Do you still have a git branch with these
commits in smaller pieces (not squashed) so we can find the bits that
might be informative for converting dm-thin to use dm-bio-prison-v2?
For example, I think that, at least, the policy changes and
btracker code is dm-cache specific and just a distraction when trying to
understand the dm-bio-prison-v2 conversion.
--
Eric Wheeler
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm-thin: Several Questions on dm-thin performance.
2019-12-15 21:44 ` Eric Wheeler
@ 2019-12-18 13:13 ` Joe Thornber
2019-12-18 20:27 ` Eric Wheeler
0 siblings, 1 reply; 13+ messages in thread
From: Joe Thornber @ 2019-12-18 13:13 UTC (permalink / raw)
To: Eric Wheeler; +Cc: JeffleXu, dm-devel, Nikos Tsironis
On Sun, Dec 15, 2019 at 09:44:49PM +0000, Eric Wheeler wrote:
> I was looking through the dm-bio-prison-v2 commit for dm-cache (b29d4986d)
> and it is huge, ~5k lines. Do you still have a git branch with these
> commits in smaller pieces (not squashed) so we can find the bits that
> > might be informative for converting dm-thin to use dm-bio-prison-v2?
>
> For example, I think that, at least, the policy changes and
> btracker code is dm-cache specific and just a distraction when trying to
> understand the dm-bio-prison-v2 conversion.
To be honest I would hold off for a couple of months. I've been working
on the design for thinp 2 and have got to the point where I need to write
a userland proof of concept implementation. In particular I've focussed on
packing more into btree nodes, and separating transactions so IO to different
thins has no locking contention. The proof of concept will tell me just how
small I can get the metadata. If the level of metadata compression is ~1/10th
we'll plug the new btrees into the existing design and switch to bio prison v2.
If it's greater, say 1/50th, then I'll rewrite the whole target to
use write-ahead logging for transactionality and ditch all metadata sharing altogether.
When the metadata is that small we can copy entire btrees to implement snapshots.
- Joe
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: dm-thin: Several Questions on dm-thin performance.
2019-12-18 13:13 ` Joe Thornber
@ 2019-12-18 20:27 ` Eric Wheeler
0 siblings, 0 replies; 13+ messages in thread
From: Eric Wheeler @ 2019-12-18 20:27 UTC (permalink / raw)
To: Joe Thornber; +Cc: JeffleXu, dm-devel, Nikos Tsironis
On Wed, 18 Dec 2019, Joe Thornber wrote:
> On Sun, Dec 15, 2019 at 09:44:49PM +0000, Eric Wheeler wrote:
> > I was looking through the dm-bio-prison-v2 commit for dm-cache (b29d4986d)
> > and it is huge, ~5k lines. Do you still have a git branch with these
> > commits in smaller pieces (not squashed) so we can find the bits that
> > might be informative for converting lv-thin to use dm-bio-prison-v2?
> >
> > For example, I think that, at least, the policy changes and
> > btracker code is dm-cache specific and just a distraction when trying to
> > understand the dm-bio-prison-v2 conversion.
>
> To be honest I would hold off for a couple of months. I've been working
> on the design for thinp 2 and have got to the point where I need to write
> a userland proof of concept implementation. In particular I've focussed on
> packing more into btree nodes, and separating transactions so IO to different
> thins has no locking contention. The proof of concept will tell me just how
> small I can get the metadata. If the level of metadata compression is ~1/10th
> we'll plug the new btrees into the existing design and switch to bio prison v2.
> If it's greater, say 1/50th, then I'll rewrite the whole target to
> use write-ahead logging for transactionality and ditch all metadata sharing altogether.
> When the metadata is that small we can copy entire btrees to implement snapshots.
Sounds great, looking forward to it. The thinp target has worked great
for us over the years. Packing metadata and reducing lock contention will
make it even better.
--
Eric Wheeler
>
> - Joe
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* dm-thin: Several Questions on dm-thin performance.
@ 2019-11-22 3:43 JeffleXu
0 siblings, 0 replies; 13+ messages in thread
From: JeffleXu @ 2019-11-22 3:43 UTC (permalink / raw)
To: agk, Mike Snitzer; +Cc: dm-devel
Hi guys,
I have several questions about dm-thin that came up while testing and
evaluating its IO performance. I would be grateful if someone could spend
a little time on them.
The first question is: what is the purpose of the data cell? In
thin_bio_map(), a normal bio will be packed into both a virtual cell and a
data cell. I can understand that the virtual cell is used to prevent a
discard bio and a non-discard bio targeting the same block from being
processed at the same time. I found it was added in commit
e8088073c9610af017fd47fddd104a2c3afb32e8 (dm thin: fix race between
simultaneous io and discards to same block), but I'm still confused
about the use of the data cell.
The second question is the impact of the virtual cell and data cell on IO
performance. If $data_block_size is large, for example 1G, then in a
multithreaded fio test most bios will be buffered in the cell->bios list
and then processed by the worker thread asynchronously, even when there is
no discard bio. Thus the originally parallel IO is now processed serially
by the worker thread. As the number of fio test threads increases, the
single worker thread can easily hit 100% CPU, and thus becomes the
performance bottleneck, since the dm-thin workqueue is ordered unbound.
Using an nvme SSD and fio (direct=1, ioengine=libaio, iodepth=128,
numjobs=4, rw=read, bs=4k), the bandwidth on bare nvme is 1589MiB/s. The
bandwidth on the thin device is only 1274MiB/s, while the four fio threads
run at 200% CPU and the single worker thread is always running at 100%
CPU. perf of the worker thread shows that process_bio() consumes 86% of
the time.
Besides, it seems that I can't send email to the dm-devel@redhat.com
mailing list.
Regards
Jeffle Xu
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2019-12-18 20:27 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22 3:14 dm-thin: Several Questions on dm-thin performance JeffleXu
2019-11-22 18:55 ` Joe Thornber
2019-12-02 7:50 ` JeffleXu
2019-12-02 22:26 ` Eric Wheeler
2019-12-03 12:51 ` Joe Thornber
2019-12-04 12:11 ` Joe Thornber
2019-12-05 23:14 ` Eric Wheeler
2019-12-06 14:15 ` Nikos Tsironis
2019-12-13 1:32 ` Eric Wheeler
2019-12-15 21:44 ` Eric Wheeler
2019-12-18 13:13 ` Joe Thornber
2019-12-18 20:27 ` Eric Wheeler
2019-11-22 3:43 JeffleXu