* [RFC] dm-thin: Heuristic early chunk copy before COW
From: Eric Wheeler @ 2017-03-08 18:17 UTC
To: dm-devel

Hello all,

For dm-thin volumes that are snapshotted often, there is a performance
penalty for writes because of COW overhead, since the modified chunk needs
to be copied into a freshly allocated chunk.

What if we were to implement some sort of LRU for COW operations on
chunks? We could then queue chunks that are commonly COWed within the
inter-snapshot interval to be background-copied immediately after the next
snapshot. This would hide the latency and increase effective throughput
when the thin device is written by its user, since only the metadata would
need an update because the chunk has already been copied.

I can imagine a simple algorithm where a COW increments the chunk's LRU
counter by 2, and every stored counter is decremented by 1 when the volume
is snapshotted. After the snapshot, any chunk with LRU > 0 would be queued
for early copy.

The LRU would be in memory only, probably stored in a red-black tree.
Pre-copied chunks would not update on-disk metadata unless a write occurs
to that chunk. The allocator would need to be updated to ignore chunks in
the LRU list which have been pre-copied (except, perhaps, in the case of
pool free space exhaustion).

Does this sound viable?

--
Eric Wheeler
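The counter scheme above (a COW adds 2 to the chunk's counter, each snapshot subtracts 1 from every stored counter, and anything still positive is queued for early copy) can be sketched as a small in-memory simulation. This is illustrative only: the class name and chunk numbers are invented, and a kernel implementation would use a red-black tree keyed by chunk index rather than a hash map.

```python
from collections import defaultdict

class CowHeuristic:
    """Track chunks that are repeatedly COWed between snapshots.

    Each COW raises the chunk's score by 2; each snapshot lowers every
    stored score by 1, and chunks still above zero are queued for early
    background copy (hypothetical policy from the RFC, not dm-thin code).
    """

    def __init__(self):
        self.scores = defaultdict(int)  # chunk index -> score, in memory only

    def on_cow(self, chunk):
        self.scores[chunk] += 2

    def on_snapshot(self):
        queued = []
        for chunk in list(self.scores):
            self.scores[chunk] -= 1
            if self.scores[chunk] > 0:
                queued.append(chunk)     # candidate for early copy
            else:
                del self.scores[chunk]   # forget chunks that have gone cold
        return sorted(queued)

h = CowHeuristic()
for chunk in (7, 7, 42):        # chunk 7 COWed twice, chunk 42 once
    h.on_cow(chunk)
print(h.on_snapshot())          # both still positive after the decay
print(h.on_snapshot())          # chunk 42 has decayed away; 7 persists
```

Chunks that are COWed in every interval keep a positive score across snapshots, while one-off writes decay out of the tree after a couple of intervals.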
* Re: [RFC] dm-thin: Heuristic early chunk copy before COW
From: Joe Thornber @ 2017-03-09 11:51 UTC
To: Eric Wheeler; +Cc: dm-devel

Hi Eric,

On Wed, Mar 08, 2017 at 10:17:51AM -0800, Eric Wheeler wrote:
> [...]
> Does this sound viable?

Yes, I can see that it would benefit some people, and presumably we'd
only turn it on for those people. Random thoughts:

- I'm doing a lot of background work in the latest version of dm-cache
  in idle periods, and it certainly pays off.

- There can be a *lot* of chunks, so holding a counter for every chunk
  in memory is not on. (See the hassle I had squeezing stuff into memory
  for dm-cache.)

- Commonly cloned blocks can be gleaned from the metadata, e.g. by
  walking the metadata for two snapshots and taking the common ones.
  It might be possible to come up with a 'commonly used set' once, and
  then keep using it for all future snaps.

- Doing speculative work like this makes it harder to predict
  performance. At the moment any expense (i.e. the copy) is incurred
  immediately, as the triggering write comes in.

- Could this be done from userland? Metadata snapshots let userland see
  the mappings; alternatively, dm-era lets userland track where io has
  gone. A simple read then write of a block would trigger the sharing
  to be broken.

- Joe
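Joe's metadata-walking idea amounts to a set intersection over the per-interval COW sets. A minimal sketch, assuming the sets of chunks whose sharing was broken in each interval have already been extracted from the thin metadata (the function name and chunk numbers are invented for illustration):

```python
def common_cow_chunks(*interval_cow_sets):
    """Chunks that were COWed in every inter-snapshot interval.

    Each argument is the set of chunk indices that had sharing broken
    between one snapshot and the next (recoverable by walking the thin
    metadata, e.g. via a metadata snapshot).  The intersection
    approximates the 'commonly used set' Joe describes, which could
    then be reused for all future snaps.
    """
    it = iter(interval_cow_sets)
    common = set(next(it))
    for s in it:
        common &= s
    return common

# e.g. chunks rewritten between three consecutive hourly snapshots:
print(sorted(common_cow_chunks({1, 5, 9}, {1, 5, 12}, {0, 1, 5})))
```

Only chunks 1 and 5 appear in every interval here, so only they would be queued for pre-copy after the next snapshot.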
* Re: [RFC] dm-thin: Heuristic early chunk copy before COW
From: Eric Wheeler @ 2017-03-11 0:43 UTC
To: Joe Thornber; +Cc: dm-devel

On Thu, 9 Mar 2017, Joe Thornber wrote:
> Yes, I can see that it would benefit some people, and presumably we'd
> only turn it on for those people. Random thoughts:
>
> - I'm doing a lot of background work in the latest version of dm-cache
>   in idle periods and it certainly pays off.
>
> - There can be a *lot* of chunks, so holding a counter for all chunks
>   in memory is not on. (See the hassle I had squeezing stuff into
>   memory of dm-cache).
>
> - Commonly cloned blocks can be gleaned from the metadata. eg, by
>   walking the metadata for two snapshots and taking the common ones.
>   It might be possible to come up with a 'commonly used set' once, and
>   then keep using it for all future snaps.

That's a good idea. I have quite a few snapshot dump records; I'll run
through them and see how common the COW blocks are between hourly
snapshots.

> - Doing speculative work like this makes it harder to predict
>   performance. At the moment any expense (ie. copy) is incurred
>   immediately as the triggering write comes in.

True. We would definitely want the early COW copies to run as idle IO.

> - Could this be done from userland? Metadata snapshots let userland see
>   the mappings, alternatively dm-era let's userland track where io has
>   gone. A simple read then write of a block would trigger the sharing
>   to be broken.

Userland could definitely break mappings with a pre-COWing process;
however, you would want to somehow lock the block so that the thin
device user does not race the pre-COWing process. Is there already a
mechanism to lock blocks from userspace and release them after the
copy?

While this would work, locking the block prevents the thin device user
from making its own write to the COW block if such a race occurs; the
optimal outcome when the pre-COWing process races the thin device user
is for the thin device user to win.

I acknowledge that the memory footprint and the issues of storing the
pre-COW LRU in kernel memory might be significant. However, if there is
a way to let the kernel do the work somehow, then we can pre-copy
without breaking the COW in the metadata, preserving pool space;
mappings would only be broken in the thin metadata if the thin device
user actually writes to the speculatively pre-COWed chunk.

I suppose the metadata does not need to be in RAM; it could be an
ephemeral on-disk b-tree under dm-bufio. For example, the thin pool
could be passed an optional metadata volume for LRU purposes.

-Eric

> - Joe
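The userland read-then-write approach Joe suggests, and whose race Eric discusses above, can be sketched as follows. Everything here is an assumption of the sketch, not an existing tool: the function name, the 64 KiB chunk size, and the complete absence of locking.

```python
import os

CHUNK_SIZE = 64 * 1024  # assumed thin-pool chunk size; illustrative only

def precow_chunk(dev_path, chunk_index, chunk_size=CHUNK_SIZE):
    """Break sharing for one chunk from userland: read it, write it back.

    Rewriting the chunk with its own contents forces dm-thin to allocate
    a fresh chunk and copy the data now, so a later real write only
    needs a metadata update.  NOTE: this is unsynchronized; a write by
    the thin device's user landing between the pread and the pwrite
    would be overwritten with stale data, which is exactly the race
    Eric raises above.
    """
    fd = os.open(dev_path, os.O_RDWR)
    try:
        off = chunk_index * chunk_size
        data = os.pread(fd, chunk_size, off)
        os.pwrite(fd, data, off)
        os.fsync(fd)
    finally:
        os.close(fd)
```

A real pre-COWing daemon would need the block locking Eric asks about (or at least a way to detect and retry lost races) before this is safe on a live device.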