From: Joe Thornber
Subject: Re: [RFC] dm-thin: Heuristic early chunk copy before COW
Date: Thu, 9 Mar 2017 11:51:43 +0000
Message-ID: <20170309115142.GA17308@nim>
To: Eric Wheeler
Cc: dm-devel@redhat.com

Hi Eric,

On Wed, Mar 08, 2017 at 10:17:51AM -0800, Eric Wheeler wrote:
> Hello all,
>
> For dm-thin volumes that are snapshotted often, there is a performance
> penalty for writes because of COW overhead: each modified chunk needs
> to be copied into a freshly allocated chunk.
>
> What if we were to implement some sort of LRU for COW operations on
> chunks? We could then queue chunks that are commonly COWed within the
> inter-snapshot interval to be background-copied immediately after the
> next snapshot. This would hide the latency and increase effective
> throughput when the thin device is written by its user, since only the
> metadata would need an update, the chunk having already been copied.
>
> I can imagine a simple algorithm where each COW increments the chunk's
> LRU counter by 2, and every stored counter is decremented by 1 when the
> volume is snapshotted. After the snapshot, any chunk with a counter > 0
> would be queued for early copy.
>
> The LRU would be in memory only, probably stored in a red-black tree.
> Pre-copied chunks would not update on-disk metadata unless a write
> occurs to that chunk. The allocator would need to be updated to ignore
> chunks in the LRU list which have been pre-copied (except, perhaps,
> when the pool's free space is exhausted).
>
> Does this sound viable?

Yes, I can see that it would benefit some people, and presumably we'd
only turn it on for those people.

Random thoughts:

- I'm doing a lot of background work in the latest version of dm-cache
  during idle periods, and it certainly pays off.

- There can be a *lot* of chunks, so holding a counter for every chunk
  in memory is not on.  (See the hassle I had squeezing stuff into
  memory for dm-cache.)

- Commonly cloned blocks can be gleaned from the metadata, e.g. by
  walking the metadata for two snapshots and taking the common ones.
  It might be possible to come up with a 'commonly used set' once, and
  then keep using it for all future snaps.

- Doing speculative work like this makes it harder to predict
  performance.  At the moment any expense (i.e. a copy) is incurred
  immediately as the triggering write comes in.

- Could this be done from userland?  Metadata snapshots let userland
  see the mappings; alternatively dm-era lets userland track where I/O
  has gone.  A simple read then write of a block would trigger the
  sharing to be broken.

- Joe
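
For concreteness, here is a minimal userspace sketch of the per-chunk
counter scheme Eric describes (bump a chunk's counter on each COW, age
all counters on snapshot, queue anything still positive for an early
background copy).  All names here are hypothetical illustration, not
part of dm-thin, and the flat array is only for brevity; as noted
above, a real implementation would need a sparse structure such as an
rbtree because a pool can contain a very large number of chunks.

    /* Hypothetical sketch, not dm-thin code. */
    #include <stdint.h>
    #include <stdlib.h>

    struct cow_tracker {
            uint64_t nr_chunks;
            uint8_t *score;         /* per-chunk heuristic counter */
    };

    static struct cow_tracker *tracker_create(uint64_t nr_chunks)
    {
            struct cow_tracker *t = malloc(sizeof(*t));

            if (!t)
                    return NULL;
            t->nr_chunks = nr_chunks;
            t->score = calloc(nr_chunks, 1);
            if (!t->score) {
                    free(t);
                    return NULL;
            }
            return t;
    }

    /* Called whenever a write breaks sharing (a COW copy) of @chunk. */
    static void tracker_note_cow(struct cow_tracker *t, uint64_t chunk)
    {
            if (t->score[chunk] <= 253)
                    t->score[chunk] += 2;
    }

    /*
     * Called when the volume is snapshotted: age every counter, then
     * queue any chunk still scoring > 0 for an early background copy.
     */
    static void tracker_snapshot(struct cow_tracker *t,
                                 void (*queue_early_copy)(uint64_t chunk))
    {
            uint64_t i;

            for (i = 0; i < t->nr_chunks; i++) {
                    if (t->score[i])
                            t->score[i]--;
                    if (t->score[i])
                            queue_early_copy(i);
            }
    }

And a sketch of the userland approach from the last point: breaking
sharing on a chosen chunk simply by reading it and writing the same
data back through the thin device.  The device path, chunk size, and
function name are placeholders; the offset must be chunk-aligned and
O_DIRECT is used so the write actually reaches the device.

    /* Hypothetical sketch: break sharing on one chunk from userland. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK_SIZE (64 * 1024)  /* must match the pool's chunk size */

    static int break_sharing(const char *dev, off_t chunk_offset)
    {
            int fd = open(dev, O_RDWR | O_DIRECT);
            void *buf;
            int r = -1;

            if (fd < 0)
                    return -1;
            if (posix_memalign(&buf, 4096, CHUNK_SIZE))
                    goto out_fd;

            /* Read the chunk, then write the same data back in place. */
            if (pread(fd, buf, CHUNK_SIZE, chunk_offset) == CHUNK_SIZE &&
                pwrite(fd, buf, CHUNK_SIZE, chunk_offset) == CHUNK_SIZE)
                    r = 0;

            free(buf);
    out_fd:
            close(fd);
            return r;
    }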