Date: Thu, 12 May 2022 21:41:52 +0800
From: Coly Li
To: mingzhe.zou@easystack.cn
Cc: dongsheng.yang@easystack.cn, linux-bcache@vger.kernel.org, zoumingzhe@qq.com
Subject: Re: [PATCH v3] bcache: dynamic incremental gc
In-Reply-To: <20220511073903.13568-1-mingzhe.zou@easystack.cn>
References: <20220511073903.13568-1-mingzhe.zou@easystack.cn>
X-Mailing-List: linux-bcache@vger.kernel.org

On 5/11/22 3:39 PM, mingzhe.zou@easystack.cn wrote:
> From: ZouMingzhe
>
> Currently, GC runs no more than 100 times, with at least 100 nodes
> each time; the maximum number of nodes per pass is not limited.
>
> ```
> static size_t btree_gc_min_nodes(struct cache_set *c)
> {
>         ......
>         min_nodes = c->gc_stats.nodes / MAX_GC_TIMES;
>         if (min_nodes < MIN_GC_NODES)
>                 min_nodes = MIN_GC_NODES;
>
>         return min_nodes;
> }
> ```
>
> According to our test data, when NVMe is used as the cache, it takes
> about 1ms for GC to handle each node (4k block and 512k bucket). This
> means that the latency during GC is at least 100ms, and while GC runs,
> IO performance is reduced by half or more.
>
> I want to optimize IOPS and latency under high pressure. This patch
> holds the inflight peak: when the IO depth reaches its maximum, GC
> only processes very few (10) nodes, then sleeps immediately so those
> requests can be handled.
>
> bch_bucket_alloc() may wait for bch_allocator_thread() to wake it up,
> and bch_allocator_thread() needs to wait for GC to complete, in which
> case GC needs to end quickly. So, bucket_alloc_inflight is added to
> cache_set in v3.
>
> ```
> long bch_bucket_alloc(struct cache *ca, unsigned int reserve, bool wait)
> {
>         ......
>         do {
>                 prepare_to_wait(&ca->set->bucket_wait, &w,
>                                 TASK_UNINTERRUPTIBLE);
>
>                 mutex_unlock(&ca->set->bucket_lock);
>                 schedule();
>                 mutex_lock(&ca->set->bucket_lock);
>         } while (!fifo_pop(&ca->free[RESERVE_NONE], r) &&
>                  !fifo_pop(&ca->free[reserve], r));
>         ......
> }
>
> static int bch_allocator_thread(void *arg)
> {
>         ......
>         allocator_wait(ca, bch_allocator_push(ca, bucket));
>         wake_up(&ca->set->btree_cache_wait);
>         wake_up(&ca->set->bucket_wait);
>         ......
> }
>
> static void bch_btree_gc(struct cache_set *c)
> {
>         ......
>         bch_btree_gc_finish(c);
>         wake_up_allocators(c);
>         ......
> }
> ```
>
> With this patch applied, each GC pass may only process very few
> nodes, so GC would take a long time if it slept 100ms after every
> pass. Therefore, the sleep time should be calculated dynamically
> based on gc_cost.
>
> At the same time, I added some cost statistics to gc_stat, hoping to
> provide useful information for future work.

Hi Mingzhe,

At first glance, I feel this change may delay the small GC periods and
finally result in a large GC period, which is not expected. But it is
possible that my feeling is incorrect.

Do you have detailed performance numbers for both I/O latency and GC
period? Then I can get a better understanding of this effort.

BTW, I will add this patch to my testing set and try it out myself.

Thanks.

Coly Li

[snipped]
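For readers who want to see the shape of the idea being discussed, below is a
minimal, standalone C sketch of the scheduling described in the commit message:
process a tiny batch of btree nodes when the front end is saturated, and scale
the inter-batch sleep with the measured per-batch cost. This is not the actual
bcache patch; the structure, the field and function names (gc_model,
gc_batch_nodes, gc_sleep_us), the BUSY_GC_NODES constant, and the sleep formula
are all assumptions made for illustration.

```
/*
 * Userspace model of the incremental-GC scheduling idea (illustrative
 * only, not kernel code and not the actual bcache implementation).
 */
#include <stdio.h>
#include <stdint.h>

#define MAX_GC_TIMES     100     /* upper bound on GC batches            */
#define MIN_GC_NODES     100     /* nodes per batch when IO is idle      */
#define BUSY_GC_NODES    10      /* assumed tiny batch when IO is busy   */
#define GC_SLEEP_BASE_US 100000  /* default sleep between batches (us)   */

struct gc_model {
	uint64_t total_nodes;      /* btree nodes GC still has to visit   */
	uint64_t inflight;         /* in-flight front-end I/O requests    */
	uint64_t max_inflight;     /* observed peak of in-flight requests */
	uint64_t alloc_waiters;    /* callers blocked in bucket alloc     */
	uint64_t cost_us_per_node; /* measured GC cost per node (us)      */
};

/* Pick a target node count for the next GC batch. */
static uint64_t gc_batch_nodes(const struct gc_model *m)
{
	uint64_t nodes = m->total_nodes / MAX_GC_TIMES;

	if (nodes < MIN_GC_NODES)
		nodes = MIN_GC_NODES;

	/*
	 * Front end is saturated or bucket allocation is waiting on GC:
	 * fall back to a tiny batch so pending requests are served first.
	 */
	if (m->inflight >= m->max_inflight || m->alloc_waiters)
		nodes = BUSY_GC_NODES;

	return nodes;
}

/*
 * Scale the per-batch sleep with the batch cost (capped at the default),
 * so that many tiny batches do not stretch the whole GC run indefinitely.
 */
static uint64_t gc_sleep_us(const struct gc_model *m, uint64_t batch)
{
	uint64_t cost_us = batch * m->cost_us_per_node;

	return cost_us < GC_SLEEP_BASE_US ? cost_us : GC_SLEEP_BASE_US;
}

int main(void)
{
	struct gc_model idle = { 40000,  2, 64, 0, 1000 };
	struct gc_model busy = { 40000, 64, 64, 3, 1000 };
	uint64_t b_idle = gc_batch_nodes(&idle);
	uint64_t b_busy = gc_batch_nodes(&busy);

	printf("idle: %llu nodes, sleep %llu us\n",
	       (unsigned long long)b_idle,
	       (unsigned long long)gc_sleep_us(&idle, b_idle));
	printf("busy: %llu nodes, sleep %llu us\n",
	       (unsigned long long)b_busy,
	       (unsigned long long)gc_sleep_us(&busy, b_busy));
	return 0;
}
```

In this model the idle case keeps the larger batch and the full sleep, while
the busy case drops to a ten-node batch with a proportionally shorter sleep.
The actual patch presumably makes the same trade-off inside the kernel's btree
GC loop, but its exact batch-size and sleep calculations may differ.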