Date: Mon, 2 Mar 2020 14:40:48 +0100
From: Michal Hocko
To: Coly Li
Cc: axboe@kernel.dk, linux-bcache@vger.kernel.org,
 linux-block@vger.kernel.org, hare@suse.de, mkoutny@suse.com,
 Oleg Nesterov, Christoph Hellwig
Subject: Re: [PATCH 1/2] bcache: ignore pending signals in bcache_device_init()
Message-ID: <20200302134048.GK4380@dhcp22.suse.cz>
References: <20200302093450.48016-1-colyli@suse.de>
 <20200302093450.48016-2-colyli@suse.de>
 <20200302122748.GH4380@dhcp22.suse.cz>
 <29a1c9fa-46e2-af5f-9531-c25dbb0a3dca@suse.de>
In-Reply-To: <29a1c9fa-46e2-af5f-9531-c25dbb0a3dca@suse.de>
X-Mailing-List: linux-block@vger.kernel.org

On Mon 02-03-20 21:29:09, Coly Li wrote:
[...]
> > I cannot really comment on the bcache part because I am not familiar
> > with the code. It is quite surprising to see an initialization taking
> > that long though.
> >
>
> Back to the time 10 years ago when bcache merged into Linux mainline,
> checking meta data for a 120G SSD was fast. But now an 8TB SSD is quite
> common on server... So the problem appears.

Does all that work have to happen synchronously from the kworker context?
Could some of the initialization be done more lazily or in the
background?

> > Anyway
> >
> >> This patch calls flush_signals() in bcache_device_init() if there is
> >> pending signal for current process. It avoids bcache registration
> >> failure in system boot up time due to bcache udev rule timeout.
> >
> > this sounds like a wrong way to address the issue. Killing the udev
> > worker is a userspace policy and the kernel shouldn't simply ignore it.
>
> Indeed the bcache registering process cannot be killed, because a mutex
> lock (bch_register_lock) is held during all the registration operation.
>
> In my testing, kthread_run()/kthread_create() failure by pending signal
> happens after all metadata checking finished, that's 55 minutes later.
> No matter the registration succeeds or fails, the time length is same.
>
> Once the udev timeout killing is useless, why not make the registration
> to success ? This is what the patch does.

I cannot really comment on the systemd part, but it would be quite
unexpected for it to have its signals ignored completely.

> > Is there any problem to simply increase the timeout on the system which
> > uses a large bcache?
> >
>
> At this moment, this is a workaround. Christoph Hellwig also suggests to
> fix kthread_run()/kthread_create(). Now I am looking for method to
> distinct that the parent process is killed by OOM killer and not by
> other processes in kthread_run()/kthread_create(), but the solution is
> not clear to me yet.

It is really hard to comment on this because I do not have sufficient
insight, but in general: the OOM victim context can be checked with
tsk_is_oom_victim(), but kernel threads are not subject to the OOM killer
because they do not own any address space. I also suspect that none of
the data you allocate for the cache is accounted to any specific process.

> When meta-data size is around 40GB, registering cache device will take
> around 55 minutes on my machine for current Linux kernel.
> I have patch
> to reduce the time to around 7~8 minutes but still too long. I may add a
> timeout in bcache udev rule for example 10 minutes, but when the cache
> device get large and large, the timeout will be not enough eventually.
>
> As I mentioned, this is a workaround to fix the problem now. Fixing
> kthread_run()/kthread_create() may take longer time for me. If there is
> hint to make it, please offer me.

My main question is why there is any need to touch the kernel code at
all. You can still raise the systemd/udev timeout, AFAIK. That would be
the proper workaround from my (admittedly limited) POV.

-- 
Michal Hocko
SUSE Labs
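
For illustration only, the timeout workaround could look roughly like the
fragment below. It is a sketch under assumptions, not a tested
configuration: it assumes a systemd version whose udev.conf(5) supports
the event_timeout= setting, and the value 3600 is a made-up figure chosen
only because it exceeds the 55 minutes reported above.

```
# /etc/udev/udev.conf
# Assumption: seconds udevd waits for an event to finish before it
# terminates the worker. 3600 is illustrative; pick a value above the
# slowest bcache registration actually observed on the machine.
event_timeout=3600

# Alternative (also an assumption about the installed systemd): the same
# limit passed on the kernel command line.
#   udev.event_timeout=3600
```

This raises the limit for all udev events on the system, not only the
bcache rule, so it trades slower detection of genuinely stuck workers for
a registration that is allowed to complete.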