From: Thomas Gleixner
To: Pavel Tikhomirov, Frederic Weisbecker
Cc: LKML, Anna-Maria Behnsen, Peter Zijlstra,
    syzbot+5c54bd3eb218bb595aa9@syzkaller.appspotmail.com, Dmitry Vyukov,
    Sebastian Siewior, Michael Kerrisk, Andrei Vagin, Christian Brauner,
    Alexander Mikhalitsyn, Pavel Emelyanov
Subject: Re: [RFD] posix-timers: CRIU woes
Date: Wed, 10 May 2023 10:30:57 +0200
Message-ID: <87wn1gy4e6.ffs@tglx>
In-Reply-To: <009e7658-1377-cc79-7a42-4dda8fec5af0@virtuozzo.com>

Pavel!

On Wed, May 10 2023 at 12:36, Pavel Tikhomirov wrote:
> On 10.05.2023 05:42, Thomas Gleixner wrote:
>> So because of that half thought out user space ABI we are now up the
>> regression creek without a paddle, unless CRIU can accommodate to a
>> different restore mechanism to lift this restriction from the kernel.
>>
>> Thoughts?
>
> Maybe we can do something similar to /proc/sys/kernel/ns_last_pid?
> Switch to per-(process->signal) idr based approach with idr_set_cursor
> to set next id for next posix timer from new sysctl?

I'm not a fan of such sysctls. We already have too many of them, and
that particular one does not buy much.

We can simply let timer_create() or a new syscall create a timer at a
given ID. That allows CRIU to restore any checkpointed process, no
matter which kernel version it came from, without doing this insane
create/delete dance.

The downside is that this allows creating stupidly sparse timer IDs
even in the non-CRIU case, which increases per-process kernel memory
consumption and creates slightly more overhead in the signal delivery
path.
The latter is a burden on the process owning the timer and does not
affect expiry, which is a context-stealing operation. The memory part
eventually needs some thought with regard to accounting.

If the 'explicit at ID' option is not used, then the ID mechanism is
optimized for dense IDs by using the first available ID in a bottom-up
search, which recovers holes created by a timer_delete() operation.

Thanks,

        tglx