From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1534806875.30457.28.camel@arista.com>
Subject: Re: [PATCHv3] lib/ratelimit: Lockless ratelimiting
From: Dmitry Safonov
To: Petr Mladek
Cc: linux-kernel@vger.kernel.org, Steven Rostedt, Andy Shevchenko,
 Arnd Bergmann,
 David Airlie, Greg Kroah-Hartman, Jani Nikula, Joonas Lahtinen,
 Rodrigo Vivi, Theodore Ts'o, Thomas Gleixner,
 intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Date: Tue, 21 Aug 2018 00:14:35 +0100
In-Reply-To: <20180815151047.qgjam3t3ujyacmaf@pathway.suse.cz>
References: <20180703225628.25684-1-dima@arista.com>
	 <20180815151047.qgjam3t3ujyacmaf@pathway.suse.cz>

Hi Petr, thanks for review,

On Wed, 2018-08-15 at 17:10 +0200, Petr Mladek wrote:
> On Tue 2018-07-03 23:56:28, Dmitry Safonov wrote:
> > Currently ratelimit_state is protected with spin_lock. If the .lock
> > is taken at the moment of the ___ratelimit() call, the message is
> > suppressed to make ratelimiting robust.
> > 
> > That results in the following issue:
> > 
> > CPU0                            CPU1
> > ------------------              ------------------
> > printk_ratelimit()              printk_ratelimit()
> >          |                               |
> > try_spin_lock()                 try_spin_lock()
> >          |                               |
> > time_is_before_jiffies()        return 0; // suppress
> > 
> > So, a concurrent call of ___ratelimit() results in silently
> > suppressing one of the messages, regardless of whether the limit
> > is reached or not. And rs->missed is not increased in such a case,
> > so the issue is hidden from the user.
> > 
> > Convert ratelimiting to use atomic counters and drop spin_lock.
> > 
> > Note: That might be unexpected, but within the first interval of a
> > message storm one can print up to burst*2 messages. So it doesn't
> > guarantee that the number of messages in *any* interval is less
> > than burst. But that differs only slightly from the previous
> > behavior, where one could start a burst=5 interval and print 4
> > messages in the last milliseconds of the interval + 5 new messages
> > from the new interval (9 messages total in less than one interval):
> 
> I am still confused by this paragraph. Does this patch change the
> behavior? What is the original and what is the new one, please?

Yeah, I could have been a bit clearer about the change.
Originally more than `burst` messages could be printed if the previous
period hasn't ended:

> >  msg0          msg1-msg4         msg0-msg4
> >   |                |                 |
> >   |                |                 |
> > |--o---------------------o-|-----o--------------------|--> (t)
> >                   <------->
> >               Lesser than burst

But now, because I check:

+	if (atomic_add_unless(&rs->printed, 1, rs->burst))
+		return 1;

*before* checking the end of the interval, the maximum number of
messages in the peak stays about the same: burst*2 (vs. burst*2 - 1
in the original).

Why do I check it before the end of the interval? I thought it would
be a nice optimization, making atomic_add_unless() the only function
called on the fast path (when we haven't hit the limit yet).
If you want, I can move it into a separate patch.
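To make the fast-path point concrete, here is a toy userspace model
with C11 atomics (an untested sketch, not the kernel code: add_unless()
is a hand-rolled stand-in for the kernel's atomic_add_unless(), and the
plain atomic_store() stands in for the interval roll-over):

#include <stdatomic.h>
#include <stdio.h>

#define BURST 5

static atomic_uint printed;

/* Stand-in for atomic_add_unless(v, 1, limit):
 * increments *v and returns 1, unless *v == limit (then returns 0). */
static int add_unless(atomic_uint *v, unsigned int limit)
{
	unsigned int old = atomic_load(v);

	do {
		if (old == limit)
			return 0;
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));

	return 1;
}

int main(void)
{
	/* A message storm: only the first BURST calls pass the fast path. */
	for (int i = 0; i < 8; i++)
		printf("msg %d: %s\n", i,
		       add_unless(&printed, BURST) ? "printed" : "suppressed");

	/* Interval roll-over (atomic_cmpxchg() in the patch) resets the
	 * counter, so another BURST messages pass - hence up to burst*2
	 * around the interval boundary. */
	atomic_store(&printed, 0);

	for (int i = 8; i < 16; i++)
		printf("msg %d: %s\n", i,
		       add_unless(&printed, BURST) ? "printed" : "suppressed");

	return 0;
}

While we're under the limit, the whole check is a single
compare-exchange - that's the reason for doing it first.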
> > 
> > Dropped dev/random patch since v1 version:
> > lkml.kernel.org/r/<20180510125211.12583-1-dima@arista.com>
> > 
> > Dropped `name' as it's unused in RATELIMIT_STATE_INIT()
> > 
> > diff --git a/lib/ratelimit.c b/lib/ratelimit.c
> > index d01f47135239..d9b749d40108 100644
> > --- a/lib/ratelimit.c
> > +++ b/lib/ratelimit.c
> > @@ -13,6 +13,18 @@
> >  #include <linux/jiffies.h>
> >  #include <linux/export.h>
> >  
> > +static void ratelimit_end_interval(struct ratelimit_state *rs, const char *func)
> > +{
> > +	rs->begin = jiffies;
> > +
> > +	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> > +		unsigned int missed = atomic_xchg(&rs->missed, 0);
> > +
> > +		if (missed)
> > +			pr_warn("%s: %u callbacks suppressed\n", func, missed);
> > +	}
> > +}
> > +
> >  /*
> >   * __ratelimit - rate limiting
> >   * @rs: ratelimit_state data
> > @@ -27,45 +39,30 @@
> >   */
> >  int ___ratelimit(struct ratelimit_state *rs, const char *func)
> >  {
> > -	unsigned long flags;
> > -	int ret;
> > -
> >  	if (!rs->interval)
> >  		return 1;
> >  
> > -	/*
> > -	 * If we contend on this state's lock then almost
> > -	 * by definition we are too busy to print a message,
> > -	 * in addition to the one that will be printed by
> > -	 * the entity that is holding the lock already:
> > -	 */
> > -	if (!raw_spin_trylock_irqsave(&rs->lock, flags))
> > +	if (unlikely(!rs->burst)) {
> > +		atomic_add_unless(&rs->missed, 1, -1);
> > +		if (time_is_before_jiffies(rs->begin + rs->interval))
> > +			ratelimit_end_interval(rs, func);
> 
> This is racy. time_is_before_jiffies() might be valid on two
> CPUs in parallel. They would both call ratelimit_end_interval().
> This is no longer atomic context. Therefore one might get scheduled
> and set rs->begin = jiffies seconds later. I am sure that there might
> be more crazy scenarios.

That's the case with rs->burst == 0, so in this situation all the
messages will be suppressed, and the only reason to call
ratelimit_end_interval() is to update the start time and the number of
messages not printed. I didn't want to add any complexity for this
case - the worst that can happen is that the count of suppressed
messages is imprecise for rs->burst == 0. Not a big deal, huh?

> > +
> >  		return 0;
> > +	}
> >  
> > -	if (!rs->begin)
> > -		rs->begin = jiffies;
> > +	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> > +		return 1;
> >  
> >  	if (time_is_before_jiffies(rs->begin + rs->interval)) {
> > -		if (rs->missed) {
> > -			if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> > -				printk_deferred(KERN_WARNING
> > -						"%s: %d callbacks suppressed\n",
> > -						func, rs->missed);
> > -				rs->missed = 0;
> > -			}
> > -		}
> > -		rs->begin = jiffies;
> > -		rs->printed = 0;
> > -	}
> > -	if (rs->burst && rs->burst > rs->printed) {
> > -		rs->printed++;
> > -		ret = 1;
> > -	} else {
> > -		rs->missed++;
> > -		ret = 0;
> > +		if (atomic_cmpxchg(&rs->printed, rs->burst, 0))
> > +			ratelimit_end_interval(rs, func);
> >  	}
> > -	raw_spin_unlock_irqrestore(&rs->lock, flags);
> >  
> > -	return ret;
> > +	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> > +		return 1;
> 
> The entire logic is complicated and hard to understand. Especially
> calling ratelimit_end_interval() and atomic_add_unless(&rs->printed)
> twice.

I think I've described the reasons above. I may add a comment or drop
the first atomic_add_unless(), but I would prefer to keep it, so that
there is only one check on the fast path.
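For reference, this is how the hunks above glue together, with the
control flow annotated (the comments are mine and the rs->burst == 0
special case is omitted for brevity; no new code, just the patch
restated):

int ___ratelimit(struct ratelimit_state *rs, const char *func)
{
	if (!rs->interval)
		return 1;

	/* Fast path: under the limit - one atomic op, no lock taken. */
	if (atomic_add_unless(&rs->printed, 1, rs->burst))
		return 1;

	/* Limit hit: once the interval is over, reset the window;
	 * ratelimit_end_interval() updates rs->begin and reports how
	 * many messages were suppressed. */
	if (time_is_before_jiffies(rs->begin + rs->interval)) {
		if (atomic_cmpxchg(&rs->printed, rs->burst, 0))
			ratelimit_end_interval(rs, func);
	}

	/* Second chance after a roll-over (by us or by another CPU). */
	if (atomic_add_unless(&rs->printed, 1, rs->burst))
		return 1;

	/* Saturating miss counter: stops incrementing at UINT_MAX. */
	atomic_add_unless(&rs->missed, 1, -1);
	return 0;
}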
> 
> > +	atomic_add_unless(&rs->missed, 1, -1);
> > +
> > +	return 0;
> >  }
> 
> I wonder if the following code would do the job (not even compile
> tested!):
> 
> static void ratelimit_end_interval(struct ratelimit_state *rs, const char *func)
> {
> 	rs->begin = jiffies;
> 
> 	if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> 		unsigned int missed = atomic_xchg(&rs->missed, 0);
> 
> 		if (missed)
> 			pr_warn("%s: %u callbacks suppressed\n", func, missed);
> 	}
> 
> 	atomic_xchg(&rs->printed, 0);
> }
> 
> /*
>  * __ratelimit - rate limiting
>  * @rs: ratelimit_state data
>  * @func: name of calling function
>  *
>  * This enforces a rate limit: not more than @rs->burst callbacks
>  * in every @rs->interval
>  *
>  * RETURNS:
>  * 0 means callbacks will be suppressed.
>  * 1 means go ahead and do it.
>  */
> int ___ratelimit(struct ratelimit_state *rs, const char *func)
> {
> 	unsigned long begin, old_begin = rs->begin;
> 
> 	if (!rs->interval)
> 		return 1;
> 
> 	if (time_is_before_jiffies(rs->begin + rs->interval) &&
> 	    cmpxchg(&rs->begin, begin, begin + rs->interval) == begin)

Probably cmpxchg(&rs->begin, begin, jiffies)? Otherwise hitting it
again in a later interval will be buggy. Also, `begin' might be used
uninitialized? (if cmpxchg() fails)

> 	{
> 		ratelimit_end_interval(rs, func);
> 	}
> 
> 	if (atomic_add_unless(&rs->printed, 1, rs->burst))
> 		return 1;
> 
> 	atomic_add_unless(&rs->missed, 1, -1);
> 	return 0;
> }
> EXPORT_SYMBOL(___ratelimit);
> 
> The main logic is the same as in the original code. Only one CPU is
> able to reset the interval and counters (thanks to cmpxchg).
> Every caller increases either the "printed" or the "missed" counter.

Do we care about the fast-path argument I made above?

-- 
Thanks,
Dmitry