From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marco Elver
Date: Wed, 6 Nov 2019 11:03:14 +0100
Subject: Re: [PATCH v3 1/9] kcsan: Add Kernel Concurrency Sanitizer infrastructure
To: Dmitry Vyukov
Cc: LKMM Maintainers -- Akira Yokosawa, Alan Stern, Alexander Potapenko,
 Andrea Parri, Andrey Konovalov, Andy Lutomirski, Ard Biesheuvel,
 Arnd Bergmann, Boqun Feng, Borislav Petkov, Daniel Axtens, Daniel Lustig,
 Dave Hansen, David Howells, "H. Peter Anvin", Ingo Molnar, Jade Alglave,
 Joel Fernandes, Jonathan Corbet, Josh Poimboeuf, Luc Maranget,
 Mark Rutland, Nicholas Piggin, "Paul E. McKenney", Peter Zijlstra,
 Thomas Gleixner, Will Deacon, kasan-dev, linux-arch,
 "open list:DOCUMENTATION", linux-efi@vger.kernel.org,
 "open list:KERNEL BUILD + fi...", LKML, Linux-MM,
 "the arch/x86 maintainers"
References: <20191104142745.14722-1-elver@google.com> <20191104142745.14722-2-elver@google.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org
McKenney" , Peter Zijlstra , Thomas Gleixner , Will Deacon , kasan-dev , linux-arch , "open list:DOCUMENTATION" , linux-efi@vger.kernel.org, "open list:KERNEL BUILD + fi..." , LKML , Linux-MM , "the arch/x86 maintainers" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Dmitry, On Wed, 6 Nov 2019 at 10:38, Dmitry Vyukov wrote: > > On Mon, Nov 4, 2019 at 3:28 PM Marco Elver wrote: > > > > Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for > > kernel space. KCSAN is a sampling watchpoint-based data-race detector. > > See the included Documentation/dev-tools/kcsan.rst for more details. > ... > > +static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size, > > + bool expect_write, > > + long *encoded_watchpoint) > > +{ > > + const int slot = watchpoint_slot(addr); > > + const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK; > > + atomic_long_t *watchpoint; > > + unsigned long wp_addr_masked; > > + size_t wp_size; > > + bool is_write; > > + int i; > > + > > + BUILD_BUG_ON(CONFIG_KCSAN_NUM_WATCHPOINTS < CHECK_NUM_SLOTS); > > + > > + for (i = 0; i < CHECK_NUM_SLOTS; ++i) { > > + watchpoint = &watchpoints[SLOT_IDX(slot, i)]; > > > The fast path code become much nicer! > I did another pass looking at how we can optimize the fast path. > Currently we still have 2 push/pop pairs on the fast path because of > register pressure. The logic in SLOT_IDX seems to be the main culprit. > We discussed several options offline: > 1. Just check 1 slot and ignore all corner cases (we will miss racing > unaligned access to different addresses but overlapping and crossing > pages, which sounds pretty esoteric) > 2. Check 3 slots in order and without wraparound (watchpoints[slot + > i], where i=-1,0,1), this will require adding dummy slots around the > array > 3. An interesting option is to check just 2 slots (that's enough!), to > make this work we will need to slightly offset bucket number when > setting a watch point (namely, if an access goes to the very end of a > page, we set the watchpoint into the bucket corresponding to the > _next_ page) > All of these options remove push/pop in my experiments. Obviously > checking fewer slots will reduce dynamic overhead even more. > > > > + *encoded_watchpoint = atomic_long_read(watchpoint); > > + if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked, > > + &wp_size, &is_write)) > > + continue; > > + > > + if (expect_write && !is_write) > > + continue; > > + > > + /* Check if the watchpoint matches the access. */ > > + if (matching_access(wp_addr_masked, wp_size, addr_masked, size)) > > + return watchpoint; > > + } > > + > > + return NULL; > > +} > > + > > +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size, > > + bool is_write) > > +{ > > + const int slot = watchpoint_slot(addr); > > + const long encoded_watchpoint = encode_watchpoint(addr, size, is_write); > > + atomic_long_t *watchpoint; > > + int i; > > + > > + for (i = 0; i < CHECK_NUM_SLOTS; ++i) { > > + long expect_val = INVALID_WATCHPOINT; > > + > > + /* Try to acquire this slot. */ > > + watchpoint = &watchpoints[SLOT_IDX(slot, i)]; > > If we do this SLOT_IDX trickery to catch unaligned accesses crossing > pages, then I think we should not use it insert_watchpoint at all and > only set the watchpoint to the exact index. 
>
> > +                *encoded_watchpoint = atomic_long_read(watchpoint);
> > +                if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
> > +                                       &wp_size, &is_write))
> > +                        continue;
> > +
> > +                if (expect_write && !is_write)
> > +                        continue;
> > +
> > +                /* Check if the watchpoint matches the access. */
> > +                if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
> > +                        return watchpoint;
> > +        }
> > +
> > +        return NULL;
> > +}
> > +
> > +static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
> > +                                               bool is_write)
> > +{
> > +        const int slot = watchpoint_slot(addr);
> > +        const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
> > +        atomic_long_t *watchpoint;
> > +        int i;
> > +
> > +        for (i = 0; i < CHECK_NUM_SLOTS; ++i) {
> > +                long expect_val = INVALID_WATCHPOINT;
> > +
> > +                /* Try to acquire this slot. */
> > +                watchpoint = &watchpoints[SLOT_IDX(slot, i)];
>
> If we do this SLOT_IDX trickery to catch unaligned accesses crossing
> pages, then I think we should not use it in insert_watchpoint() at all
> and only set the watchpoint at the exact index. Otherwise, we will
> actually miss the corner cases, which defeats the whole purpose of
> SLOT_IDX and the 3 iterations.
>
> > +                if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
> > +                                                    encoded_watchpoint))
> > +                        return watchpoint;
> > +        }
> > +
> > +        return NULL;
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was successfully consumed, false otherwise.
> > + *
> > + * This may return false if:
> > + *
> > + *        1. another thread already consumed the watchpoint;
> > + *        2. the thread that set up the watchpoint already removed it;
> > + *        3. the watchpoint was removed and then re-used.
> > + */
> > +static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
> > +                                          long encoded_watchpoint)
> > +{
> > +        return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
> > +                                               CONSUMED_WATCHPOINT);
> > +}
> > +
> > +/*
> > + * Return true if watchpoint was not touched, false if consumed.
> > + */
> > +static inline bool remove_watchpoint(atomic_long_t *watchpoint)
> > +{
> > +        return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
> > +               CONSUMED_WATCHPOINT;
> > +}
> > +
> > +static inline struct kcsan_ctx *get_ctx(void)
> > +{
> > +        /*
> > +         * In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
> > +         * also result in calls that generate warnings in uaccess regions.
> > +         */
> > +        return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
> > +}
> > +
> > +static inline bool is_atomic(const volatile void *ptr)
> > +{
> > +        struct kcsan_ctx *ctx = get_ctx();
> > +
> > +        if (unlikely(ctx->atomic_next > 0)) {
> > +                --ctx->atomic_next;
> > +                return true;
> > +        }
> > +        if (unlikely(ctx->atomic_nest_count > 0 || ctx->in_flat_atomic))
> > +                return true;
> > +
> > +        return kcsan_is_atomic(ptr);
> > +}
> > +
> > +static inline bool should_watch(const volatile void *ptr, int type)
> > +{
> > +        /*
> > +         * Never set up watchpoints when memory operations are atomic.
> > +         *
> > +         * Need to check this first, before kcsan_skip check below: (1) atomics
> > +         * should not count towards skipped instructions, and (2) to actually
> > +         * decrement kcsan_atomic_next for consecutive instruction stream.
> > +         */
> > +        if ((type & KCSAN_ACCESS_ATOMIC) != 0 || is_atomic(ptr))
> > +                return false;
>
> should_watch and is_atomic are invoked on the fast path and do more
> things than strictly necessary.
> The minimal amount of work would be:
> - check and decrement ctx->atomic_next for atomic accesses
> - decrement kcsan_skip
>
> atomic_nest_count/in_flat_atomic/kcsan_is_atomic can be checked on the
> uninlined slow path.
>
> It should not be necessary to set kcsan_skip to -1 if we _always_
> re-arm kcsan_skip on the slow path.
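
To check I follow: the inline part would then shrink to roughly the following
(untested sketch), with the atomic_nest_count/in_flat_atomic/kcsan_is_atomic
checks moving into the out-of-line kcsan_setup_watchpoint():

static __always_inline bool should_watch(const volatile void *ptr, int type)
{
        struct kcsan_ctx *ctx = get_ctx();

        /* Instrumented atomics never count towards skipped accesses. */
        if (type & KCSAN_ACCESS_ATOMIC)
                return false;

        /* Only kcsan_atomic_next is consumed inline. */
        if (unlikely(ctx->atomic_next > 0)) {
                --ctx->atomic_next;
                return false;
        }

        /*
         * May go (and stay) negative; that is fine as long as the slow path
         * always re-arms it via reset_kcsan_skip(), as you suggest, even when
         * it bails out early -- so no -1 underflow fix-up is needed here.
         */
        return this_cpu_dec_return(kcsan_skip) < 0;
}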
>
> > +        if (this_cpu_dec_return(kcsan_skip) >= 0)
> > +                return false;
> > +
> > +        /* avoid underflow if !kcsan_is_enabled() */
> > +        this_cpu_write(kcsan_skip, -1);
> > +
> > +        /* this operation should be watched */
> > +        return true;
> > +}
> > +
> > +static inline void reset_kcsan_skip(void)
> > +{
> > +        long skip_count = CONFIG_KCSAN_SKIP_WATCH -
> > +                          (IS_ENABLED(CONFIG_KCSAN_SKIP_WATCH_RANDOMIZE) ?
> > +                                   prandom_u32_max(CONFIG_KCSAN_SKIP_WATCH) :
> > +                                   0);
> > +        this_cpu_write(kcsan_skip, skip_count);
> > +}
> > +
> > +static inline bool kcsan_is_enabled(void)
> > +{
> > +        return READ_ONCE(kcsan_enabled) && get_ctx()->disable_count == 0;
> > +}
> > +
> > +static inline unsigned int get_delay(void)
> > +{
> > +        unsigned int delay = in_task() ? CONFIG_KCSAN_UDELAY_TASK :
> > +                                         CONFIG_KCSAN_UDELAY_INTERRUPT;
> > +        return delay - (IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
> > +                                prandom_u32_max(delay) :
> > +                                0);
> > +}
> > +
> > +/*
> > + * Pull everything together: check_access() below contains the performance
> > + * critical operations; the fast-path (including check_access) functions should
> > + * all be inlinable by the instrumentation functions.
> > + *
> > + * The slow-path (kcsan_found_watchpoint, kcsan_setup_watchpoint) are
> > + * non-inlinable -- note that, we prefix these with "kcsan_" to ensure they can
> > + * be filtered from the stacktrace, as well as give them unique names for the
> > + * UACCESS whitelist of objtool. Each function uses user_access_save/restore(),
> > + * since they do not access any user memory, but instrumentation is still
> > + * emitted in UACCESS regions.
> > + */
> > +
> > +static noinline void kcsan_found_watchpoint(const volatile void *ptr,
> > +                                            size_t size, bool is_write,
> > +                                            bool consumed)
> > +{
> > +        unsigned long flags = user_access_save();
> > +        enum kcsan_report_type report_type;
> > +
> > +        if (!consumed) {
> > +                /*
> > +                 * The other thread may not print any diagnostics, as it has
> > +                 * already removed the watchpoint, or another thread consumed
> > +                 * the watchpoint before this thread.
> > +                 */
> > +                kcsan_counter_inc(KCSAN_COUNTER_REPORT_RACES);
> > +                report_type = KCSAN_REPORT_RACE_CHECK_RACE;
> > +        } else {
> > +                report_type = KCSAN_REPORT_RACE_CHECK;
> > +        }
> > +
> > +        kcsan_counter_inc(KCSAN_COUNTER_DATA_RACES);
> > +        kcsan_report(ptr, size, is_write, raw_smp_processor_id(), report_type);
> > +
> > +        user_access_restore(flags);
> > +}
> > +
> > +static noinline void kcsan_setup_watchpoint(const volatile void *ptr,
> > +                                            size_t size, bool is_write)
> > +{
> > +        atomic_long_t *watchpoint;
> > +        union {
> > +                u8 _1;
> > +                u16 _2;
> > +                u32 _4;
> > +                u64 _8;
> > +        } expect_value;
> > +        bool is_expected = true;
> > +        unsigned long ua_flags = user_access_save();
> > +        unsigned long irq_flags;
> > +
> > +        if (!check_encodable((unsigned long)ptr, size)) {
> > +                kcsan_counter_inc(KCSAN_COUNTER_UNENCODABLE_ACCESSES);
> > +                goto out;
> > +        }
> > +
> > +        /*
> > +         * Disable interrupts & preemptions to avoid another thread on the same
> > +         * CPU accessing memory locations for the set up watchpoint; this is to
> > +         * avoid reporting races to e.g. CPU-local data.
> > +         *
> > +         * An alternative would be adding the source CPU to the watchpoint
> > +         * encoding, and checking that watchpoint-CPU != this-CPU. There are
> > +         * several problems with this:
> > +         * 1. we should avoid stealing more bits from the watchpoint encoding
> > +         *    as it would affect accuracy, as well as increase performance
> > +         *    overhead in the fast-path;
> > +         * 2. if we are preempted, but there *is* a genuine data race, we
> > +         *    would *not* report it -- since this is the common case (vs.
> > +         *    CPU-local data accesses), it makes more sense (from a data race
> > +         *    detection point of view) to simply disable preemptions to ensure
> > +         *    as many tasks as possible run on other CPUs.
> > +         */
> > +        local_irq_save(irq_flags);
> > +
> > +        watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
> > +        if (watchpoint == NULL) {
> > +                /*
> > +                 * Out of capacity: the size of `watchpoints`, and the frequency
> > +                 * with which `should_watch()` returns true should be tweaked so
> > +                 * that this case happens very rarely.
> > +                 */
> > +                kcsan_counter_inc(KCSAN_COUNTER_NO_CAPACITY);
> > +                goto out_unlock;
> > +        }
> > +
> > +        /*
> > +         * Reset kcsan_skip counter: only do this if we succeeded in setting up
> > +         * a watchpoint.
> > +         */
> > +        reset_kcsan_skip();
> > +
> > +        kcsan_counter_inc(KCSAN_COUNTER_SETUP_WATCHPOINTS);
> > +        kcsan_counter_inc(KCSAN_COUNTER_USED_WATCHPOINTS);
> > +
> > +        /*
> > +         * Read the current value, to later check and infer a race if the data
> > +         * was modified via a non-instrumented access, e.g. from a device.
> > +         */
> > +        switch (size) {
> > +        case 1:
> > +                expect_value._1 = READ_ONCE(*(const u8 *)ptr);
> > +                break;
> > +        case 2:
> > +                expect_value._2 = READ_ONCE(*(const u16 *)ptr);
> > +                break;
> > +        case 4:
> > +                expect_value._4 = READ_ONCE(*(const u32 *)ptr);
> > +                break;
> > +        case 8:
> > +                expect_value._8 = READ_ONCE(*(const u64 *)ptr);
> > +                break;
> > +        default:
> > +                break; /* ignore; we do not diff the values */
> > +        }
> > +
> > +        if (IS_ENABLED(CONFIG_KCSAN_DEBUG)) {
> > +                kcsan_disable_current();
> > +                pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
> > +                       is_write ? "write" : "read", size, ptr,
> > +                       watchpoint_slot((unsigned long)ptr),
> > +                       encode_watchpoint((unsigned long)ptr, size, is_write));
> > +                kcsan_enable_current();
> > +        }
> > +
> > +        /*
> > +         * Delay this thread, to increase probability of observing a racy
> > +         * conflicting access.
> > +         */
> > +        udelay(get_delay());
> > +
> > +        /*
> > +         * Re-read value, and check if it is as expected; if not, we infer a
> > +         * racy access.
> > +         */
> > +        switch (size) {
> > +        case 1:
> > +                is_expected = expect_value._1 == READ_ONCE(*(const u8 *)ptr);
> > +                break;
> > +        case 2:
> > +                is_expected = expect_value._2 == READ_ONCE(*(const u16 *)ptr);
> > +                break;
> > +        case 4:
> > +                is_expected = expect_value._4 == READ_ONCE(*(const u32 *)ptr);
> > +                break;
> > +        case 8:
> > +                is_expected = expect_value._8 == READ_ONCE(*(const u64 *)ptr);
> > +                break;
> > +        default:
> > +                break; /* ignore; we do not diff the values */
> > +        }
> > +
> > +        /* Check if this access raced with another. */
> > +        if (!remove_watchpoint(watchpoint)) {
> > +                /*
> > +                 * No need to increment 'data_races' counter, as the racing
> > +                 * thread already did.
> > +                 */
> > +                kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                             KCSAN_REPORT_RACE_SETUP);
> > +        } else if (!is_expected) {
> > +                /* Inferring a race, since the value should not have changed. */
> > +                kcsan_counter_inc(KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN);
> > +                if (IS_ENABLED(CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN))
> > +                        kcsan_report(ptr, size, is_write, smp_processor_id(),
> > +                                     KCSAN_REPORT_RACE_UNKNOWN_ORIGIN);
> > +        }
> > +
> > +        kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
> > +out_unlock:
> > +        local_irq_restore(irq_flags);
> > +out:
> > +        user_access_restore(ua_flags);
> > +}
> > +
> > +static inline void check_access(const volatile void *ptr, size_t size, int type)
> > +{
> > +        const bool is_write = (type & KCSAN_ACCESS_WRITE) != 0;
> > +        atomic_long_t *watchpoint;
> > +        long encoded_watchpoint;
> > +
> > +        if (IS_ENABLED(CONFIG_KCSAN_PLAIN_WRITE_PRETEND_ONCE) && is_write)
> > +                type |= KCSAN_ACCESS_ATOMIC;
> > +
> > +        /*
> > +         * Avoid user_access_save in fast-path: find_watchpoint is safe without
> > +         * user_access_save, as the address that ptr points to is only used to
> > +         * check if a watchpoint exists; ptr is never dereferenced.
> > +         */
> > +        watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
> > +                                     &encoded_watchpoint);
> > +
> > +        /*
> > +         * It is safe to check kcsan_is_enabled() after find_watchpoint, but
> > +         * right before we would enter the slow-path: no state changes that
> > +         * cause a data race to be detected and reported have occurred yet.
> > +         */
> > +
> > +        if (unlikely(watchpoint != NULL) && kcsan_is_enabled()) {
>
> I would move kcsan_is_enabled and the rest of the code in the branch
> into the non-inlined slow path.
> It makes the hot function much shorter.
> There is a trick related to the number of arguments, though. We would
> need to pass ptr, size, is_write, watchpoint and encoded_watchpoint.
> That's 5 arguments, and only 4 are passed in registers. So it may make
> sense to combine size and type into a single word. On the inlined fast
> path the compiler packs/unpacks that statically, so it does not matter.
> But for the function call it will just forward a single constant.
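
And for the argument-packing, something like the following (untested sketch;
the helper names and the shift amount are arbitrary -- it only needs to leave
enough low bits for the KCSAN_ACCESS_* flags):

#define SIZE_TYPE_SHIFT 8

static __always_inline unsigned long pack_size_type(size_t size, int type)
{
        return ((unsigned long)size << SIZE_TYPE_SHIFT) | type;
}

static __always_inline size_t size_from(unsigned long size_type)
{
        return size_type >> SIZE_TYPE_SHIFT;
}

static __always_inline int type_from(unsigned long size_type)
{
        return size_type & ((1UL << SIZE_TYPE_SHIFT) - 1);
}

The instrumentation calls pass compile-time-constant size and type, so on the
inlined fast path this should fold away entirely, and the out-of-line slow path
can take (ptr, size_type, watchpoint, encoded_watchpoint) in four registers and
derive is_write from type_from(size_type) & KCSAN_ACCESS_WRITE itself.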
>
> > +                /*
> > +                 * Try consume the watchpoint as soon after finding the
> > +                 * watchpoint as possible; this must always be guarded by
> > +                 * kcsan_is_enabled() check, as otherwise we might erroneously
> > +                 * triggering reports when disabled.
> > +                 */
> > +                const bool consumed =
> > +                        try_consume_watchpoint(watchpoint, encoded_watchpoint);
> > +
> > +                kcsan_found_watchpoint(ptr, size, is_write, consumed);
> > +        } else if (unlikely(should_watch(ptr, type)) && kcsan_is_enabled()) {
>
> I would move the kcsan_is_enabled check into kcsan_setup_watchpoint.
> It's not executed on the fast path, but it bloats the host function's
> code.
>
> > +                kcsan_setup_watchpoint(ptr, size, is_write);
> > +        }
> > +}

Thanks! I will work on these. My only worry is that some of these might
overcomplicate the code with no measurable gain, so I will try to find a
balance here. Will send v4 then.

Thanks,
-- Marco