From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mail-qk0-f194.google.com ([209.85.220.194]:42814 "EHLO
        mail-qk0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751805AbeERS2y (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Fri, 18 May 2018 14:28:54 -0400
Date: Fri, 18 May 2018 14:28:49 -0400
From: Kent Overstreet <kent.overstreet@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
        Andrew Morton <akpm@linux-foundation.org>,
        Dave Chinner <dchinner@redhat.com>, darrick.wong@oracle.com,
        tytso@mit.edu, linux-btrfs@vger.kernel.org, clm@fb.com,
        jbacik@fb.com, viro@zeniv.linux.org.uk, willy@infradead.org,
        peterz@infradead.org
Subject: Re: [PATCH 00/10] RFC: assorted bcachefs patches
Message-ID: <20180518182849.GF31737@kmo-pixel>
References: <20180518074918.13816-1-kent.overstreet@gmail.com>
 <20180518174536.ai26bg3bhlvzq4pi@destiny>
 <20180518174912.GE31737@kmo-pixel>
 <20180518180324.ymwbajfw5wsfrlth@destiny>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180518180324.ymwbajfw5wsfrlth@destiny>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Fri, May 18, 2018 at 02:03:25PM -0400, Josef Bacik wrote:
> There's nothing stopping us from doing that, it just uses a kprobe to override
> the function with our helper, so we could conceivably put it anywhere in the
> function.  The reason I limited it to individual functions was because it was
> easier than trying to figure out the side-effects of stopping mid-function.  If
> I needed to fail mid-function I just added a helper where I needed it and failed
> that instead.  I imagine safety is going to be of larger concern if we allow bpf
> scripts to randomly return anywhere inside a function, even if the function is
> marked as allowing error injection.  Thanks,

Ahh no, that's not what I want... here's an example:

https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/btree_cache.c#n674

Here we've got to do something that can race. That's fine, we just need to
check for and handle the race (on line 709), but actually exercising that path
with a test is difficult: it takes a heavily multithreaded workload with btree
nodes getting evicted to see it happen. So the code pretends the race happened
whenever race_fault() returns true. Each race_fault() call site shows up in
debugfs, where userspace can tell it to fire.
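
To make the pattern concrete, here's a rough userspace sketch of "treat an
injected fault exactly like a lost race". None of these names come from
bcachefs: struct slot, slot_fill() and race_fault_sketch() are made up for
illustration, and the armed flag is a plain global standing in for whatever
debugfs would toggle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for race_fault(): a flag that a test can flip. */
static bool race_fault_armed;

static bool race_fault_sketch(void)
{
	return race_fault_armed;
}

/* Hypothetical cache slot that two threads might try to fill at once. */
struct slot {
	int	key;
	bool	valid;
};

/*
 * Fill a slot that someone else may have filled first. Returns 0 on
 * success, or -1 if we lost the race (real or injected), in which case
 * the caller is expected to retry.
 */
static int slot_fill(struct slot *s, int key)
{
	assert(s != NULL);

	/* Real and faked races take the same recovery path. */
	if (s->valid || race_fault_sketch())
		return -1;

	s->key = key;
	s->valid = true;
	return 0;
}
```

The point is that a single-threaded test can arm the flag and drive the retry
path deterministically, without needing the real race to happen.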

The way it works is that dynamic_fault() is a macro that expands to a static
struct dfault_descriptor, placed in a particular linker section so the dynamic
fault code can find all of them and expose them in debugfs (which is also how
dynamic debug works).

#define dynamic_fault(_class)						\
({									\
	static struct _dfault descriptor				\
	__used __aligned(8) __attribute__((section("__faults"))) = {	\
		.modname	= KBUILD_MODNAME,			\
		.function	= __func__,				\
		.filename	= __FILE__,				\
		.line		= __LINE__,				\
		.class		= _class,				\
	};								\
									\
	static_key_false(&descriptor.enabled) &&			\
		__dynamic_fault_enabled(&descriptor);			\
})
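
For anyone who hasn't seen the linker section trick before, here's a minimal
userspace sketch of it. It is not the bcachefs code: struct fault_desc,
fault_point(), fault_set() and demo_op() are hypothetical names, the section
is called "my_faults", and a plain bool stands in for the static key (so the
disabled path is an ordinary load and branch rather than patched code). It
assumes GCC or clang with a GNU-style linker on ELF, which provides the
__start_/__stop_ symbols for any section whose name is a valid C identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor; fields loosely follow the macro above. */
struct fault_desc {
	const char	*function;
	const char	*filename;
	int		line;
	bool		enabled;
};

/*
 * One static descriptor per call site, emitted into "my_faults".
 * Evaluates to true when that call site has been armed. The volatile
 * read is there because the flag is changed through the section, not
 * through this name.
 */
#define fault_point()							\
({									\
	static struct fault_desc desc					\
	__attribute__((used, aligned(8), section("my_faults"))) = {	\
		.function	= __func__,				\
		.filename	= __FILE__,				\
		.line		= __LINE__,				\
	};								\
									\
	*(volatile bool *) &desc.enabled;				\
})

/* Section bounds, provided by the linker. */
extern struct fault_desc __start_my_faults[];
extern struct fault_desc __stop_my_faults[];

/*
 * Arm or disarm every fault point in the named function.
 * Returns the number of descriptors that matched.
 */
int fault_set(const char *function, bool enabled)
{
	struct fault_desc *d;
	int n = 0;

	assert(function != NULL);

	for (d = __start_my_faults; d < __stop_my_faults; d++)
		if (!strcmp(d->function, function)) {
			d->enabled = enabled;
			n++;
		}

	return n;
}

/* Example call site: fails with -1 while its fault point is armed. */
int demo_op(void)
{
	if (fault_point())
		return -1;

	return 0;
}
```

With that, fault_set("demo_op", true) makes demo_op() start returning -1, and
nothing had to register the call site by hand: walking the section finds it.
In the kernel the walk would populate debugfs instead of matching on a name.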

Honestly it still seems like the cleanest and safest way of doing it to me...