All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Vyukov <dvyukov@google.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jan Kara <jack@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Kees Cook <keescook@google.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Kostya Serebryany <kcc@google.com>
Subject: Re: [PATCH] fs: ratelimit __find_get_block_slow() failure message.
Date: Thu, 17 Jan 2019 14:18:56 +0100	[thread overview]
Message-ID: <CACT4Y+a0=45WTbtyuMOq1y5aHw1W-o7c5t3YFnOPeAfgswTA1g@mail.gmail.com> (raw)
In-Reply-To: <20190116162813.GA5446@kroah.com>

On Wed, Jan 16, 2019 at 5:28 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Jan 16, 2019 at 12:48:41PM +0100, Dmitry Vyukov wrote:
> > On Wed, Jan 16, 2019 at 12:03 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > >
> > > On Wed, Jan 16, 2019 at 11:43 AM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Wed 16-01-19 10:47:56, Dmitry Vyukov wrote:
> > > > > On Fri, Jan 11, 2019 at 1:46 PM Tetsuo Handa
> > > > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > > > >
> > > > > > On 2019/01/11 19:48, Dmitry Vyukov wrote:
> > > > > > >> How did you arrive to the conclusion that it is harmless?
> > > > > > >> There is only one relevant standard covering this, which is the C
> > > > > > >> language standard, and it is very clear on this -- this has Undefined
> > > > > > >> Behavior, that is the same as, for example, reading/writing random
> > > > > > >> pointers.
> > > > > > >>
> > > > > > >> Check out this on how any race that you might think is benign can be
> > > > > > >> badly miscompiled and lead to arbitrary program behavior:
> > > > > > >> https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
> > > > > > >
> > > > > > > Also there is no other practical definition of data race for automatic
> > > > > > > data race detectors than: two conflicting non-atomic concurrent
> > > > > > > accesses. Which this code is. Which means that if we continue writing
> > > > > > > such code we are not getting data race detection and don't detect
> > > > > > > thousands of races in kernel code that one may consider more harmful
> > > > > > > than this one the easy way. And instead will spent large amounts of
> > > > > > > time to fix some of then the hard way, and leave the rest as just too
> > > > > > > hard to debug so let the kernel continue crashing from time to time (I
> > > > > > > believe a portion of currently open syzbot bugs that developers just
> > > > > > > left as "I don't see how this can happen" are due to such races).
> > > > > > >
> > > > > >
> > > > > > I still cannot catch. Read/write of sizeof(long) bytes at naturally
> > > > > > aligned address is atomic, isn't it?
> > > > >
> > > > > Nobody guarantees this. According to C non-atomic conflicting
> > > > > reads/writes of sizeof(long) cause undefined behavior of the whole
> > > > > program.
> > > >
> > > > Yes, but to be fair the kernel has always relied on long accesses to be
> > > > atomic pretty heavily so that it is now de-facto standard for the kernel
> > > > AFAICT. I understand this makes life for static checkers hard but such is
> > > > reality.
> > >
> > > Yes, but nobody never defined what "a long access" means. And if you
> > > see a function that accepts a long argument and stores it into a long
> > > field, no, it does not qualify. I bet this will come at surprise to
> > > lots of developers.
> > > Check out this fix and try to extrapolate how this "function stores
> > > long into a long leads to a serious security bug" can actually be
> > > applied to whole lot of places after inlining (or when somebody just
> > > slightly shuffles code in a way that looks totally safe) that also
> > > kinda look safe and atomic:
> > > https://lore.kernel.org/patchwork/patch/599779/
> > > So where is the boundary between "a long access" that is atomic and
> > > the one that is not necessary atomic?
> >
> >
> > +Linus, Greg, Kees
> >
> > I wanted to provide a hash/link to this commit but, wait, you want to
> > say that this patch for a security bugs was mailed, recorded by
> > patchwork, acked by subsystem developer and then dropped on the floor
> > for 3+ years? Doh!
> >
> > https://lore.kernel.org/patchwork/patch/599779/
> >
> > There are known ways how to make this not a thing at all. Like open
> > pull requests on github:
> > https://github.com/google/syzkaller/pulls
> > or, some projects even do own dashboard for this:
> > https://dev.golang.org/reviews
> > because this is important. Especially for new contributors, drive-by
> > improvements, good samaritan fixes, etc.
> >
> > Another example: a bug-fixing patch was lost for 2 years:
> > "Two years ago ;) I don't understand why there were ignored"
> > https://www.spinics.net/lists/linux-mm/msg161351.html
> >
> > Another example: a patch is applied to a subsystem tree and then lost
> > for 6 months:
> > https://patchwork.kernel.org/patch/10339089/
>
> I don't understand the issue here.  Are you saying that sometimes
> patches that have been submitted get dropped?  Yes, that's known, it is
> up to the submitter to verify and ensure that the patch is applied.
> Given our rate of change and the large workload that some maintainers
> have, this is the best that we can do at the moment.
>
> Putting it all in a github dashboard would not scale in the least (other
> projects smaller than us have tried and ended up stopping from doing
> that as it fails horribly).
>
> Yes, we can always do better, but remember that the submitter needs to
> take the time to ensure that their patches are applied.  Heck, I have
> patches submitted months ago that I know the maintainers ignored, and I
> need to remember to send them again.  We put the burden of development
> on the thing that scales, the developer themselves, not the maintainer
> here.
>
> It's the best that we know of how to do at the moment, and we are always
> trying to do better.  Examples of this are where some subsystems are now
> getting multiple maintainers to handle the workload, and that's helping
> a lot.  That doesn't work for all subsystems as not all subsystems can
> even find more than one maintainer who is willing to look at the
> patches.

The issue here is that patches are lost and "up to the submitter" is
not fully working.
It may be working reasonably well when a developer has an official
assignment at work to do thing X, and then they can't miss/forget
about "is thing X merged yet". But it fails for new contributors,
drive-by improvements, good samaritan fixes, etc. Things that we need
no less than the first category (maybe more).
Machines are always better than humans at such scrupulous tracking
work. So if humans can do it, machines will do even better.
The dashboard definitely needs to be sharded in multiple dimensions.
E.g. "per subsystem", "per assigned reviewer", and even "per author".
Because e.g. how may mine are lost? Only this one or more? How many
yours are lost? Do you know?
I am sure this is doable and beneficial. I don't know why other
projects failed with this, maybe that's something with github. But
there are also codebases that are 100x larger than kernel and do
amount of changes kernel receives in a year in less than a week and
nothing gets lots thanks to scalable processes and automation.

> Please, resubmit your mount patch again, that's a crazy bug :)

That's the problem. It now requires way more additional work and it's
even unclear if the problem is still there or not, the code has
radically changed. It could have been merged back then as is with 0
additional work. I could have been updated it a week after original
submission. But now that's completely paged out, I am looking at that
file and I don't recognize anything, no idea how the patch should be
updated, I now have no idea what tree such patch should be based on,
etc.
Also a good question is how many other my patches were lost that I now
have no idea about? I discovered this one by pure accident.

To make things more constructive: say, if somebody offers to build
such a system for kernel, in accordance with kernel specific
requirements (would also enable presubmit testing, recording base
commit, not losing review comments and other nice things), would you,
Linus (not sure who is in change of such decisions) be willing to
integrate it into the official kernel development process so that
everybody use it for all kernel changes?

  reply	other threads:[~2019-01-17 13:19 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-11 10:10 [PATCH] fs: ratelimit __find_get_block_slow() failure message Tetsuo Handa
2019-01-11 10:19 ` Dmitry Vyukov
     [not found]   ` <04c6d87c-fc26-b994-3b34-947414984abe@i-love.sakura.ne.jp>
2019-01-11 10:40     ` Dmitry Vyukov
2019-01-11 10:48       ` Dmitry Vyukov
2019-01-11 12:46         ` Tetsuo Handa
2019-01-16  9:47           ` Dmitry Vyukov
2019-01-16 10:43             ` Jan Kara
2019-01-16 11:03               ` Dmitry Vyukov
2019-01-16 11:48                 ` Dmitry Vyukov
2019-01-16 16:28                   ` Greg Kroah-Hartman
2019-01-17 13:18                     ` Dmitry Vyukov [this message]
2019-01-21  8:37                       ` Jan Kara
2019-01-21 10:33                         ` Tetsuo Handa
2019-01-21 10:48                           ` Greg Kroah-Hartman
2019-01-22 15:27                         ` Kernel development process (was: [PATCH] fs: ratelimit __find_get_block_slow() failure message.) Dmitry Vyukov
2019-01-22 17:15                           ` Jan Kara
2019-01-16 11:56                 ` [PATCH] fs: ratelimit __find_get_block_slow() failure message Jan Kara
2019-01-16 12:37                   ` Dmitry Vyukov
2019-01-16 14:51                     ` Jan Kara
2019-01-16 15:33                       ` Dmitry Vyukov
2019-01-16 16:15                         ` Paul E. McKenney
2019-01-17 14:11               ` Tetsuo Handa
2019-01-18 15:30                 ` Dmitry Vyukov
2019-01-11 10:51       ` Tetsuo Handa
2019-01-11 11:03 ` Jan Kara
2019-01-11 11:37   ` [PATCH v2] " Tetsuo Handa
2019-01-21  8:57     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACT4Y+a0=45WTbtyuMOq1y5aHw1W-o7c5t3YFnOPeAfgswTA1g@mail.gmail.com' \
    --to=dvyukov@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=kcc@google.com \
    --cc=keescook@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.