All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
To: Matthew Wilcox <willy@infradead.org>, Yury Norov <yury.norov@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps
Date: Wed, 11 Oct 2023 21:25:14 +0200	[thread overview]
Message-ID: <166ade6f-f3fb-4b37-bdf1-db0317d1796f@alu.unizg.hr> (raw)
In-Reply-To: <ZSbuMWGYyulgUA6g@casper.infradead.org>



On 10/11/23 20:49, Matthew Wilcox wrote:
> On Wed, Oct 11, 2023 at 11:26:29AM -0700, Yury Norov wrote:
>> Long story short: KCSAN found some potential issues related to how
>> people use bitmap API. And instead of working through that issues,
>> the following code shuts down KCSAN by applying READ_ONCE() here
>> and there.
> 
> Pfft.
> 
>> READ_ONCE() fixes nothing because nothing is broken in find_bit() API.
>> As I suspected, and as Matthew confirmed in his email, the true reason
>> for READ_ONCE() here is to disable KCSAN check:
>>
>>          READ_ONCE() serves two functions here;
>>          one is that it tells the compiler not to try anything fancy, and
>>          the other is that it tells KCSAN to not bother instrumenting this
>>          load; no load-delay-reload.
>>
>> https://lkml.kernel.org/linux-mm/ZQkhgVb8nWGxpSPk@casper.infradead.org/
>>
>> And as side-effect, it of course hurts the performance. In the same
>> email Matthew said he doesn't believe me that READ_ONCE would do that,
>> so thank you for testing and confirming that it does.
> 
> You really misinterpreted what Jan wrote to accomplish this motivated
> reasoning.
> 
>> Jan, I think that in your last email you confirmed that the xarray
>> problem that you're trying to solve is about a lack of proper locking:
>>
>>          Well, for xarray the write side is synchronized with a spinlock but the read
>>          side is not (only RCU protected).
>>
>> https://lkml.kernel.org/linux-mm/20230918155403.ylhfdbscgw6yek6p@quack3/
>>
>> If there's no enough synchronization, why not just adding it?
> 
> You don't understand.  We _intend_ for there to be no locking.
> We_understand_ there is a race here.  We're _fine_ with there being
> a race here.
> 
>> Regardless performance consideration, my main concern is that this patch
>> considers bitmap as an atomic structure, which is intentionally not.
>> There are just a few single-bit atomic operations like set_bit() and
>> clear_bit(). All other functions are non-atomic, including those
>> find_bit() operations.
> 
> ... and for KCSAN to understand that, we have to use READ_ONCE.
> 
>> There is quite a lot of examples of wrong use of bitmaps wrt
>> atomicity, the most typical is like:
>>          for(idx = 0; idx < num; idx++) {
>>                  ...
>>                  set_bit(idx, bitmap);
>>          }
>>
>> This is wrong because a series of atomic ops is not atomic itself, and
>> if you see something like this in you code, it should be converted to
>> using non-atomic __set_bit(), and protected externally if needed.
> 
> That is a bad use of set_bit()!  I agree!  See, for example, commit
> b21866f514cb where I remove precisely this kind of code.
> 
>> Similarly, READ_ONCE() in a for-loop doesn't guarantee any ordering or
>> atomicity, and only hurts the performance. And this is exactly what
>> this patch does.
> 
> Go back and read Jan's patch again, instead of cherry-picking some
> little bits that confirm your prejudices.

Hey Yuri,

No hard feelings, but I tend to agree with Mr. Wilcox and Jan.

set_bit just as any atomic increment or memory barrier - by the same
author you quoted - causes a LOCK prefix to the assembler instruction.
By my modest knowledge of the machine language, this will cause the CPU
core to LOCK the memory bus for the time the byte, word, longword or quadword
(or even bit) is being read, changed, and written back.

If I am not making a stupid logical mistake, this LOCK on the memory
bus by a core is going to prevent the other cores from accessing memory
or filling or flushing their caches.

I agree with Mr. Wilcox that locking would have much worse performance
penalty that a simply READ_ONCE() that is designed to prevent the compiler
from the "funny" optimisations, such as using the two faster instructions
instead of the atomic load - which might in the worst case be interrupted
just between two half-loads.

It does nothing to hurt performance on the level of a memory read or write barrier
or the memory bus lock that stalls all cores.

So, it silences KCSAN and I am happy with it, but I will not proceed with
a formal patch proposal until we have a consensus about it.

The data-race actually means that another core can tear your half-load and
you get unexpected results. Why does it happen more often on my configuration
that on the others I cannot tell. But it might hurt the integrity of any
filesystem relying of find_first_bit() and find_next_bit() primitives.

I mean, in the worst case scenario.

Meaning, I might not opt to go to Mars with the ship's computer running
with data-races ;-)

Best regards,
Mirsad Todorovac

  reply	other threads:[~2023-10-11 19:25 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-11 15:02 [PATCH 0/2] lib/find: Fix KCSAN warnings in find_*_bit() functions Jan Kara
2023-10-11 15:02 ` [PATCH 1/2] lib/find: Make functions safe on changing bitmaps Jan Kara
2023-10-11 18:26   ` Yury Norov
2023-10-11 18:49     ` Matthew Wilcox
2023-10-11 19:25       ` Mirsad Todorovac [this message]
2023-10-12 12:21     ` Jan Kara
2023-10-14  0:15       ` Yury Norov
2023-10-14  2:21         ` Mirsad Goran Todorovac
2023-10-14  2:53           ` Yury Norov
2023-10-14 10:04             ` Mirsad Todorovac
2023-10-16  9:22         ` Jan Kara
2023-10-11 20:40   ` Mirsad Todorovac
2023-10-18 16:23   ` kernel test robot
2023-10-25  7:18   ` kernel test robot
2023-10-25  8:18     ` Rasmus Villemoes
2023-10-27  3:51       ` Yury Norov
2023-10-27  9:55         ` Jan Kara
2023-10-27 15:51         ` Mirsad Todorovac
2023-10-11 15:02 ` [PATCH 2/2] xarray: Fix race in xas_find_chunk() Jan Kara
2023-10-11 15:38   ` Matthew Wilcox
2023-10-11 20:40   ` Mirsad Todorovac

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=166ade6f-f3fb-4b37-bdf1-db0317d1796f@alu.unizg.hr \
    --to=mirsad.todorovac@alu.unizg.hr \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=willy@infradead.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.