Re: [dm-devel] [dm-6.4 PATCH v2 3/9] dm bufio: improve concurrent IO performance

From: Jens Axboe <axboe@kernel.dk>
To: Mike Snitzer <snitzer@kernel.org>
Cc: linux-block@vger.kernel.org, keescook@chromium.org,
	heinzm@redhat.com, ejt@redhat.com, nhuck@google.com,
	ebiggers@kernel.org, dm-devel@redhat.com, mpatocka@redhat.com,
	luomeng12@huawei.com
Subject: Re: [dm-devel] [dm-6.4 PATCH v2 3/9] dm bufio: improve concurrent IO performance
Date: Fri, 24 Mar 2023 17:11:12 -0600	[thread overview]
Message-ID: <e552ad80-37fe-247e-aaf1-064572ccc154@kernel.dk> (raw)
In-Reply-To: <ZB4p2NfwIhh9raxa@redhat.com>

On 3/24/23 4:53?PM, Mike Snitzer wrote:
> On Fri, Mar 24 2023 at  3:34P -0400,
> Jens Axboe <axboe@kernel.dk> wrote:
> 
>> Just some random drive-by comments.
>>
>>> diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
>>> index 1de1bdcda1ce..a58f8ac3ba75 100644
>>> --- a/drivers/md/dm-bufio.c
>>> +++ b/drivers/md/dm-bufio.c
>>> +static void lru_destroy(struct lru *lru)
>>> +{
>>> +	BUG_ON(lru->cursor);
>>> +	BUG_ON(!list_empty(&lru->iterators));
>>> +}
>>
>> Ehm no, WARN_ON_ONCE() for these presumably.
> 
> Yeah, I raised concern about the BUG_ONs with Joe. Will try to rid the
> code of BUG_ONs as follow-on work.

Please do.

>>> @@ -116,9 +366,579 @@ struct dm_buffer {
>>>  #endif
>>>  };
>>>  
>>> +/*--------------------------------------------------------------*/
>>> +
>>> +/*
>>> + * The buffer cache manages buffers, particularly:
>>> + *  - inc/dec of holder count
>>> + *  - setting the last_accessed field
>>> + *  - maintains clean/dirty state along with lru
>>> + *  - selecting buffers that match predicates
>>> + *
>>> + * It does *not* handle:
>>> + *  - allocation/freeing of buffers.
>>> + *  - IO
>>> + *  - Eviction or cache sizing.
>>> + *
>>> + * cache_get() and cache_put() are threadsafe, you do not need to
>>> + * protect these calls with a surrounding mutex.  All the other
>>> + * methods are not threadsafe; they do use locking primitives, but
>>> + * only enough to ensure get/put are threadsafe.
>>> + */
>>> +
>>> +#define NR_LOCKS 64
>>> +#define LOCKS_MASK (NR_LOCKS - 1)
>>> +
>>> +struct tree_lock {
>>> +	struct rw_semaphore lock;
>>> +} ____cacheline_aligned_in_smp;
>>> +
>>> +struct dm_buffer_cache {
>>> +	/*
>>> +	 * We spread entries across multiple trees to reduce contention
>>> +	 * on the locks.
>>> +	 */
>>> +	struct tree_lock locks[NR_LOCKS];
>>> +	struct rb_root roots[NR_LOCKS];
>>> +	struct lru lru[LIST_SIZE];
>>> +};
>>
>> This:
>>
>> struct foo_tree {
>> 	struct rw_semaphore lock;
>> 	struct rb_root root;
>> 	struct lru lru;
>> } ____cacheline_aligned_in_smp;
>>
>> would be a lot better.
>>
>> And where does this NR_LOCKS come from? Don't make up magic values out
>> of thin air. Should this be per-cpu? per-node? N per node? I'll bet you
>> that 64 is way too much for most use cases, and too little for others.
> 
> I cannot speak to the 64 magic value (other than I think it worked
> well for Joe's testbed).  But the point of this exercise is to split
> the lock to avoid contention.  Using 64 accomplishes this. Having
> there be more or less isn't _that_ critical.  The hash to get the
> region index isn't a function of cpu.  We aren't after lockless here.

I don't doubt it worked well in his setup, and it'll probably be fine in
a lot of setups. But the point still stands - it's a magic value, it
should at least be documented. And 64 is likely way too much on a lot of
machines.

> Now that said, will certainly discuss further with Joe and see if we
> can be smarter here.
> 
> Your suggestion to combine members certainly makes a lot of sense.  I
> ran with it relative to the bio-prison-v1 (patch 9) changes which have
> the same layout. Definitely a win on in-core memory as well as
> avoiding cacheline thrashing while accessing the lock and then the
> rb_root members (as is always expected):

Right, this is why I suggested doing it like that. It's not very smart
to split related members like that, wastes both memory and is less
efficient than doing the right thing.

>> I stopped reading here, the patch is just too long. Surely this could be
>> split up?
>>
>>  1 file changed, 1292 insertions(+), 477 deletions(-)
>>
>> That's not a patch, that's a patch series.
> 
> I don't want to upset you or the community but saying this but:
> in this narrow instance where a sizable percentage of the file got
> rewritten: to properly review this work you need to look at the full
> scope of the changes in combination.

That's nonsense. That's like saying "to see what this series does, apply
the whole thing and compare it with the file before". With that logic,
why even split changes ever.

A big patch that could be split is harder to properly review than a lot
of small patches. Nobody ever reviews a 1000+ line patch. But I guess we
can just stop reviewing?

It should be split, it's really not up for debate.

-- 
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel