All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Valente <paolo.valente@linaro.org>
To: "Holger Hoffstätte" <holger@applied-asynchrony.com>
Cc: Luca Mariotti <mariottiluca1@hotmail.it>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block <linux-block@vger.kernel.org>,
	linux-kernel@vger.kernel.org,
	Pietro Pedroni <pedroni.pietro.96@gmail.com>
Subject: Re: [PATCH BUGFIX] block, bfq: fix delayed stable merge check
Date: Tue, 25 May 2021 12:40:04 +0200	[thread overview]
Message-ID: <7E3E9C04-9969-4BAE-8474-A9B3BF449F87@linaro.org> (raw)
In-Reply-To: <3005fed2-d5f6-f709-cb3a-3d865623015d@applied-asynchrony.com>



> Il giorno 24 mag 2021, alle ore 20:45, Holger Hoffstätte <holger@applied-asynchrony.com> ha scritto:
> 
> On 2021-05-24 19:41, Paolo Valente wrote:
>>> Il giorno 24 mag 2021, alle ore 19:13, Holger Hoffstätte <holger@applied-asynchrony.com> ha scritto:
>>> 
>>> On 2021-05-24 18:57, Paolo Valente wrote:
>>>>> Il giorno 20 mag 2021, alle ore 09:15, Holger Hoffstätte <holger@applied-asynchrony.com> ha scritto:
>>>>> 
>>>>> On 2021-05-18 12:43, Luca Mariotti wrote:
>>>>>> When attempting to schedule a merge of a given bfq_queue with the currently
>>>>>> in-service bfq_queue or with a cooperating bfq_queue among the scheduled
>>>>>> bfq_queues, delayed stable merge is checked for rotational or non-queueing
>>>>>> devs. For this stable merge to be performed, some conditions must be met.
>>>>>> If the current bfq_queue underwent some split from some merged bfq_queue,
>>>>>> one of these conditions is that two hundred milliseconds must elapse from
>>>>>> split, otherwise this condition is always met.
>>>>>> Unfortunately, by mistake, time_is_after_jiffies() was written instead of
>>>>>> time_is_before_jiffies() for this check, verifying that less than two
>>>>>> hundred milliseconds have elapsed instead of verifying that at least two
>>>>>> hundred milliseconds have elapsed.
>>>>>> Fix this issue by replacing time_is_after_jiffies() with
>>>>>> time_is_before_jiffies().
>>>>>> Signed-off-by: Luca Mariotti <mariottiluca1@hotmail.it>
>>>>>> Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
>>>>>> Signed-off-by: Pietro Pedroni <pedroni.pietro.96@gmail.com>
>>>>>> ---
>>>>>>  block/bfq-iosched.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>>>>> index acd1f881273e..2adb1e69c9d2 100644
>>>>>> --- a/block/bfq-iosched.c
>>>>>> +++ b/block/bfq-iosched.c
>>>>>> @@ -2697,7 +2697,7 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>>>>>  	if (unlikely(!bfqd->nonrot_with_queueing)) {
>>>>>>  		if (bic->stable_merge_bfqq &&
>>>>>>  		    !bfq_bfqq_just_created(bfqq) &&
>>>>>> -		    time_is_after_jiffies(bfqq->split_time +
>>>>>> +		    time_is_before_jiffies(bfqq->split_time +
>>>>>>  					  msecs_to_jiffies(200))) {
>>>>>>  			struct bfq_queue *stable_merge_bfqq =
>>>>>>  				bic->stable_merge_bfqq;
>>>>> 
>>>>> Not sure why but with this patch I quickly got a division-by-zero in BFQ and
>>>>> complete system halt. Unfortunately I couldn't capture the exact stack trace,
>>>>> but it read something like bfq_calc_weight() or something ike that.
>>>>> I looked through the code and found bfq_delta(), so maybe weight got
>>>>> reduced to 0?
>>>>> 
>>>> Hi Holger,
>>>> is this (easily) reproducible for you?  If so, I'd like to propose you
>>>> a candidate fix.
>>> 
>>> Yes, it's easily reproducible (should be reproducible on 5.13-rc as well).
>>> Simple read/write I/O on a cold FS (rotational disk obviously) will crash
>>> pretty much immediately; without it everything works fine, likely because the
>>> bug (in the recent queue merging patches?) is never triggered due to the
>>> accidentally-wrong time calculation.
>> Exactly!
>> Unfortunately, no crash happens on my systems.  Or, actually, crashes
>> stopped after the attached fix.
>>> Will gladly test your patch! :)
>>> 
>> Here it is!
>> I'll make a proper commit after your early tests.
>> Crossing my fingers,
>> Paolo
> 
> That did it - it now survived a bunch of heavy read/write/mixed I/O that
> would previously crash right away. Maybe it's because btrfs uses several
> workers and so different IOs got mixed together? Anyway:
> 
> Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
> 

Great!

Thank you very much!

I will put this fix in an upcoming patch series.

Paolo

> Thanks!
> Holger


      reply	other threads:[~2021-05-25 10:40 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-18 10:43 [PATCH BUGFIX] block, bfq: fix delayed stable merge check Luca Mariotti
2021-05-20  7:15 ` Holger Hoffstätte
2021-05-20 16:39   ` Holger Hoffstätte
2021-05-24 16:57   ` Paolo Valente
2021-05-24 17:13     ` Holger Hoffstätte
2021-05-24 17:41       ` Paolo Valente
2021-05-24 18:45         ` Holger Hoffstätte
2021-05-25 10:40           ` Paolo Valente [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7E3E9C04-9969-4BAE-8474-A9B3BF449F87@linaro.org \
    --to=paolo.valente@linaro.org \
    --cc=axboe@kernel.dk \
    --cc=holger@applied-asynchrony.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mariottiluca1@hotmail.it \
    --cc=pedroni.pietro.96@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.