* [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
@ 2020-08-29 16:51 Jens Axboe
  2020-08-30  6:26 ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-08-29 16:51 UTC (permalink / raw)
  To: linux-block

We currently increment the task/vm counts when we first attempt to queue a
bio. But this isn't necessarily correct - if the request allocation fails
with -EAGAIN, for example, and the caller retries, then we'll over-account
by as many retries as are done.

This can happen for polled IO, where we cannot wait for requests. Hence
retries can get aggressive if we're running out of requests. If this
happens, the IO rates reported by vmstat are incorrect, as they count
every issue attempt as successful, and hence the stats are potentially
inflated by quite a lot.

Add a bio flag to know if we've done accounting or not. This prevents
the same bio from being accounted potentially many times, when retried.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

diff --git a/block/blk-core.c b/block/blk-core.c
index d9d632639bd1..ff562a8cd9c9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1236,7 +1236,7 @@ blk_qc_t submit_bio(struct bio *bio)
 	 * If it's a regular read/write or a barrier with data attached,
 	 * go through the normal accounting stuff before submission.
 	 */
-	if (bio_has_data(bio)) {
+	if (bio_has_data(bio) && !bio_flagged(bio, BIO_ACCOUNTED)) {
 		unsigned int count;
 
 		if (unlikely(bio_op(bio) == REQ_OP_WRITE_SAME))
@@ -1259,6 +1259,7 @@ blk_qc_t submit_bio(struct bio *bio)
 				(unsigned long long)bio->bi_iter.bi_sector,
 				bio_devname(bio, b), count);
 		}
+		bio_set_flag(bio, BIO_ACCOUNTED);
 	}
 
 	/*
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 63a39e47fc60..39bcc9326c7a 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -266,6 +266,7 @@ enum {
 				 * of this bio. */
 	BIO_CGROUP_ACCT,	/* has been accounted to a cgroup */
 	BIO_TRACKED,		/* set if bio goes through the rq_qos path */
+	BIO_ACCOUNTED,		/* task/vm stats have been done */
 	BIO_FLAG_LAST
 };
 
-- 
Jens Axboe



* Re: [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
  2020-08-29 16:51 [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting Jens Axboe
@ 2020-08-30  6:26 ` Christoph Hellwig
  2020-08-30 15:09   ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2020-08-30  6:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block

On Sat, Aug 29, 2020 at 10:51:11AM -0600, Jens Axboe wrote:
> We currently increment the task/vm counts when we first attempt to queue a
> bio. But this isn't necessarily correct - if the request allocation fails
> with -EAGAIN, for example, and the caller retries, then we'll over-account
> by as many retries as are done.
> 
> This can happen for polled IO, where we cannot wait for requests. Hence
> retries can get aggressive if we're running out of requests. If this
> happens, the IO rates reported by vmstat are incorrect, as they count
> every issue attempt as successful, and hence the stats are potentially
> inflated by quite a lot.
> 
> Add a bio flag to know if we've done accounting or not. This prevents
> the same bio from being accounted potentially many times, when retried.

Can't the resubmitter just use submit_bio_noacct?  What is the call
stack here?


* Re: [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
  2020-08-30  6:26 ` Christoph Hellwig
@ 2020-08-30 15:09   ` Jens Axboe
  2020-08-30 15:28     ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-08-30 15:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block

On 8/30/20 12:26 AM, Christoph Hellwig wrote:
> Can't the resubmitter just use submit_bio_noacct?  What is the call
> stack here?

The resubmitter is way higher than that. You could potentially have that
done in the block layer, but not higher up.

The use case is async submissions, going through ->read_iter() again.
Or ->write_iter().

-- 
Jens Axboe



* Re: [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
  2020-08-30 15:09   ` Jens Axboe
@ 2020-08-30 15:28     ` Christoph Hellwig
  2020-08-31  3:12       ` Ming Lei
  2020-08-31 14:02       ` Jens Axboe
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2020-08-30 15:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block

On Sun, Aug 30, 2020 at 09:09:02AM -0600, Jens Axboe wrote:
> On 8/30/20 12:26 AM, Christoph Hellwig wrote:
> > Can't the resubmitter just use submit_bio_noacct?  What is the call
> > stack here?
> 
> The resubmitter is way higher than that. You could potentially have that
> done in the block layer, but not higher up.
> 
> The use case is async submissions, going through ->read_iter() again.
> Or ->write_iter().

But how does a bio flag help there?  If we go through the file ops
again the next submission will be a new bio structure.
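
To make the objection concrete, here is a minimal userspace sketch, not
kernel code. Every toy_* name is a hypothetical stand-in invented for
illustration. It assumes, as described above, that each retry re-enters
->read_iter() and builds a fresh bio, so a per-bio "accounted" flag is
always clear when the accounting check runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bio; only models the proposed flag. */
struct toy_bio {
	bool accounted;
};

static unsigned long toy_events;   /* stand-in for the task/vm counters */
static int toy_free_requests;      /* stand-in for available requests */

/* Models submit_bio() with the flag check: account once per bio,
 * then fail with -EAGAIN if no request can be allocated. */
static int toy_submit_bio(struct toy_bio *bio)
{
	if (!bio->accounted) {
		toy_events++;
		bio->accounted = true;
	}
	if (toy_free_requests == 0)
		return -EAGAIN;
	toy_free_requests--;
	return 0;
}

/* Models one pass through ->read_iter(): a new bio on every call,
 * so the flag set on the previous attempt is gone. */
static int toy_read_iter(void)
{
	struct toy_bio bio = { .accounted = false };

	return toy_submit_bio(&bio);
}

/* Fail n_fail times with no free requests, then succeed once.
 * Returns how many events were accounted for that single I/O. */
unsigned long toy_count_after_retries(int n_fail)
{
	int ret;

	toy_events = 0;
	toy_free_requests = 0;
	for (int i = 0; i < n_fail; i++) {
		ret = toy_read_iter();
		assert(ret == -EAGAIN);
	}
	toy_free_requests = 1;
	ret = toy_read_iter();
	assert(ret == 0);
	return toy_events;
}
```

In this model toy_count_after_retries(3) returns 4 for one completed
I/O: the flag never survives from one attempt to the next.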


* Re: [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
  2020-08-30 15:28     ` Christoph Hellwig
@ 2020-08-31  3:12       ` Ming Lei
  2020-08-31 14:02       ` Jens Axboe
  1 sibling, 0 replies; 10+ messages in thread
From: Ming Lei @ 2020-08-31  3:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Sun, Aug 30, 2020 at 04:28:00PM +0100, Christoph Hellwig wrote:
> On Sun, Aug 30, 2020 at 09:09:02AM -0600, Jens Axboe wrote:
> > On 8/30/20 12:26 AM, Christoph Hellwig wrote:
> > > Can't the resubmitter just use submit_bio_noacct?  What is the call
> > > stack here?
> > 
> > The resubmitter is way higher than that. You could potentially have that
> > done in the block layer, but not higher up.
> > 
> > The use case is async submissions, going through ->read_iter() again.
> > Or ->write_iter().
> 
> But how does a bio flag help there?  If we go through the file ops
> again the next submission will be a new bio structure.

Yeah, we also have use cases where the bio is a stack variable.



Thanks,
Ming



* Re: [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
  2020-08-30 15:28     ` Christoph Hellwig
  2020-08-31  3:12       ` Ming Lei
@ 2020-08-31 14:02       ` Jens Axboe
  2020-08-31 14:12         ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-08-31 14:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block

On 8/30/20 9:28 AM, Christoph Hellwig wrote:
> On Sun, Aug 30, 2020 at 09:09:02AM -0600, Jens Axboe wrote:
>> On 8/30/20 12:26 AM, Christoph Hellwig wrote:
>>> Can't the resubmitter just use submit_bio_noacct?  What is the call
>>> stack here?
>>
>> The resubmitter is way higher than that. You could potentially have that
>> done in the block layer, but not higher up.
>>
>> The use case is async submissions, going through ->read_iter() again.
>> Or ->write_iter().
> 
> But how does a bio flag help there?  If we go through the file ops
> again the next submission will be a new bio structure.

Yeah the patch is garbage, can't work. The previous suggestion is here:

https://lore.kernel.org/linux-block/395b4c19-cc80-eebb-f6ab-04687110c84a@kernel.dk/T/

which isn't super pretty either, but at least it works. Not sure there's
a better solution, outside of marking the iocb as retry and then
carrying that flag forward for the bio as well. And that seems a bit
much for this case.

-- 
Jens Axboe



* Re: [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
  2020-08-31 14:02       ` Jens Axboe
@ 2020-08-31 14:12         ` Christoph Hellwig
  2020-08-31 14:18           ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2020-08-31 14:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block

On Mon, Aug 31, 2020 at 08:02:43AM -0600, Jens Axboe wrote:
> >> The use case is async submissions, going through ->read_iter() again.
> >> Or ->write_iter().
> > 
> > But how does a bio flag help there?  If we go through the file ops
> > again the next submission will be a new bio structure.
> 
> Yeah the patch is garbage, can't work. The previous suggestion is here:
> 
> https://lore.kernel.org/linux-block/395b4c19-cc80-eebb-f6ab-04687110c84a@kernel.dk/T/
> 
> which isn't super pretty either, but at least it works. Not sure there's
> a better solution, outside of marking the iocb as retry and then
> carrying that flag forward for the bio as well. And that seems a bit
> much for this case.

We'll still need a flag with the above to skip the submit_bio_noacct
bios.  But I think it is the right way to go.  Eventually we'll also
need to push the accounting down into the individual bio based drivers.


* Re: [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
  2020-08-31 14:12         ` Christoph Hellwig
@ 2020-08-31 14:18           ` Jens Axboe
  2020-09-01  5:42             ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2020-08-31 14:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block

On 8/31/20 8:12 AM, Christoph Hellwig wrote:
> On Mon, Aug 31, 2020 at 08:02:43AM -0600, Jens Axboe wrote:
>>>> The use case is async submissions, going through ->read_iter() again.
>>>> Or ->write_iter().
>>>
>>> But how does a bio flag help there?  If we go through the file ops
>>> again the next submission will be a new bio structure.
>>
>> Yeah the patch is garbage, can't work. The previous suggestion is here:
>>
>> https://lore.kernel.org/linux-block/395b4c19-cc80-eebb-f6ab-04687110c84a@kernel.dk/T/
>>
>> which isn't super pretty either, but at least it works. Not sure there's
>> a better solution, outside of marking the iocb as retry and then
>> carrying that flag forward for the bio as well. And that seems a bit
>> much for this case.
> 
> We'll still need a flag with the above to skip the submit_bio_noacct
> bios.  But I think it is the right way to go.  Eventually we'll also
> need to push the accounting down into the individual bio based drivers.

For the iocb propagation, we'd really need the caller to mark the iocb
as IOCB_ACCOUNTED (or whatever) if BIO_ACCOUNTED is set, since we can't
do that further down the stack as we really don't know if we hit -EAGAIN
before or after the bio was accounted... Which kind of sucks, as it'll
be hard to contain in a generic fashion.

-- 
Jens Axboe



* Re: [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
  2020-08-31 14:18           ` Jens Axboe
@ 2020-09-01  5:42             ` Christoph Hellwig
  2020-09-01 14:01               ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2020-09-01  5:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block

On Mon, Aug 31, 2020 at 08:18:48AM -0600, Jens Axboe wrote:
> > We'll still need a flag with the above to skip the submit_bio_noacct
> > bios.  But I think it is the right way to go.  Eventually we'll also
> > need to push the accounting down into the individual bio based drivers.
> 
> For the iocb propagation, we'd really need the caller to mark the iocb
> as IOCB_ACCOUNTED (or whatever) if BIO_ACCOUNTED is set, since we can't
> do that further down the stack as we really don't know if we hit -EAGAIN
> before or after the bio was accounted... Which kind of sucks, as it'll
> be hard to contain in a generic fashion.

Well, that's why I think the only proper fix is to only account a bio
when we know the driver is actually going to submit it.
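
A minimal userspace sketch of that ordering, assuming accounting moves
to the point where request allocation has already succeeded. The
sketch_* names are hypothetical and this is not the code from the RFC;
it only illustrates why failed attempts would no longer inflate the
counters.

```c
#include <assert.h>
#include <errno.h>

static unsigned long sketch_events;   /* stand-in for the task/vm counters */
static int sketch_free_requests;      /* stand-in for available requests */

/* One submission attempt: bail out with -EAGAIN before any accounting,
 * and account only once the I/O is certain to be issued. */
static int sketch_submit(void)
{
	if (sketch_free_requests == 0)
		return -EAGAIN;
	sketch_free_requests--;
	sketch_events++;
	return 0;
}

/* Fail n_fail times with no free requests, then succeed once.
 * Returns how many events were accounted for that single I/O. */
unsigned long sketch_count_after_retries(int n_fail)
{
	int ret;

	sketch_events = 0;
	sketch_free_requests = 0;
	for (int i = 0; i < n_fail; i++) {
		ret = sketch_submit();
		assert(ret == -EAGAIN);
	}
	sketch_free_requests = 1;
	ret = sketch_submit();
	assert(ret == 0);
	return sketch_events;
}
```

Here sketch_count_after_retries(3) returns 1, and no flag is needed on
either the bio or the iocb.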


* Re: [PATCH] block: fix -EAGAIN IOPOLL task/vm accounting
  2020-09-01  5:42             ` Christoph Hellwig
@ 2020-09-01 14:01               ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2020-09-01 14:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block

On 8/31/20 11:42 PM, Christoph Hellwig wrote:
> On Mon, Aug 31, 2020 at 08:18:48AM -0600, Jens Axboe wrote:
>>> We'll still need a flag with the above to skip the submit_bio_noacct
>>> bios.  But I think it is the right way to go.  Eventually we'll also
>>> need to push the accounting down into the individual bio based drivers.
>>
>> For the iocb propagation, we'd really need the caller to mark the iocb
>> as IOCB_ACCOUNTED (or whatever) if BIO_ACCOUNTED is set, since we can't
>> do that further down the stack as we really don't know if we hit -EAGAIN
>> before or after the bio was accounted... Which kind of sucks, as it'll
>> be hard to contain in a generic fashion.
> 
> Well, that's why I think the only proper fix is to only account a bio
> when we know the driver is actually going to submit it.

Yeah I agree, it's a lot less code too. Which is basically back to my
original RFC, I'll see if I can clean it up a bit.

-- 
Jens Axboe

