linux-kernel.vger.kernel.org archive mirror
* [PATCH v3] loop: Limit the number of requests in the bio list
@ 2012-11-13 16:27 Lukas Czerner
  2012-11-13 16:35 ` Jeff Moyer
  2012-11-13 16:42 ` Jens Axboe
  0 siblings, 2 replies; 7+ messages in thread
From: Lukas Czerner @ 2012-11-13 16:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: axboe, linux-fsdevel, jmoyer, akpm, Lukas Czerner

Currently there is no limit on the number of requests in the loop bio
list. This can lead to some nasty situations when the caller spawns
tons of bio requests taking up a huge amount of memory. This is even more
obvious with discard, where blkdev_issue_discard() will submit all bios
for the range and wait for them to finish afterwards. On really big loop
devices with a slow backing file system this can lead to an OOM situation,
as reported by Dave Chinner.

With this patch we will wait in loop_make_request() if the number of
bios in the loop bio list would exceed 'nr_congestion_on'.
We'll wake up the process as we process the bios from the list. Some
threshold hysteresis is in place to avoid high-frequency oscillation.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Dave Chinner <dchinner@redhat.com>
---
v2: add threshold hysteresis
v3: Wait uninterruptible, use nr_congestion_off/on

 drivers/block/loop.c |   12 ++++++++++++
 include/linux/loop.h |    3 +++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 54046e5..311299d 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -463,6 +463,7 @@ out:
  */
 static void loop_add_bio(struct loop_device *lo, struct bio *bio)
 {
+	lo->lo_bio_count++;
 	bio_list_add(&lo->lo_bio_list, bio);
 }
 
@@ -471,6 +472,7 @@ static void loop_add_bio(struct loop_device *lo, struct bio *bio)
  */
 static struct bio *loop_get_bio(struct loop_device *lo)
 {
+	lo->lo_bio_count--;
 	return bio_list_pop(&lo->lo_bio_list);
 }
 
@@ -489,6 +491,12 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
 		goto out;
 	if (unlikely(rw == WRITE && (lo->lo_flags & LO_FLAGS_READ_ONLY)))
 		goto out;
+	if (lo->lo_bio_count >= q->nr_congestion_on) {
+		spin_unlock_irq(&lo->lo_lock);
+		wait_event(lo->lo_req_wait, lo->lo_bio_count <
+			   q->nr_congestion_off);
+		spin_lock_irq(&lo->lo_lock);
+	}
 	loop_add_bio(lo, old_bio);
 	wake_up(&lo->lo_event);
 	spin_unlock_irq(&lo->lo_lock);
@@ -546,6 +554,8 @@ static int loop_thread(void *data)
 			continue;
 		spin_lock_irq(&lo->lo_lock);
 		bio = loop_get_bio(lo);
+		if (lo->lo_bio_count < lo->lo_queue->nr_congestion_off)
+			wake_up(&lo->lo_req_wait);
 		spin_unlock_irq(&lo->lo_lock);
 
 		BUG_ON(!bio);
@@ -873,6 +883,7 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	lo->transfer = transfer_none;
 	lo->ioctl = NULL;
 	lo->lo_sizelimit = 0;
+	lo->lo_bio_count = 0;
 	lo->old_gfp_mask = mapping_gfp_mask(mapping);
 	mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
 
@@ -1673,6 +1684,7 @@ static int loop_add(struct loop_device **l, int i)
 	lo->lo_number		= i;
 	lo->lo_thread		= NULL;
 	init_waitqueue_head(&lo->lo_event);
+	init_waitqueue_head(&lo->lo_req_wait);
 	spin_lock_init(&lo->lo_lock);
 	disk->major		= LOOP_MAJOR;
 	disk->first_minor	= i << part_shift;
diff --git a/include/linux/loop.h b/include/linux/loop.h
index 6492181..460b60f 100644
--- a/include/linux/loop.h
+++ b/include/linux/loop.h
@@ -53,10 +53,13 @@ struct loop_device {
 
 	spinlock_t		lo_lock;
 	struct bio_list		lo_bio_list;
+	unsigned int		lo_bio_count;
 	int			lo_state;
 	struct mutex		lo_ctl_mutex;
 	struct task_struct	*lo_thread;
 	wait_queue_head_t	lo_event;
+	/* wait queue for incoming requests */
+	wait_queue_head_t	lo_req_wait;
 
 	struct request_queue	*lo_queue;
 	struct gendisk		*lo_disk;
-- 
1.7.7.6



* Re: [PATCH v3] loop: Limit the number of requests in the bio list
  2012-11-13 16:27 [PATCH v3] loop: Limit the number of requests in the bio list Lukas Czerner
@ 2012-11-13 16:35 ` Jeff Moyer
  2012-11-13 16:42 ` Jens Axboe
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff Moyer @ 2012-11-13 16:35 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-kernel, axboe, linux-fsdevel, akpm

Lukas Czerner <lczerner@redhat.com> writes:

> Currently there is no limit on the number of requests in the loop bio
> list. This can lead to some nasty situations when the caller spawns
> tons of bio requests taking up a huge amount of memory. This is even more
> obvious with discard, where blkdev_issue_discard() will submit all bios
> for the range and wait for them to finish afterwards. On really big loop
> devices with a slow backing file system this can lead to an OOM situation,
> as reported by Dave Chinner.
>
> With this patch we will wait in loop_make_request() if the number of
> bios in the loop bio list would exceed 'nr_congestion_on'.
> We'll wake up the process as we process the bios from the list. Some
> threshold hysteresis is in place to avoid high-frequency oscillation.
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reported-by: Dave Chinner <dchinner@redhat.com>

Acked-by: Jeff Moyer <jmoyer@redhat.com>


* Re: [PATCH v3] loop: Limit the number of requests in the bio list
  2012-11-13 16:27 [PATCH v3] loop: Limit the number of requests in the bio list Lukas Czerner
  2012-11-13 16:35 ` Jeff Moyer
@ 2012-11-13 16:42 ` Jens Axboe
  2012-11-14  9:02   ` Lukáš Czerner
  1 sibling, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2012-11-13 16:42 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: linux-kernel, linux-fsdevel, jmoyer, akpm

> @@ -489,6 +491,12 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
>  		goto out;
>  	if (unlikely(rw == WRITE && (lo->lo_flags & LO_FLAGS_READ_ONLY)))
>  		goto out;
> +	if (lo->lo_bio_count >= q->nr_congestion_on) {
> +		spin_unlock_irq(&lo->lo_lock);
> +		wait_event(lo->lo_req_wait, lo->lo_bio_count <
> +			   q->nr_congestion_off);
> +		spin_lock_irq(&lo->lo_lock);
> +	}

This makes me nervous. You are reading lo_bio_count outside the lock. If
you race with the prepare_to_wait() and condition check in
__wait_event(), then you will sleep forever.
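
For illustration, this is roughly what that wait_event() call expands to
(a simplified sketch of the generic macro, not the exact kernel source),
with the unlocked condition check marked:

	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&lo->lo_req_wait, &wait, TASK_UNINTERRUPTIBLE);
		/* unlocked read of lo_bio_count, may be stale */
		if (lo->lo_bio_count < q->nr_congestion_off)
			break;
		/*
		 * If the loop thread already drained the list and issued its
		 * wake_up() before we got onto the waitqueue, and the read
		 * above saw a stale value, nothing will ever wake us again.
		 */
		schedule();
	}
	finish_wait(&lo->lo_req_wait, &wait);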

md has private helpers for this, seems it would be a good idea to move
these into the regular wait includes and use them here too.

-- 
Jens Axboe



* Re: [PATCH v3] loop: Limit the number of requests in the bio list
  2012-11-13 16:42 ` Jens Axboe
@ 2012-11-14  9:02   ` Lukáš Czerner
  2012-11-14 15:21     ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Lukáš Czerner @ 2012-11-14  9:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Lukas Czerner, linux-kernel, linux-fsdevel, jmoyer, akpm

On Tue, 13 Nov 2012, Jens Axboe wrote:

> Date: Tue, 13 Nov 2012 09:42:58 -0700
> From: Jens Axboe <axboe@kernel.dk>
> To: Lukas Czerner <lczerner@redhat.com>
> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
>     jmoyer@redhat.com, akpm@linux-foundation.org
> Subject: Re: [PATCH v3] loop: Limit the number of requests in the bio list
> 
> > @@ -489,6 +491,12 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
> >  		goto out;
> >  	if (unlikely(rw == WRITE && (lo->lo_flags & LO_FLAGS_READ_ONLY)))
> >  		goto out;
> > +	if (lo->lo_bio_count >= q->nr_congestion_on) {
> > +		spin_unlock_irq(&lo->lo_lock);
> > +		wait_event(lo->lo_req_wait, lo->lo_bio_count <
> > +			   q->nr_congestion_off);
> > +		spin_lock_irq(&lo->lo_lock);
> > +	}
> 
> This makes me nervous. You are reading lo_bio_count outside the lock. If
> you race with the prepare_to_wait() and condition check in
> __wait_event(), then you will sleep forever.

Hi Jens,

I am sorry for being dense, but I do not see how this would be
possible. The only place we increase the lo_bio_count is after that
piece of code (possibly after the wait). Moreover every time we're
decreasing the lo_bio_count and it is smaller than nr_congestion_off
we will wake_up().

That's how wait_event/wake_up is supposed to be used, right ?

Thanks!
-Lukas

> 
> md has private helpers for this, seems it would be a good idea to move
> these into the regular wait includes and use them here too.
> 
> 


* Re: [PATCH v3] loop: Limit the number of requests in the bio list
  2012-11-14  9:02   ` Lukáš Czerner
@ 2012-11-14 15:21     ` Jens Axboe
  2012-11-15  8:20       ` Lukáš Czerner
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2012-11-14 15:21 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: linux-kernel, linux-fsdevel, jmoyer, akpm

On 2012-11-14 02:02, Lukáš Czerner wrote:
> On Tue, 13 Nov 2012, Jens Axboe wrote:
> 
>> Date: Tue, 13 Nov 2012 09:42:58 -0700
>> From: Jens Axboe <axboe@kernel.dk>
>> To: Lukas Czerner <lczerner@redhat.com>
>> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
>>     jmoyer@redhat.com, akpm@linux-foundation.org
>> Subject: Re: [PATCH v3] loop: Limit the number of requests in the bio list
>>
>>> @@ -489,6 +491,12 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
>>>  		goto out;
>>>  	if (unlikely(rw == WRITE && (lo->lo_flags & LO_FLAGS_READ_ONLY)))
>>>  		goto out;
>>> +	if (lo->lo_bio_count >= q->nr_congestion_on) {
>>> +		spin_unlock_irq(&lo->lo_lock);
>>> +		wait_event(lo->lo_req_wait, lo->lo_bio_count <
>>> +			   q->nr_congestion_off);
>>> +		spin_lock_irq(&lo->lo_lock);
>>> +	}
>>
>> This makes me nervous. You are reading lo_bio_count outside the lock. If
>> you race with the prepare_to_wait() and condition check in
>> __wait_event(), then you will sleep forever.
> 
> Hi Jens,
> 
> I am sorry for being dense, but I do not see how this would be
> possible. The only place we increase the lo_bio_count is after that
> piece of code (possibly after the wait). Moreover every time we're
> decreasing the lo_bio_count and it is smaller than nr_congestion_off
> we will wake_up().
> 
> That's how wait_event/wake_up is supposed to be used, right ?

It is, yes. But you are checking the condition without the lock, so you
could be operating on a stale value. The point is, you have to safely
check the condition _after prepare_to_wait() to be completely safe. And
you do not. Either lo_bio_count needs to be atomic, or you need to use a
variant of wait_event() that holds the appropriate lock before
prepare_to_wait() and condition check, then dropping it for the sleep.

See wait_event_lock_irq() in drivers/md/md.h.
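
As a rough sketch of that direction (the exact helper signature is an
assumption here, the md.h variant also takes an extra "cmd" argument,
and only the locking idea matters), the submission path could then do
something like:

	/* in loop_make_request(), with lo->lo_lock already held: */
	if (lo->lo_bio_count >= q->nr_congestion_on)
		/*
		 * The helper re-takes lo_lock around every evaluation of
		 * the condition and only drops it while actually asleep,
		 * so the submitter can no longer check lo_bio_count against
		 * a stale value when no further wake_up() is coming.
		 */
		wait_event_lock_irq(lo->lo_req_wait,
				    lo->lo_bio_count < q->nr_congestion_off,
				    lo->lo_lock);
	loop_add_bio(lo, old_bio);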

-- 
Jens Axboe



* Re: [PATCH v3] loop: Limit the number of requests in the bio list
  2012-11-14 15:21     ` Jens Axboe
@ 2012-11-15  8:20       ` Lukáš Czerner
  2012-11-15 14:05         ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Lukáš Czerner @ 2012-11-15  8:20 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Lukáš Czerner, linux-kernel, linux-fsdevel, jmoyer, akpm


On Wed, 14 Nov 2012, Jens Axboe wrote:

> Date: Wed, 14 Nov 2012 08:21:41 -0700
> From: Jens Axboe <axboe@kernel.dk>
> To: Lukáš Czerner <lczerner@redhat.com>
> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
>     jmoyer@redhat.com, akpm@linux-foundation.org
> Subject: Re: [PATCH v3] loop: Limit the number of requests in the bio list
> 
> On 2012-11-14 02:02, Lukáš Czerner wrote:
> > On Tue, 13 Nov 2012, Jens Axboe wrote:
> > 
> >> Date: Tue, 13 Nov 2012 09:42:58 -0700
> >> From: Jens Axboe <axboe@kernel.dk>
> >> To: Lukas Czerner <lczerner@redhat.com>
> >> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
> >>     jmoyer@redhat.com, akpm@linux-foundation.org
> >> Subject: Re: [PATCH v3] loop: Limit the number of requests in the bio list
> >>
> >>> @@ -489,6 +491,12 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
> >>>  		goto out;
> >>>  	if (unlikely(rw == WRITE && (lo->lo_flags & LO_FLAGS_READ_ONLY)))
> >>>  		goto out;
> >>> +	if (lo->lo_bio_count >= q->nr_congestion_on) {
> >>> +		spin_unlock_irq(&lo->lo_lock);
> >>> +		wait_event(lo->lo_req_wait, lo->lo_bio_count <
> >>> +			   q->nr_congestion_off);
> >>> +		spin_lock_irq(&lo->lo_lock);
> >>> +	}
> >>
> >> This makes me nervous. You are reading lo_bio_count outside the lock. If
> >> you race with the prepare_to_wait() and condition check in
> >> __wait_event(), then you will sleep forever.
> > 
> > Hi Jens,
> > 
> > I am sorry for being dense, but I do not see how this would be
> > possible. The only place we increase the lo_bio_count is after that
> > piece of code (possibly after the wait). Moreover every time we're
> > decreasing the lo_bio_count and it is smaller than nr_congestion_off
> > we will wake_up().
> > 
> > That's how wait_event/wake_up is supposed to be used, right ?
> 
> It is, yes. But you are checking the condition without the lock, so you
> could be operating on a stale value. The point is, you have to safely
> check the condition _after prepare_to_wait() to be completely safe. And
> you do not. Either lo_bio_count needs to be atomic, or you need to use a
> variant of wait_event() that holds the appropriate lock before
> prepare_to_wait() and condition check, then dropping it for the sleep.
> 
> See wait_event_lock_irq() in drivers/md/md.h.

Ok, I knew that much. So the only possibility of a deadlock is when we
process all the bios in loop_thread() before the waiter gets to
checking the condition, at which point it could read a stale value of
lo_bio_count that still appears to be >= nr_congestion_off, so it goes
back to sleep, never to be woken up again. That sounds highly
unlikely. But fair enough, it makes sense to make it absolutely
bulletproof.

I'll take a look at the wait_event_lock_irq.

Thanks!
-Lukas


* Re: [PATCH v3] loop: Limit the number of requests in the bio list
  2012-11-15  8:20       ` Lukáš Czerner
@ 2012-11-15 14:05         ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2012-11-15 14:05 UTC (permalink / raw)
  To: Lukáš Czerner; +Cc: linux-kernel, linux-fsdevel, jmoyer, akpm

On 2012-11-15 01:20, Lukáš Czerner wrote:
> On Wed, 14 Nov 2012, Jens Axboe wrote:
> 
>> Date: Wed, 14 Nov 2012 08:21:41 -0700
>> From: Jens Axboe <axboe@kernel.dk>
>> To: Lukáš Czerner <lczerner@redhat.com>
>> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
>>     jmoyer@redhat.com, akpm@linux-foundation.org
>> Subject: Re: [PATCH v3] loop: Limit the number of requests in the bio list
>>
>> On 2012-11-14 02:02, Lukáš Czerner wrote:
>>> On Tue, 13 Nov 2012, Jens Axboe wrote:
>>>
>>>> Date: Tue, 13 Nov 2012 09:42:58 -0700
>>>> From: Jens Axboe <axboe@kernel.dk>
>>>> To: Lukas Czerner <lczerner@redhat.com>
>>>> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
>>>>     jmoyer@redhat.com, akpm@linux-foundation.org
>>>> Subject: Re: [PATCH v3] loop: Limit the number of requests in the bio list
>>>>
>>>>> @@ -489,6 +491,12 @@ static void loop_make_request(struct request_queue *q, struct bio *old_bio)
>>>>>  		goto out;
>>>>>  	if (unlikely(rw == WRITE && (lo->lo_flags & LO_FLAGS_READ_ONLY)))
>>>>>  		goto out;
>>>>> +	if (lo->lo_bio_count >= q->nr_congestion_on) {
>>>>> +		spin_unlock_irq(&lo->lo_lock);
>>>>> +		wait_event(lo->lo_req_wait, lo->lo_bio_count <
>>>>> +			   q->nr_congestion_off);
>>>>> +		spin_lock_irq(&lo->lo_lock);
>>>>> +	}
>>>>
>>>> This makes me nervous. You are reading lo_bio_count outside the lock. If
>>>> you race with the prepare_to_wait() and condition check in
>>>> __wait_event(), then you will sleep forever.
>>>
>>> Hi Jens,
>>>
>>> I am sorry for being dense, but I do not see how this would be
>>> possible. The only place we increase the lo_bio_count is after that
>>> piece of code (possibly after the wait). Moreover every time we're
>>> decreasing the lo_bio_count and it is smaller than nr_congestion_off
>>> we will wake_up().
>>>
>>> That's how wait_event/wake_up is supposed to be used, right ?
>>
>> It is, yes. But you are checking the condition without the lock, so you
>> could be operating on a stale value. The point is, you have to safely
>> check the condition _after prepare_to_wait() to be completely safe. And
>> you do not. Either lo_bio_count needs to be atomic, or you need to use a
>> variant of wait_event() that holds the appropriate lock before
>> prepare_to_wait() and condition check, then dropping it for the sleep.
>>
>> See wait_event_lock_irq() in drivers/md/md.h.
> 
> Ok, I knew that much. So the only possibility of a deadlock is when we
> process all the bios in loop_thread() before the waiter gets to
> checking the condition, at which point it could read a stale value of
> lo_bio_count that still appears to be >= nr_congestion_off, so it goes
> back to sleep, never to be woken up again. That sounds highly
> unlikely. But fair enough, it makes sense to make it absolutely
> bulletproof.

It depends on the settings. At the current depth/batch count, yes,
unlikely. But sometimes "highly unlikely" scenarios turn out to hit
all the time for person X's setup and settings.

> I'll take a look at the wait_event_lock_irq.

Thanks.

-- 
Jens Axboe


