* Bounce buffer deadlock @ 2001-06-29 20:33 Steve Lord 2001-06-29 20:42 ` Alan Cox 2001-06-30 15:30 ` Marcelo Tosatti 0 siblings, 2 replies; 13+ messages in thread From: Steve Lord @ 2001-06-29 20:33 UTC (permalink / raw) To: linux-kernel Has anyone else seen a hang like this: bdflush() flush_dirty_buffers() ll_rw_block() submit_bh(buffer X) generic_make_request() __make_request() create_bounce() alloc_bounce_page() alloc_page() try_to_free_pages() do_try_to_free_pages() page_launder() try_to_free_buffers( , 2) -- i.e. wait for buffers sync_page_buffers() __wait_on_buffer(buffer X) Where the buffer head X going in the top of the stack is the same as the one we wait on at the bottom. There still seems to be nothing to prevent the try to free buffers from blocking on a buffer like this. Setting a flag on the buffer around the create_bounce call, and skipping it in the try_to_free_buffers path would be one approach to avoiding this. I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect against it. Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-29 20:33 Bounce buffer deadlock Steve Lord @ 2001-06-29 20:42 ` Alan Cox 2001-06-30 15:30 ` Marcelo Tosatti 1 sibling, 0 replies; 13+ messages in thread From: Alan Cox @ 2001-06-29 20:42 UTC (permalink / raw) To: Steve Lord; +Cc: linux-kernel > Has anyone else seen a hang like this: > > bdflush() > flush_dirty_buffers() > ll_rw_block() > submit_bh(buffer X) > generic_make_request() > __make_request() > create_bounce() > alloc_bounce_page() > alloc_page() > try_to_free_pages() > do_try_to_free_pages() > page_launder() > try_to_free_buffers( , 2) -- i.e. wait for buffers > sync_page_buffers() > __wait_on_buffer(buffer X) Whoops. We actually carefully set PF_MEMALLOC to avoid deadlocks here but if the buffer is laundered.... ummm Looks like page_launder needs to be more careful > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect > against it. Not this one no Alan ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-29 20:33 Bounce buffer deadlock Steve Lord 2001-06-29 20:42 ` Alan Cox @ 2001-06-30 15:30 ` Marcelo Tosatti 2001-06-30 15:46 ` Marcelo Tosatti ` (2 more replies) 1 sibling, 3 replies; 13+ messages in thread From: Marcelo Tosatti @ 2001-06-30 15:30 UTC (permalink / raw) To: Steve Lord, Linus Torvalds; +Cc: lkml On Fri, 29 Jun 2001, Steve Lord wrote: > > Has anyone else seen a hang like this: > > bdflush() > flush_dirty_buffers() > ll_rw_block() > submit_bh(buffer X) > generic_make_request() > __make_request() > create_bounce() > alloc_bounce_page() > alloc_page() > try_to_free_pages() > do_try_to_free_pages() > page_launder() > try_to_free_buffers( , 2) -- i.e. wait for buffers > sync_page_buffers() > __wait_on_buffer(buffer X) > > Where the buffer head X going in the top of the stack is the same as the one > we wait on at the bottom. > > There still seems to be nothing to prevent the try to free buffers from > blocking on a buffer like this. Setting a flag on the buffer around the > create_bounce call, and skipping it in the try_to_free_buffers path would > be one approach to avoiding this. Yes there is a bug: Linus is going to put a new fix soon. > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect > against it. Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO moditications which are in the -ac series. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 15:30 ` Marcelo Tosatti @ 2001-06-30 15:46 ` Marcelo Tosatti 2001-06-30 17:32 ` Linus Torvalds 2001-06-30 17:34 ` Steve Lord 2 siblings, 0 replies; 13+ messages in thread From: Marcelo Tosatti @ 2001-06-30 15:46 UTC (permalink / raw) To: Steve Lord, Linus Torvalds; +Cc: lkml On Sat, 30 Jun 2001, Marcelo Tosatti wrote: > > > On Fri, 29 Jun 2001, Steve Lord wrote: > > > > > Has anyone else seen a hang like this: > > > > bdflush() > > flush_dirty_buffers() > > ll_rw_block() > > submit_bh(buffer X) > > generic_make_request() > > __make_request() > > create_bounce() > > alloc_bounce_page() > > alloc_page() > > try_to_free_pages() > > do_try_to_free_pages() > > page_launder() > > try_to_free_buffers( , 2) -- i.e. wait for buffers > > sync_page_buffers() > > __wait_on_buffer(buffer X) > > > > Where the buffer head X going in the top of the stack is the same as the one > > we wait on at the bottom. > > > > There still seems to be nothing to prevent the try to free buffers from > > blocking on a buffer like this. Setting a flag on the buffer around the > > create_bounce call, and skipping it in the try_to_free_buffers path would > > be one approach to avoiding this. > > Yes there is a bug: Linus is going to put a new fix soon. > > > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to protect > > against it. > > Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO > moditications which are in the -ac series. ^^ I mean 2.4.6-pre, sorry. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 15:30 ` Marcelo Tosatti 2001-06-30 15:46 ` Marcelo Tosatti @ 2001-06-30 17:32 ` Linus Torvalds 2001-06-30 17:34 ` Steve Lord 2 siblings, 0 replies; 13+ messages in thread From: Linus Torvalds @ 2001-06-30 17:32 UTC (permalink / raw) To: Steve Lord; +Cc: lkml On Fri, 29 Jun 2001, Steve Lord wrote: > > Has anyone else seen a hang like this: > > bdflush() > flush_dirty_buffers() > ll_rw_block() > submit_bh(buffer X) > generic_make_request() > __make_request() > create_bounce() > alloc_bounce_page() > alloc_page() > try_to_free_pages() > do_try_to_free_pages() > page_launder() > try_to_free_buffers( , 2) -- i.e. wait for buffers > sync_page_buffers() > __wait_on_buffer(buffer X) Should be fixed in 2.4.6-pre8 (or pre7, for that matter). Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 15:30 ` Marcelo Tosatti 2001-06-30 15:46 ` Marcelo Tosatti 2001-06-30 17:32 ` Linus Torvalds @ 2001-06-30 17:34 ` Steve Lord 2001-06-30 16:07 ` Marcelo Tosatti 2001-06-30 17:39 ` Linus Torvalds 2 siblings, 2 replies; 13+ messages in thread From: Steve Lord @ 2001-06-30 17:34 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Steve Lord, Linus Torvalds, lkml > > > On Fri, 29 Jun 2001, Steve Lord wrote: > > > > > Has anyone else seen a hang like this: > > > > bdflush() > > flush_dirty_buffers() > > ll_rw_block() > > submit_bh(buffer X) > > generic_make_request() > > __make_request() > > create_bounce() > > alloc_bounce_page() > > alloc_page() > > try_to_free_pages() > > do_try_to_free_pages() > > page_launder() > > try_to_free_buffers( , 2) -- i.e. wait for buffers > > sync_page_buffers() > > __wait_on_buffer(buffer X) > > > > Where the buffer head X going in the top of the stack is the same as the on > e > > we wait on at the bottom. > > > > There still seems to be nothing to prevent the try to free buffers from > > blocking on a buffer like this. Setting a flag on the buffer around the > > create_bounce call, and skipping it in the try_to_free_buffers path would > > be one approach to avoiding this. > > Yes there is a bug: Linus is going to put a new fix soon. > > > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to prot > ect > > against it. > > Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO > moditications which are in the -ac series. It looks to me as if all memory allocations of type GFP_BUFFER which happen in generic_make_request downwards can hit the same type of deadlock, so bounce buffers, the request functions of the raid and lvm paths can all end up in try_to_free_buffers on a buffer they themselves hold the lock on. If the fix is to avoid page_launder in these cases then the number of occurrences when an alloc_pages fails will go up. I was attempting to come up with a way of making try_to_free_buffers fail on buffers which are being processed in the generic_make_request path by marking them, the problem is there is no single place to reset the state of a buffer so that try_to_free_buffers will wait for it. Doing it after the end of the loop in generic_make_request is race prone to say the least. Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 17:34 ` Steve Lord @ 2001-06-30 16:07 ` Marcelo Tosatti 2001-06-30 18:07 ` Steve Lord 2001-06-30 17:39 ` Linus Torvalds 1 sibling, 1 reply; 13+ messages in thread From: Marcelo Tosatti @ 2001-06-30 16:07 UTC (permalink / raw) To: Steve Lord; +Cc: Linus Torvalds, lkml On Sat, 30 Jun 2001, Steve Lord wrote: > > > > > > On Fri, 29 Jun 2001, Steve Lord wrote: > > > > > > > > Has anyone else seen a hang like this: > > > > > > bdflush() > > > flush_dirty_buffers() > > > ll_rw_block() > > > submit_bh(buffer X) > > > generic_make_request() > > > __make_request() > > > create_bounce() > > > alloc_bounce_page() > > > alloc_page() > > > try_to_free_pages() > > > do_try_to_free_pages() > > > page_launder() > > > try_to_free_buffers( , 2) -- i.e. wait for buffers > > > sync_page_buffers() > > > __wait_on_buffer(buffer X) > > > > > > Where the buffer head X going in the top of the stack is the same as the on > > e > > > we wait on at the bottom. > > > > > > There still seems to be nothing to prevent the try to free buffers from > > > blocking on a buffer like this. Setting a flag on the buffer around the > > > create_bounce call, and skipping it in the try_to_free_buffers path would > > > be one approach to avoiding this. > > > > Yes there is a bug: Linus is going to put a new fix soon. > > > > > I hit this in 2.4.6-pre6, and I don't see anything in the ac series to prot > > ect > > > against it. > > > > Thats because the -ac series does not contain the __GFP_BUFFER/__GFP_IO > > moditications which are in the -ac series. > > It looks to me as if all memory allocations of type GFP_BUFFER which happen > in generic_make_request downwards can hit the same type of deadlock, > bounce buffers, the request functions of the raid and lvm paths can all > end up in try_to_free_buffers on a buffer they themselves hold the lock on. Yes. 2.4.6-pre8 fixes that (not sure if its up already). > If the fix is to avoid page_launder in these cases then the number of > occurrences when an alloc_pages fails will go up. > I was attempting to come up with a way of making try_to_free_buffers > fail on buffers which are being processed in the generic_make_request > path by marking them, the problem is there is no single place to reset > the state of a buffer so that try_to_free_buffers will wait for it. > Doing it after the end of the loop in generic_make_request is race > prone to say the least. I really want to fix things like this in 2.5. (ie not avoid the deadlock by completly avoiding physical IO, but avoid the deadlock by avoiding physical IO on the "device" which is doing the allocation) Could you send me your code ? No problem if it does not work at all :) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 16:07 ` Marcelo Tosatti @ 2001-06-30 18:07 ` Steve Lord 2001-06-30 18:29 ` Marcelo Tosatti 0 siblings, 1 reply; 13+ messages in thread From: Steve Lord @ 2001-06-30 18:07 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Steve Lord, Linus Torvalds, lkml > Yes. 2.4.6-pre8 fixes that (not sure if its up already). It is up. > > > If the fix is to avoid page_launder in these cases then the number of > > occurrences when an alloc_pages fails will go up. > > > I was attempting to come up with a way of making try_to_free_buffers > > fail on buffers which are being processed in the generic_make_request > > path by marking them, the problem is there is no single place to reset > > the state of a buffer so that try_to_free_buffers will wait for it. > > Doing it after the end of the loop in generic_make_request is race > > prone to say the least. > > I really want to fix things like this in 2.5. (ie not avoid the deadlock > by completly avoiding physical IO, but avoid the deadlock by avoiding > physical IO on the "device" which is doing the allocation) > > Could you send me your code ? No problem if it does not work at all :) > Well, the basic idea is simple, but I suspect the implementation might rapidly become historical in 2.5. Basically I added a new buffer state bit, although BH_Req looks like it could be cannibalized, no one appears to check for it (is it really dead code?). Using a flag to skip buffers in try_to_free_buffers is easy: =========================================================================== Index: linux/fs/buffer.c =========================================================================== --- /usr/tmp/TmpDir.3237-0/linux/fs/buffer.c_1.68 Sat Jun 30 12:56:29 2001 +++ linux/fs/buffer.c Sat Jun 30 12:57:52 2001 @@ -2365,7 +2365,7 @@ /* * Can the buffer be thrown out? */ -#define BUFFER_BUSY_BITS ((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected)) +#define BUFFER_BUSY_BITS ((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected) | (1<<BH_Clamped)) #define buffer_busy(bh) (atomic_read(&(bh)->b_count) | ((bh)->b_state & BUFFER_BUSY_BITS)) /* @@ -2430,7 +2430,11 @@ spin_unlock(&free_list[index].lock); write_unlock(&hash_table_lock); spin_unlock(&lru_list_lock); - if (wait) { + /* Buffers in the middle of generic_make_request processing cannot + * be waited for, they may be allocating memory right now and be + * locked by this thread. + */ + if (wait && !buffer_clamped(tmp)) { sync_page_buffers(bh, wait); /* We waited synchronously, so we can free the buffers. */ if (wait > 1 && !loop) { =========================================================================== Index: linux/include/linux/fs.h =========================================================================== --- /usr/tmp/TmpDir.3237-0/linux/include/linux/fs.h_1.99 Sat Jun 30 12:56:29 2001 +++ linux/include/linux/fs.h Sat Jun 30 07:05:37 2001 @@ -224,6 +224,8 @@ BH_Mapped, /* 1 if the buffer has a disk mapping */ BH_New, /* 1 if the buffer is new and not yet written out */ BH_Protected, /* 1 if the buffer is protected */ + BH_Clamped, /* 1 if the buffer cannot be reclaimed + * in it's current state */ BH_Delay, /* 1 if the buffer is delayed allocate */ BH_PrivateStart,/* not a state bit, but the first bit available @@ -286,6 +288,7 @@ #define buffer_mapped(bh) __buffer_state(bh,Mapped) #define buffer_new(bh) __buffer_state(bh,New) #define buffer_protected(bh) __buffer_state(bh,Protected) +#define buffer_clamped(bh) __buffer_state(bh,Clamped) #define buffer_delay(bh) __buffer_state(bh,Delay) #define bh_offset(bh) ((unsigned long)(bh)->b_data & ~PAGE_MASK) The tricky part which I had not worked out how to do yet is to manage the clearing of a state bit in all the correct places. You would have to set it when the buffer got locked when I/O was about to start, it becomes clearable after the last memory allocation during the I/O submission process. I do not like the approach because there are so many ways a buffer can go once you get into generic_make_request. At first I thought I could just explicitly set and clear a flag around memory allocations like the bounce buffer path. However, that can lead to AB BA deadlocks between multiple threads submitting I/O requests. At this point I started to think I was going to build an unmaintainable rats nest and decided I had not got the correct answer. I am not sure that an approach which avoids a specific device will fly either, all the I/O can be on one device, and what does device mean when it comes to md/lvm and request remapping? Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 18:07 ` Steve Lord @ 2001-06-30 18:29 ` Marcelo Tosatti 0 siblings, 0 replies; 13+ messages in thread From: Marcelo Tosatti @ 2001-06-30 18:29 UTC (permalink / raw) To: Steve Lord; +Cc: Linus Torvalds, lkml On Sat, 30 Jun 2001, Steve Lord wrote: > > > Yes. 2.4.6-pre8 fixes that (not sure if its up already). > > It is up. > > > > > > If the fix is to avoid page_launder in these cases then the number of > > > occurrences when an alloc_pages fails will go up. > > > > > I was attempting to come up with a way of making try_to_free_buffers > > > fail on buffers which are being processed in the generic_make_request > > > path by marking them, the problem is there is no single place to reset > > > the state of a buffer so that try_to_free_buffers will wait for it. > > > Doing it after the end of the loop in generic_make_request is race > > > prone to say the least. > > > > I really want to fix things like this in 2.5. (ie not avoid the deadlock > > by completly avoiding physical IO, but avoid the deadlock by avoiding > > physical IO on the "device" which is doing the allocation) > > > > Could you send me your code ? No problem if it does not work at all :) > > > > Well, the basic idea is simple, but I suspect the implementation might > rapidly become historical in 2.5. Basically I added a new buffer state bit, > although BH_Req looks like it could be cannibalized, no one appears to check > for it (is it really dead code?). Dunno. Ask Jens :) > Using a flag to skip buffers in try_to_free_buffers is easy: I was thinking more about skipping the buffers which are "owned" by the device which is doing the allocation. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 17:34 ` Steve Lord 2001-06-30 16:07 ` Marcelo Tosatti @ 2001-06-30 17:39 ` Linus Torvalds 2001-06-30 17:48 ` Steve Lord 1 sibling, 1 reply; 13+ messages in thread From: Linus Torvalds @ 2001-06-30 17:39 UTC (permalink / raw) To: Steve Lord; +Cc: Marcelo Tosatti, lkml On Sat, 30 Jun 2001, Steve Lord wrote: > > It looks to me as if all memory allocations of type GFP_BUFFER which happen > in generic_make_request downwards can hit the same type of deadlock, so > bounce buffers, the request functions of the raid and lvm paths can all > end up in try_to_free_buffers on a buffer they themselves hold the lock on. .. which is why GFP_BUFFER doesn't exist any more in the most recent pre-kernels (oops, this is pre8 only, not pre7 like I said in the previous email) The problem is that GFP_BUFFER used to mean two things: "don't call low-level filesystem" and "don't do IO". Some of the pre-kernels starting to make it mean "don't call low-level FS" only. The later ones split up the semantics, so that the cases which care about FS deadlocks use "GFP_NOFS", and the cases that care about IO recursion use "GFP_NOIO", so that we don't overload the meaning of GFP_BUFFER. That allows us to do the best we can - still flushing out dirty buffers when that's ok (like when a filesystem wants more memory), and giving the allocator better control over exactly _what_ he objects to. Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 17:39 ` Linus Torvalds @ 2001-06-30 17:48 ` Steve Lord 2001-06-30 17:50 ` Linus Torvalds 0 siblings, 1 reply; 13+ messages in thread From: Steve Lord @ 2001-06-30 17:48 UTC (permalink / raw) To: Linus Torvalds; +Cc: Steve Lord, Marcelo Tosatti, lkml > > On Sat, 30 Jun 2001, Steve Lord wrote: > > > > It looks to me as if all memory allocations of type GFP_BUFFER which happen > > in generic_make_request downwards can hit the same type of deadlock, so > > bounce buffers, the request functions of the raid and lvm paths can all > > end up in try_to_free_buffers on a buffer they themselves hold the lock on. > > .. which is why GFP_BUFFER doesn't exist any more in the most recent > pre-kernels (oops, this is pre8 only, not pre7 like I said in the previous > email) > > The problem is that GFP_BUFFER used to mean two things: "don't call > low-level filesystem" and "don't do IO". Some of the pre-kernels starting > to make it mean "don't call low-level FS" only. The later ones split up > the semantics, so that the cases which care about FS deadlocks use > "GFP_NOFS", and the cases that care about IO recursion use "GFP_NOIO", so > that we don't overload the meaning of GFP_BUFFER. > > That allows us to do the best we can - still flushing out dirty buffers > when that's ok (like when a filesystem wants more memory), and giving the > allocator better control over exactly _what_ he objects to. > > Linus OK, sounds reasonable, time to go download and merge again I guess! Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 17:48 ` Steve Lord @ 2001-06-30 17:50 ` Linus Torvalds 2001-06-30 18:23 ` Steve Lord 0 siblings, 1 reply; 13+ messages in thread From: Linus Torvalds @ 2001-06-30 17:50 UTC (permalink / raw) To: Steve Lord; +Cc: Marcelo Tosatti, lkml On Sat, 30 Jun 2001, Steve Lord wrote: > > OK, sounds reasonable, time to go download and merge again I guess! For 2.4.7 or so, I'll make a backwards-compatibility define (ie make GFP_BUFFER be the same as the new GFP_NOIO, which is the historical behaviour and the anally safe value, if not very efficient), but I'm planning on releasing 2.4.6 without it, to try to flush out people who are able to take advantage of the new extended semantics out of the woodworks.. Linus ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Bounce buffer deadlock 2001-06-30 17:50 ` Linus Torvalds @ 2001-06-30 18:23 ` Steve Lord 0 siblings, 0 replies; 13+ messages in thread From: Steve Lord @ 2001-06-30 18:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Marcelo Tosatti, lkml > > On Sat, 30 Jun 2001, Steve Lord wrote: > > > > OK, sounds reasonable, time to go download and merge again I guess! > > For 2.4.7 or so, I'll make a backwards-compatibility define (ie make > GFP_BUFFER be the same as the new GFP_NOIO, which is the historical > behaviour and the anally safe value, if not very efficient), but I'm > planning on releasing 2.4.6 without it, to try to flush out people who are > able to take advantage of the new extended semantics out of the > woodworks.. > > Linus Consider XFS flushed out (once I merge). This, for us, is the tricky part: torvalds@transmeta.com said: >> That allows us to do the best we can - still flushing out dirty >> buffers when that's ok (like when a filesystem wants more memory), and >> giving the allocator better control over exactly _what_ he objects to. XFS introduces the concept of the low level flush of a buffer not always being really a low level flush, since a delayed allocate buffer can end up reentering the filesystem in order to create the true on disk allocation. This in turn can cause a transaction and more memory allocations. The really nasty case we were using GFP_BUFFER for is a memory allocation which is from within a transaction, we cannot afford to have another transaction start out of the bottom of memory allocation as it may require resources locked by the transaction which is allocating memory. Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2001-06-30 20:02 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2001-06-29 20:33 Bounce buffer deadlock Steve Lord 2001-06-29 20:42 ` Alan Cox 2001-06-30 15:30 ` Marcelo Tosatti 2001-06-30 15:46 ` Marcelo Tosatti 2001-06-30 17:32 ` Linus Torvalds 2001-06-30 17:34 ` Steve Lord 2001-06-30 16:07 ` Marcelo Tosatti 2001-06-30 18:07 ` Steve Lord 2001-06-30 18:29 ` Marcelo Tosatti 2001-06-30 17:39 ` Linus Torvalds 2001-06-30 17:48 ` Steve Lord 2001-06-30 17:50 ` Linus Torvalds 2001-06-30 18:23 ` Steve Lord
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).