* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() [not found] ` <alpine.DEB.2.10.1705181338090.132717@chino.kir.corp.google.com> @ 2017-05-19 2:50 ` Junaid Shahid 2017-05-19 7:46 ` Michal Hocko 0 siblings, 1 reply; 16+ messages in thread From: Junaid Shahid @ 2017-05-19 2:50 UTC (permalink / raw) To: David Rientjes Cc: Michal Hocko, Alasdair Kergon, Mike Snitzer, Andrew Morton, linux-mm, andreslc, gthelen, mpatocka, vbabka, linux-kernel (Adding back the correct linux-mm email address and also adding linux-kernel.) On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote: > On Thu, 18 May 2017, Michal Hocko wrote: > > > On Thu 18-05-17 11:50:40, Junaid Shahid wrote: > > > d224e9381897 (drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded > > > variant) left out the __GFP_HIGH flag when converting from __vmalloc to > > > kvmalloc. This can cause the IOCTL to fail in some low memory situations > > > where it wouldn't have failed earlier. This patch adds it back to avoid > > > any potential regression. > > > > The code previously used __GFP_HIGH only for the vmalloc fallback and > > that doesn't make that much sense with the current implementation > > because vmalloc does order-0 pages and those do not really fail and the > > oom killer is invoked to free memory. > > > > Order-0 pages certainly do fail, there is not an infinite amount of memory > nor is there a specific exemption to allow order-0 memory to be alloctable > below watermarks without this gfp flag. OOM kill is the last thing we > want for these allocations since they are very temporary. > > > There is no reason to access memory reserves from this context. > > > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, > assuming there was a reason to do it in the first place in two different > ways. > > This decision is up to the device mapper maintainers. > > > > Signed-off-by: Junaid Shahid <junaids@google.com> > > > --- > > > drivers/md/dm-ioctl.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c > > > index 0555b4410e05..bacad7637a56 100644 > > > --- a/drivers/md/dm-ioctl.c > > > +++ b/drivers/md/dm-ioctl.c > > > @@ -1715,7 +1715,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern > > > */ > > > dmi = NULL; > > > noio_flag = memalloc_noio_save(); > > > - dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL); > > > + dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH); > > > memalloc_noio_restore(noio_flag); > > > > > > if (!dmi) { ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-19 2:50 ` [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() Junaid Shahid @ 2017-05-19 7:46 ` Michal Hocko 2017-05-19 23:43 ` Mikulas Patocka 0 siblings, 1 reply; 16+ messages in thread From: Michal Hocko @ 2017-05-19 7:46 UTC (permalink / raw) To: Junaid Shahid Cc: David Rientjes, Alasdair Kergon, Mike Snitzer, Andrew Morton, linux-mm, andreslc, gthelen, mpatocka, vbabka, linux-kernel On Thu 18-05-17 19:50:46, Junaid Shahid wrote: > (Adding back the correct linux-mm email address and also adding linux-kernel.) > > On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote: [...] > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, > > assuming there was a reason to do it in the first place in two different > > ways. Hmm, the old PF_MEMALLOC used to have the following comment /* * Trying to avoid low memory issues when a device is * suspended. */ I am not really sure what that means but __GFP_HIGH certainly have a different semantic than PF_MEMALLOC. The later grants the full access to the memory reserves while the prior on partial access. If this is _really_ needed then it deserves a comment explaining why. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-19 7:46 ` Michal Hocko @ 2017-05-19 23:43 ` Mikulas Patocka 2017-05-22 9:37 ` Michal Hocko 0 siblings, 1 reply; 16+ messages in thread From: Mikulas Patocka @ 2017-05-19 23:43 UTC (permalink / raw) To: Michal Hocko Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Fri, 19 May 2017, Michal Hocko wrote: > On Thu 18-05-17 19:50:46, Junaid Shahid wrote: > > (Adding back the correct linux-mm email address and also adding linux-kernel.) > > > > On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote: > [...] > > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, > > > assuming there was a reason to do it in the first place in two different > > > ways. > > Hmm, the old PF_MEMALLOC used to have the following comment > /* > * Trying to avoid low memory issues when a device is > * suspended. > */ > > I am not really sure what that means but __GFP_HIGH certainly have a > different semantic than PF_MEMALLOC. The later grants the full access to > the memory reserves while the prior on partial access. If this is _really_ > needed then it deserves a comment explaining why. > -- > Michal Hocko > SUSE Labs Sometimes, I/O to a device mapper device is blocked until the userspace daemon dmeventd does some action (for example, when dm-mirror leg fails, dmeventd needs to mark the leg as failed in the lvm metadata and then reload the device). The dmeventd daemon mlocks itself in memory so that it doesn't generate any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that the ioctls issued by dmeventd have higher chance of succeeding if some I/O is blocked, waiting for dmeventd action. It reduces the possibility of low-memory-deadlock, though it doesn't eliminate it entirely. Mikulas ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-19 23:43 ` Mikulas Patocka @ 2017-05-22 9:37 ` Michal Hocko 2017-05-22 12:00 ` Mikulas Patocka 0 siblings, 1 reply; 16+ messages in thread From: Michal Hocko @ 2017-05-22 9:37 UTC (permalink / raw) To: Mikulas Patocka Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Fri 19-05-17 19:43:23, Mikulas Patocka wrote: > > > On Fri, 19 May 2017, Michal Hocko wrote: > > > On Thu 18-05-17 19:50:46, Junaid Shahid wrote: > > > (Adding back the correct linux-mm email address and also adding linux-kernel.) > > > > > > On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote: > > [...] > > > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, > > > > assuming there was a reason to do it in the first place in two different > > > > ways. > > > > Hmm, the old PF_MEMALLOC used to have the following comment > > /* > > * Trying to avoid low memory issues when a device is > > * suspended. > > */ > > > > I am not really sure what that means but __GFP_HIGH certainly have a > > different semantic than PF_MEMALLOC. The later grants the full access to > > the memory reserves while the prior on partial access. If this is _really_ > > needed then it deserves a comment explaining why. > > -- > > Michal Hocko > > SUSE Labs > > Sometimes, I/O to a device mapper device is blocked until the userspace > daemon dmeventd does some action (for example, when dm-mirror leg fails, > dmeventd needs to mark the leg as failed in the lvm metadata and then > reload the device). > > The dmeventd daemon mlocks itself in memory so that it doesn't generate > any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that > the ioctls issued by dmeventd have higher chance of succeeding if some I/O > is blocked, waiting for dmeventd action. It reduces the possibility of > low-memory-deadlock, though it doesn't eliminate it entirely. So what happens if the memory reserves are depleted. Do we deadlock? Why is OOM killer insufficient to allow the further progress? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-22 9:37 ` Michal Hocko @ 2017-05-22 12:00 ` Mikulas Patocka 2017-05-22 12:09 ` Michal Hocko 0 siblings, 1 reply; 16+ messages in thread From: Mikulas Patocka @ 2017-05-22 12:00 UTC (permalink / raw) To: Michal Hocko Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Mon, 22 May 2017, Michal Hocko wrote: > On Fri 19-05-17 19:43:23, Mikulas Patocka wrote: > > > > > > On Fri, 19 May 2017, Michal Hocko wrote: > > > > > On Thu 18-05-17 19:50:46, Junaid Shahid wrote: > > > > (Adding back the correct linux-mm email address and also adding linux-kernel.) > > > > > > > > On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote: > > > [...] > > > > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, > > > > > assuming there was a reason to do it in the first place in two different > > > > > ways. > > > > > > Hmm, the old PF_MEMALLOC used to have the following comment > > > /* > > > * Trying to avoid low memory issues when a device is > > > * suspended. > > > */ > > > > > > I am not really sure what that means but __GFP_HIGH certainly have a > > > different semantic than PF_MEMALLOC. The later grants the full access to > > > the memory reserves while the prior on partial access. If this is _really_ > > > needed then it deserves a comment explaining why. > > > -- > > > Michal Hocko > > > SUSE Labs > > > > Sometimes, I/O to a device mapper device is blocked until the userspace > > daemon dmeventd does some action (for example, when dm-mirror leg fails, > > dmeventd needs to mark the leg as failed in the lvm metadata and then > > reload the device). > > > > The dmeventd daemon mlocks itself in memory so that it doesn't generate > > any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that > > the ioctls issued by dmeventd have higher chance of succeeding if some I/O > > is blocked, waiting for dmeventd action. It reduces the possibility of > > low-memory-deadlock, though it doesn't eliminate it entirely. > > So what happens if the memory reserves are depleted. Do we deadlock? Yes, it will deadlock. > Why is OOM killer insufficient to allow the further progress? I don't know if the OOM killer will or won't be triggered in this situation, it depends on the people who wrote the OOM killer. > -- > Michal Hocko > SUSE Labs Mikulas ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-22 12:00 ` Mikulas Patocka @ 2017-05-22 12:09 ` Michal Hocko 2017-05-22 14:52 ` Mikulas Patocka 0 siblings, 1 reply; 16+ messages in thread From: Michal Hocko @ 2017-05-22 12:09 UTC (permalink / raw) To: Mikulas Patocka Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Mon 22-05-17 08:00:11, Mikulas Patocka wrote: > > > On Mon, 22 May 2017, Michal Hocko wrote: > > > On Fri 19-05-17 19:43:23, Mikulas Patocka wrote: > > > > > > > > > On Fri, 19 May 2017, Michal Hocko wrote: > > > > > > > On Thu 18-05-17 19:50:46, Junaid Shahid wrote: > > > > > (Adding back the correct linux-mm email address and also adding linux-kernel.) > > > > > > > > > > On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote: > > > > [...] > > > > > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, > > > > > > assuming there was a reason to do it in the first place in two different > > > > > > ways. > > > > > > > > Hmm, the old PF_MEMALLOC used to have the following comment > > > > /* > > > > * Trying to avoid low memory issues when a device is > > > > * suspended. > > > > */ > > > > > > > > I am not really sure what that means but __GFP_HIGH certainly have a > > > > different semantic than PF_MEMALLOC. The later grants the full access to > > > > the memory reserves while the prior on partial access. If this is _really_ > > > > needed then it deserves a comment explaining why. > > > > -- > > > > Michal Hocko > > > > SUSE Labs > > > > > > Sometimes, I/O to a device mapper device is blocked until the userspace > > > daemon dmeventd does some action (for example, when dm-mirror leg fails, > > > dmeventd needs to mark the leg as failed in the lvm metadata and then > > > reload the device). > > > > > > The dmeventd daemon mlocks itself in memory so that it doesn't generate > > > any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that > > > the ioctls issued by dmeventd have higher chance of succeeding if some I/O > > > is blocked, waiting for dmeventd action. It reduces the possibility of > > > low-memory-deadlock, though it doesn't eliminate it entirely. > > > > So what happens if the memory reserves are depleted. Do we deadlock? > > Yes, it will deadlock. That would be more than unfortunate and begs for a different solution. The thing is that __GFP_HIGH is not propagated to all allocations in the vmalloc proper. E.g. page table allocations are hardcoded GFP_KERNEL. > > Why is OOM killer insufficient to allow the further progress? > > I don't know if the OOM killer will or won't be triggered in this > situation, it depends on the people who wrote the OOM killer. I am not sure I understand. OOM killer is invoked for _all_ allocations <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the OOM killer is not disabled (oom_killer_disable) and that only happens from the PM suspend path which makes sure that no userspace is active at the time. AFAIU this is a userspace triggered path and so the later shouldn't apply to it and GFP_KERNEL should be therefore sufficient. Relying to a portion of memory reserves to prevent from deadlock seems fundamentaly broken to me. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-22 12:09 ` Michal Hocko @ 2017-05-22 14:52 ` Mikulas Patocka 2017-05-22 15:03 ` Michal Hocko 0 siblings, 1 reply; 16+ messages in thread From: Mikulas Patocka @ 2017-05-22 14:52 UTC (permalink / raw) To: Michal Hocko Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Mon, 22 May 2017, Michal Hocko wrote: > On Mon 22-05-17 08:00:11, Mikulas Patocka wrote: > > > > On Mon, 22 May 2017, Michal Hocko wrote: > > > > > > Sometimes, I/O to a device mapper device is blocked until the userspace > > > > daemon dmeventd does some action (for example, when dm-mirror leg fails, > > > > dmeventd needs to mark the leg as failed in the lvm metadata and then > > > > reload the device). > > > > > > > > The dmeventd daemon mlocks itself in memory so that it doesn't generate > > > > any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that > > > > the ioctls issued by dmeventd have higher chance of succeeding if some I/O > > > > is blocked, waiting for dmeventd action. It reduces the possibility of > > > > low-memory-deadlock, though it doesn't eliminate it entirely. > > > > > > So what happens if the memory reserves are depleted. Do we deadlock? > > > > Yes, it will deadlock. > > That would be more than unfortunate and begs for a different solution. > The thing is that __GFP_HIGH is not propagated to all allocations in the > vmalloc proper. E.g. page table allocations are hardcoded GFP_KERNEL. For a typical device mapper use, the ioctl area is smaller than 4k, so the vmalloc won't happen. > > > Why is OOM killer insufficient to allow the further progress? > > > > I don't know if the OOM killer will or won't be triggered in this > > situation, it depends on the people who wrote the OOM killer. > > I am not sure I understand. OOM killer is invoked for _all_ allocations > <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the > OOM killer is not disabled (oom_killer_disable) and that only happens > from the PM suspend path which makes sure that no userspace is active at > the time. AFAIU this is a userspace triggered path and so the later > shouldn't apply to it and GFP_KERNEL should be therefore sufficient. > Relying to a portion of memory reserves to prevent from deadlock seems > fundamentaly broken to me. > > -- > Michal Hocko > SUSE Labs The lvm2 was designed this way - it is broken, but there is not much that can be done about it - fixing this would mean major rewrite. The only thing we can do about it is to lower the deadlock probability with __GFP_HIGH (or PF_MEMALLOC that was used some times ago). Mikulas ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-22 14:52 ` Mikulas Patocka @ 2017-05-22 15:03 ` Michal Hocko 2017-05-22 18:04 ` Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Michal Hocko @ 2017-05-22 15:03 UTC (permalink / raw) To: Mikulas Patocka Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Mon 22-05-17 10:52:44, Mikulas Patocka wrote: > > > On Mon, 22 May 2017, Michal Hocko wrote: [...] > > I am not sure I understand. OOM killer is invoked for _all_ allocations > > <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the > > OOM killer is not disabled (oom_killer_disable) and that only happens > > from the PM suspend path which makes sure that no userspace is active at > > the time. AFAIU this is a userspace triggered path and so the later > > shouldn't apply to it and GFP_KERNEL should be therefore sufficient. > > Relying to a portion of memory reserves to prevent from deadlock seems > > fundamentaly broken to me. > > > > The lvm2 was designed this way - it is broken, but there is not much that > can be done about it - fixing this would mean major rewrite. The only > thing we can do about it is to lower the deadlock probability with > __GFP_HIGH (or PF_MEMALLOC that was used some times ago). But let me repeat. GFP_KERNEL allocation for order-0 page will not fail. If you need non-failing semantic then just make it clear by adding __GFP_NOFAIL rather than __GFP_HIGH. Memory reserves are a scarce resource and there are users which might really need it from atomic contexts. Anyway, this is not the code I am maintaining so I will not argue more and won't nack the patch. But is smells like a pure cargo cult, to be honest. If you really insist, though, I would just ask to have a more detailed explanation why it is _believed_ the flag is needed because the vague "Use __GFP_HIGH to avoid low memory issues when a device is suspended and the ioctl is needed to resume it." doesn't really clarify much to be honest. Thanks! -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-22 15:03 ` Michal Hocko @ 2017-05-22 18:04 ` Mike Snitzer 2017-05-22 20:35 ` David Rientjes 2017-05-23 6:49 ` Michal Hocko 0 siblings, 2 replies; 16+ messages in thread From: Mike Snitzer @ 2017-05-22 18:04 UTC (permalink / raw) To: Michal Hocko Cc: Mikulas Patocka, Junaid Shahid, David Rientjes, Alasdair Kergon, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Mon, May 22 2017 at 11:03am -0400, Michal Hocko <mhocko@kernel.org> wrote: > On Mon 22-05-17 10:52:44, Mikulas Patocka wrote: > > > > > > On Mon, 22 May 2017, Michal Hocko wrote: > [...] > > > I am not sure I understand. OOM killer is invoked for _all_ allocations > > > <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the > > > OOM killer is not disabled (oom_killer_disable) and that only happens > > > from the PM suspend path which makes sure that no userspace is active at > > > the time. AFAIU this is a userspace triggered path and so the later > > > shouldn't apply to it and GFP_KERNEL should be therefore sufficient. > > > Relying to a portion of memory reserves to prevent from deadlock seems > > > fundamentaly broken to me. > > > > > > > The lvm2 was designed this way - it is broken, but there is not much that > > can be done about it - fixing this would mean major rewrite. The only > > thing we can do about it is to lower the deadlock probability with > > __GFP_HIGH (or PF_MEMALLOC that was used some times ago). Yes, lvm2 was originally designed to to have access to memory reserves to ensure forward progress. But if the mm subsystem has improved to allow for the required progress without lvm2 trying to stake a claim on those reserves then we'll gladly avoid (ab)using them. > But let me repeat. GFP_KERNEL allocation for order-0 page will not fail. OK, but will it be serviced immediately? Not failing isn't useful if it never completes. > If you need non-failing semantic then just make it clear by adding > __GFP_NOFAIL rather than __GFP_HIGH. Memory reserves are a scarce > resource and there are users which might really need it from atomic > contexts. While adding the __GFP_NOFAIL flag would serve to document expectations I'm left unconvinced that the memory allocator will _not fail_ for an order-0 page -- as Mikulas said most ioctls don't need more than 4K. (Apologies if you've already covered _why_ we can have confidence in the mm subsystem's ability to ensure forward progress for these allocations). Mike ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-22 18:04 ` Mike Snitzer @ 2017-05-22 20:35 ` David Rientjes 2017-05-22 23:35 ` Mike Snitzer 2017-05-23 6:05 ` Michal Hocko 2017-05-23 6:49 ` Michal Hocko 1 sibling, 2 replies; 16+ messages in thread From: David Rientjes @ 2017-05-22 20:35 UTC (permalink / raw) To: Mike Snitzer Cc: Michal Hocko, Mikulas Patocka, Junaid Shahid, Alasdair Kergon, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Mon, 22 May 2017, Mike Snitzer wrote: > > > The lvm2 was designed this way - it is broken, but there is not much that > > > can be done about it - fixing this would mean major rewrite. The only > > > thing we can do about it is to lower the deadlock probability with > > > __GFP_HIGH (or PF_MEMALLOC that was used some times ago). > > Yes, lvm2 was originally designed to to have access to memory reserves > to ensure forward progress. But if the mm subsystem has improved to > allow for the required progress without lvm2 trying to stake a claim on > those reserves then we'll gladly avoid (ab)using them. > There is no such improvement to the page allocator when allocating at runtime. A persistent amount of memory in a mempool could be set aside as a preallocation and unavailable from the rest of the system forever as an alternative to dynamically allocating with memory reserves, but that has obvious downsides. This patch is the exact right thing to do. > > But let me repeat. GFP_KERNEL allocation for order-0 page will not fail. > > OK, but will it be serviced immediately? Not failing isn't useful if it > never completes. > No, and you can use __GFP_HIGH, which this patch does, to have a reasonable expectation of forward progress in the very near term. > While adding the __GFP_NOFAIL flag would serve to document expectations > I'm left unconvinced that the memory allocator will _not fail_ for an > order-0 page -- as Mikulas said most ioctls don't need more than 4K. __GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never fallback to vmalloc :) I'm hoping this can get merged during the 4.12 window to fix the broken commit d224e9381897. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-22 20:35 ` David Rientjes @ 2017-05-22 23:35 ` Mike Snitzer 2017-05-23 6:05 ` Michal Hocko 1 sibling, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2017-05-22 23:35 UTC (permalink / raw) To: David Rientjes Cc: Michal Hocko, Mikulas Patocka, Junaid Shahid, Alasdair Kergon, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Mon, May 22 2017 at 4:35pm -0400, David Rientjes <rientjes@google.com> wrote: > On Mon, 22 May 2017, Mike Snitzer wrote: > > > > > The lvm2 was designed this way - it is broken, but there is not much that > > > > can be done about it - fixing this would mean major rewrite. The only > > > > thing we can do about it is to lower the deadlock probability with > > > > __GFP_HIGH (or PF_MEMALLOC that was used some times ago). > > > > Yes, lvm2 was originally designed to to have access to memory reserves > > to ensure forward progress. But if the mm subsystem has improved to > > allow for the required progress without lvm2 trying to stake a claim on > > those reserves then we'll gladly avoid (ab)using them. > > > > There is no such improvement to the page allocator when allocating at > runtime. A persistent amount of memory in a mempool could be set aside as > a preallocation and unavailable from the rest of the system forever as an > alternative to dynamically allocating with memory reserves, but that has > obvious downsides. This patch is the exact right thing to do. > > > > But let me repeat. GFP_KERNEL allocation for order-0 page will not fail. > > > > OK, but will it be serviced immediately? Not failing isn't useful if it > > never completes. > > > > No, and you can use __GFP_HIGH, which this patch does, to have a > reasonable expectation of forward progress in the very near term. > > > While adding the __GFP_NOFAIL flag would serve to document expectations > > I'm left unconvinced that the memory allocator will _not fail_ for an > > order-0 page -- as Mikulas said most ioctls don't need more than 4K. > > __GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never > fallback to vmalloc :) > > I'm hoping this can get merged during the 4.12 window to fix the broken > commit d224e9381897. I've added your Acked-by and staged it for 4.12, please see: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.12/dm&id=8c1e2162f27b319da913683143c0c6c09b083ebb Not sure when I'll send it to Linus but certainly no later than for rc4 inclusion. Thanks, Mike ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-22 20:35 ` David Rientjes 2017-05-22 23:35 ` Mike Snitzer @ 2017-05-23 6:05 ` Michal Hocko 2017-05-23 16:44 ` Mikulas Patocka 1 sibling, 1 reply; 16+ messages in thread From: Michal Hocko @ 2017-05-23 6:05 UTC (permalink / raw) To: David Rientjes Cc: Mike Snitzer, Mikulas Patocka, Junaid Shahid, Alasdair Kergon, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Mon 22-05-17 13:35:41, David Rientjes wrote: > On Mon, 22 May 2017, Mike Snitzer wrote: [...] > > While adding the __GFP_NOFAIL flag would serve to document expectations > > I'm left unconvinced that the memory allocator will _not fail_ for an > > order-0 page -- as Mikulas said most ioctls don't need more than 4K. > > __GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never > fallback to vmalloc :) Sorry, I could have been more specific. You would have to opencode kvmalloc obviously. It is documented to not support this flag for the reasons you have mentioned above. > I'm hoping this can get merged during the 4.12 window to fix the broken > commit d224e9381897. I obviously disagree. Relying on memory reserves for _correctness_ is clearly broken by design, full stop. But it is dm code and you are going it is responsibility of the respective maintainers to support this code. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-23 6:05 ` Michal Hocko @ 2017-05-23 16:44 ` Mikulas Patocka 2017-05-25 8:58 ` Michal Hocko 0 siblings, 1 reply; 16+ messages in thread From: Mikulas Patocka @ 2017-05-23 16:44 UTC (permalink / raw) To: Michal Hocko Cc: David Rientjes, Mike Snitzer, Junaid Shahid, Alasdair Kergon, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Tue, 23 May 2017, Michal Hocko wrote: > On Mon 22-05-17 13:35:41, David Rientjes wrote: > > On Mon, 22 May 2017, Mike Snitzer wrote: > [...] > > > While adding the __GFP_NOFAIL flag would serve to document expectations > > > I'm left unconvinced that the memory allocator will _not fail_ for an > > > order-0 page -- as Mikulas said most ioctls don't need more than 4K. > > > > __GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never > > fallback to vmalloc :) > > Sorry, I could have been more specific. You would have to opencode > kvmalloc obviously. It is documented to not support this flag for the > reasons you have mentioned above. > > > I'm hoping this can get merged during the 4.12 window to fix the broken > > commit d224e9381897. > > I obviously disagree. Relying on memory reserves for _correctness_ is > clearly broken by design, full stop. But it is dm code and you are going > it is responsibility of the respective maintainers to support this code. Block loop device is broken in the same way - it converts block requests to filesystem reads and writes and those FS reads and writes allocate memory. Network block device needs an userspace daemon to perform I/O. iSCSI also needs to allocate memory to perform I/O. NFS and other networking filesystems are also broken in the same way (they need to receive a packet to acknowledge a write and packet reception needs to allocate memory). So - what should these *broken* drivers do to reduce the possibility of the deadlock? Mikulas > -- > Michal Hocko > SUSE Labs > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-23 16:44 ` Mikulas Patocka @ 2017-05-25 8:58 ` Michal Hocko 0 siblings, 0 replies; 16+ messages in thread From: Michal Hocko @ 2017-05-25 8:58 UTC (permalink / raw) To: Mikulas Patocka Cc: David Rientjes, Mike Snitzer, Junaid Shahid, Alasdair Kergon, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Tue 23-05-17 12:44:18, Mikulas Patocka wrote: > > > On Tue, 23 May 2017, Michal Hocko wrote: > > > On Mon 22-05-17 13:35:41, David Rientjes wrote: > > > On Mon, 22 May 2017, Mike Snitzer wrote: > > [...] > > > > While adding the __GFP_NOFAIL flag would serve to document expectations > > > > I'm left unconvinced that the memory allocator will _not fail_ for an > > > > order-0 page -- as Mikulas said most ioctls don't need more than 4K. > > > > > > __GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never > > > fallback to vmalloc :) > > > > Sorry, I could have been more specific. You would have to opencode > > kvmalloc obviously. It is documented to not support this flag for the > > reasons you have mentioned above. > > > > > I'm hoping this can get merged during the 4.12 window to fix the broken > > > commit d224e9381897. > > > > I obviously disagree. Relying on memory reserves for _correctness_ is > > clearly broken by design, full stop. But it is dm code and you are going > > it is responsibility of the respective maintainers to support this code. > > Block loop device is broken in the same way - it converts block requests > to filesystem reads and writes and those FS reads and writes allocate > memory. I do not see those would depend on the __GFP_HIGH. Also writes are throttled so the memory shouldn't get full of dirty pages. > Network block device needs an userspace daemon to perform I/O. which makes it pretty much not reliable for any forward progress. AFAIR swap over NBD access full memory reserves to overcome this. But that is merely an exception > iSCSI also needs to allocate memory to perform I/O. Shouldn't it use mempools? I am sorry but I am not familiar with this area at all. > NFS and other networking filesystems are also broken in the same way (they > need to receive a packet to acknowledge a write and packet reception needs > to allocate memory). > > So - what should these *broken* drivers do to reduce the possibility of > the deadlock? the IO path has traditionally used mempools to guarantee a forward progress. If this is not an option then the choice is not all that great. We are throttling memory writers (or drop packets when the memory is too low) and finally have the OOM killer to free up some memory. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm ioctl: Restore __GFP_HIGH in copy_params() 2017-05-22 18:04 ` Mike Snitzer 2017-05-22 20:35 ` David Rientjes @ 2017-05-23 6:49 ` Michal Hocko 1 sibling, 0 replies; 16+ messages in thread From: Michal Hocko @ 2017-05-23 6:49 UTC (permalink / raw) To: Mike Snitzer Cc: Mikulas Patocka, Junaid Shahid, David Rientjes, Alasdair Kergon, Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel On Mon 22-05-17 14:04:15, Mike Snitzer wrote: > On Mon, May 22 2017 at 11:03am -0400, > Michal Hocko <mhocko@kernel.org> wrote: > > > On Mon 22-05-17 10:52:44, Mikulas Patocka wrote: > > > > > > > > > On Mon, 22 May 2017, Michal Hocko wrote: > > [...] > > > > I am not sure I understand. OOM killer is invoked for _all_ allocations > > > > <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the > > > > OOM killer is not disabled (oom_killer_disable) and that only happens > > > > from the PM suspend path which makes sure that no userspace is active at > > > > the time. AFAIU this is a userspace triggered path and so the later > > > > shouldn't apply to it and GFP_KERNEL should be therefore sufficient. > > > > Relying to a portion of memory reserves to prevent from deadlock seems > > > > fundamentaly broken to me. > > > > > > > > > > The lvm2 was designed this way - it is broken, but there is not much that > > > can be done about it - fixing this would mean major rewrite. The only > > > thing we can do about it is to lower the deadlock probability with > > > __GFP_HIGH (or PF_MEMALLOC that was used some times ago). > > Yes, lvm2 was originally designed to to have access to memory reserves > to ensure forward progress. But if the mm subsystem has improved to > allow for the required progress without lvm2 trying to stake a claim on > those reserves then we'll gladly avoid (ab)using them. > > > But let me repeat. GFP_KERNEL allocation for order-0 page will not fail. > > OK, but will it be serviced immediately? Not failing isn't useful if it > never completes. Well, GFP_KERNEL will not guarantee an immediate success of course. There is nothing like that. Nor __GFP_HIGH will guarantee that, though, because reserves can get easily depleted by some workloads. You would have to use a dedicated memory pool to accomplish what you really need. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <alpine.LRH.2.02.1705191949340.17646@file01.intranet.prod.int.rdu2.redhat.com>]
[parent not found: <20170520083412.GD11925@dhcp22.suse.cz>]
* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() [not found] ` <20170520083412.GD11925@dhcp22.suse.cz> @ 2017-05-20 19:00 ` Mikulas Patocka 0 siblings, 0 replies; 16+ messages in thread From: Mikulas Patocka @ 2017-05-20 19:00 UTC (permalink / raw) To: Michal Hocko, Mike Snitzer Cc: Junaid Shahid, akpm, linux-kernel, andreslc, gthelen, rientjes, vbabka, dm-devel On Sat, 20 May 2017, Michal Hocko wrote: > On Fri 19-05-17 19:50:24, Mikulas Patocka wrote: > > > > > > On Thu, 18 May 2017, Junaid Shahid wrote: > > > > > d224e9381897 (drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded > > > variant) left out the __GFP_HIGH flag when converting from __vmalloc to > > > kvmalloc. This can cause the IOCTL to fail in some low memory situations > > > where it wouldn't have failed earlier. This patch adds it back to avoid > > > any potential regression. > > > > > > Signed-off-by: Junaid Shahid <junaids@google.com> > > > > Acked-by: Mikulas Patocka <mpatocka@redhat.com> > > Can we have a comment explaning why we need memory reserves in this case > please? Here I'm sending the patch with the comment: d224e9381897 (drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant) left out the __GFP_HIGH flag when converting from __vmalloc to kvmalloc. This can cause the IOCTL to fail in some low memory situations where it wouldn't have failed earlier. This patch adds it back to avoid any potential regression. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-ioctl.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) Index: linux-2.6/drivers/md/dm-ioctl.c =================================================================== --- linux-2.6.orig/drivers/md/dm-ioctl.c +++ linux-2.6/drivers/md/dm-ioctl.c @@ -1710,12 +1710,13 @@ static int copy_params(struct dm_ioctl _ } /* - * Try to avoid low memory issues when a device is suspended. + * Use __GFP_HIGH to avoid low memory issues when a device is suspended + * and the ioctl is needed to resume it. * Use kmalloc() rather than vmalloc() when we can. */ dmi = NULL; noio_flag = memalloc_noio_save(); - dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL); + dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH); memalloc_noio_restore(noio_flag); if (!dmi) { ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-05-25 8:58 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20170518185040.108293-1-junaids@google.com> [not found] ` <20170518190406.GB2330@dhcp22.suse.cz> [not found] ` <alpine.DEB.2.10.1705181338090.132717@chino.kir.corp.google.com> 2017-05-19 2:50 ` [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() Junaid Shahid 2017-05-19 7:46 ` Michal Hocko 2017-05-19 23:43 ` Mikulas Patocka 2017-05-22 9:37 ` Michal Hocko 2017-05-22 12:00 ` Mikulas Patocka 2017-05-22 12:09 ` Michal Hocko 2017-05-22 14:52 ` Mikulas Patocka 2017-05-22 15:03 ` Michal Hocko 2017-05-22 18:04 ` Mike Snitzer 2017-05-22 20:35 ` David Rientjes 2017-05-22 23:35 ` Mike Snitzer 2017-05-23 6:05 ` Michal Hocko 2017-05-23 16:44 ` Mikulas Patocka 2017-05-25 8:58 ` Michal Hocko 2017-05-23 6:49 ` Michal Hocko [not found] ` <alpine.LRH.2.02.1705191949340.17646@file01.intranet.prod.int.rdu2.redhat.com> [not found] ` <20170520083412.GD11925@dhcp22.suse.cz> 2017-05-20 19:00 ` [PATCH] " Mikulas Patocka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).