linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
       [not found] <20170518185040.108293-1-junaids@google.com>
@ 2017-05-18 19:00 ` Junaid Shahid
       [not found] ` <20170518190406.GB2330@dhcp22.suse.cz>
  1 sibling, 0 replies; 16+ messages in thread
From: Junaid Shahid @ 2017-05-18 19:00 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, andreslc, gthelen, mpatocka, rientjes, mhocko, vbabka

(Correcting linux-mm email addr)

d224e9381897 (drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded
variant) left out the __GFP_HIGH flag when converting from __vmalloc to
kvmalloc. This can cause the IOCTL to fail in some low memory situations
where it wouldn't have failed earlier. This patch adds it back to avoid
any potential regression.

Signed-off-by: Junaid Shahid <junaids@google.com>
---
 drivers/md/dm-ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 0555b4410e05..bacad7637a56 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1715,7 +1715,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
 	 */
 	dmi = NULL;
 	noio_flag = memalloc_noio_save();
-	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL);
+	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
 	memalloc_noio_restore(noio_flag);
 
 	if (!dmi) {
-- 
2.13.0.303.g4ebf302169-goog

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
       [not found]   ` <alpine.DEB.2.10.1705181338090.132717@chino.kir.corp.google.com>
@ 2017-05-19  2:50     ` Junaid Shahid
  2017-05-19  7:46       ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Junaid Shahid @ 2017-05-19  2:50 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Alasdair Kergon, Mike Snitzer, Andrew Morton,
	linux-mm, andreslc, gthelen, mpatocka, vbabka, linux-kernel

(Adding back the correct linux-mm email address and also adding linux-kernel.)

On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote:
> On Thu, 18 May 2017, Michal Hocko wrote:
> 
> > On Thu 18-05-17 11:50:40, Junaid Shahid wrote:
> > > d224e9381897 (drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded
> > > variant) left out the __GFP_HIGH flag when converting from __vmalloc to
> > > kvmalloc. This can cause the IOCTL to fail in some low memory situations
> > > where it wouldn't have failed earlier. This patch adds it back to avoid
> > > any potential regression.
> > 
> > The code previously used __GFP_HIGH only for the vmalloc fallback and
> > that doesn't make that much sense with the current implementation
> > because vmalloc does order-0 pages and those do not really fail and the
> > oom killer is invoked to free memory.
> > 
> 
> Order-0 pages certainly do fail, there is not an infinite amount of memory 
> nor is there a specific exemption to allow order-0 memory to be alloctable 
> below watermarks without this gfp flag.  OOM kill is the last thing we 
> want for these allocations since they are very temporary.
> 
> > There is no reason to access memory reserves from this context.
> > 
> 
> Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, 
> assuming there was a reason to do it in the first place in two different 
> ways.
> 
> This decision is up to the device mapper maintainers.
> 
> > > Signed-off-by: Junaid Shahid <junaids@google.com>
> > > ---
> > >  drivers/md/dm-ioctl.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> > > index 0555b4410e05..bacad7637a56 100644
> > > --- a/drivers/md/dm-ioctl.c
> > > +++ b/drivers/md/dm-ioctl.c
> > > @@ -1715,7 +1715,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
> > >  	 */
> > >  	dmi = NULL;
> > >  	noio_flag = memalloc_noio_save();
> > > -	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL);
> > > +	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
> > >  	memalloc_noio_restore(noio_flag);
> > >  
> > >  	if (!dmi) {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-19  2:50     ` Junaid Shahid
@ 2017-05-19  7:46       ` Michal Hocko
  2017-05-19 23:43         ` Mikulas Patocka
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-05-19  7:46 UTC (permalink / raw)
  To: Junaid Shahid
  Cc: David Rientjes, Alasdair Kergon, Mike Snitzer, Andrew Morton,
	linux-mm, andreslc, gthelen, mpatocka, vbabka, linux-kernel

On Thu 18-05-17 19:50:46, Junaid Shahid wrote:
> (Adding back the correct linux-mm email address and also adding linux-kernel.)
> 
> On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote:
[...]
> > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, 
> > assuming there was a reason to do it in the first place in two different 
> > ways.

Hmm, the old PF_MEMALLOC used to have the following comment
        /*
         * Trying to avoid low memory issues when a device is
         * suspended. 
         */

I am not really sure what that means but __GFP_HIGH certainly have a
different semantic than PF_MEMALLOC. The later grants the full access to
the memory reserves while the prior on partial access. If this is _really_
needed then it deserves a comment explaining why.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-19  7:46       ` Michal Hocko
@ 2017-05-19 23:43         ` Mikulas Patocka
  2017-05-22  9:37           ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Mikulas Patocka @ 2017-05-19 23:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel



On Fri, 19 May 2017, Michal Hocko wrote:

> On Thu 18-05-17 19:50:46, Junaid Shahid wrote:
> > (Adding back the correct linux-mm email address and also adding linux-kernel.)
> > 
> > On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote:
> [...]
> > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, 
> > > assuming there was a reason to do it in the first place in two different 
> > > ways.
> 
> Hmm, the old PF_MEMALLOC used to have the following comment
>         /*
>          * Trying to avoid low memory issues when a device is
>          * suspended. 
>          */
> 
> I am not really sure what that means but __GFP_HIGH certainly have a
> different semantic than PF_MEMALLOC. The later grants the full access to
> the memory reserves while the prior on partial access. If this is _really_
> needed then it deserves a comment explaining why.
> -- 
> Michal Hocko
> SUSE Labs

Sometimes, I/O to a device mapper device is blocked until the userspace 
daemon dmeventd does some action (for example, when dm-mirror leg fails, 
dmeventd needs to mark the leg as failed in the lvm metadata and then 
reload the device).

The dmeventd daemon mlocks itself in memory so that it doesn't generate 
any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that 
the ioctls issued by dmeventd have higher chance of succeeding if some I/O 
is blocked, waiting for dmeventd action. It reduces the possibility of 
low-memory-deadlock, though it doesn't eliminate it entirely.

Mikulas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-19 23:43         ` Mikulas Patocka
@ 2017-05-22  9:37           ` Michal Hocko
  2017-05-22 12:00             ` Mikulas Patocka
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-05-22  9:37 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel

On Fri 19-05-17 19:43:23, Mikulas Patocka wrote:
> 
> 
> On Fri, 19 May 2017, Michal Hocko wrote:
> 
> > On Thu 18-05-17 19:50:46, Junaid Shahid wrote:
> > > (Adding back the correct linux-mm email address and also adding linux-kernel.)
> > > 
> > > On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote:
> > [...]
> > > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, 
> > > > assuming there was a reason to do it in the first place in two different 
> > > > ways.
> > 
> > Hmm, the old PF_MEMALLOC used to have the following comment
> >         /*
> >          * Trying to avoid low memory issues when a device is
> >          * suspended. 
> >          */
> > 
> > I am not really sure what that means but __GFP_HIGH certainly have a
> > different semantic than PF_MEMALLOC. The later grants the full access to
> > the memory reserves while the prior on partial access. If this is _really_
> > needed then it deserves a comment explaining why.
> > -- 
> > Michal Hocko
> > SUSE Labs
> 
> Sometimes, I/O to a device mapper device is blocked until the userspace 
> daemon dmeventd does some action (for example, when dm-mirror leg fails, 
> dmeventd needs to mark the leg as failed in the lvm metadata and then 
> reload the device).
> 
> The dmeventd daemon mlocks itself in memory so that it doesn't generate 
> any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that 
> the ioctls issued by dmeventd have higher chance of succeeding if some I/O 
> is blocked, waiting for dmeventd action. It reduces the possibility of 
> low-memory-deadlock, though it doesn't eliminate it entirely.

So what happens if the memory reserves are depleted. Do we deadlock? Why
is OOM killer insufficient to allow the further progress?
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-22  9:37           ` Michal Hocko
@ 2017-05-22 12:00             ` Mikulas Patocka
  2017-05-22 12:09               ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Mikulas Patocka @ 2017-05-22 12:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel



On Mon, 22 May 2017, Michal Hocko wrote:

> On Fri 19-05-17 19:43:23, Mikulas Patocka wrote:
> > 
> > 
> > On Fri, 19 May 2017, Michal Hocko wrote:
> > 
> > > On Thu 18-05-17 19:50:46, Junaid Shahid wrote:
> > > > (Adding back the correct linux-mm email address and also adding linux-kernel.)
> > > > 
> > > > On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote:
> > > [...]
> > > > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, 
> > > > > assuming there was a reason to do it in the first place in two different 
> > > > > ways.
> > > 
> > > Hmm, the old PF_MEMALLOC used to have the following comment
> > >         /*
> > >          * Trying to avoid low memory issues when a device is
> > >          * suspended. 
> > >          */
> > > 
> > > I am not really sure what that means but __GFP_HIGH certainly have a
> > > different semantic than PF_MEMALLOC. The later grants the full access to
> > > the memory reserves while the prior on partial access. If this is _really_
> > > needed then it deserves a comment explaining why.
> > > -- 
> > > Michal Hocko
> > > SUSE Labs
> > 
> > Sometimes, I/O to a device mapper device is blocked until the userspace 
> > daemon dmeventd does some action (for example, when dm-mirror leg fails, 
> > dmeventd needs to mark the leg as failed in the lvm metadata and then 
> > reload the device).
> > 
> > The dmeventd daemon mlocks itself in memory so that it doesn't generate 
> > any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that 
> > the ioctls issued by dmeventd have higher chance of succeeding if some I/O 
> > is blocked, waiting for dmeventd action. It reduces the possibility of 
> > low-memory-deadlock, though it doesn't eliminate it entirely.
> 
> So what happens if the memory reserves are depleted. Do we deadlock?

Yes, it will deadlock.

> Why is OOM killer insufficient to allow the further progress?

I don't know if the OOM killer will or won't be triggered in this 
situation, it depends on the people who wrote the OOM killer.

> -- 
> Michal Hocko
> SUSE Labs

Mikulas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-22 12:00             ` Mikulas Patocka
@ 2017-05-22 12:09               ` Michal Hocko
  2017-05-22 14:52                 ` Mikulas Patocka
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-05-22 12:09 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel

On Mon 22-05-17 08:00:11, Mikulas Patocka wrote:
> 
> 
> On Mon, 22 May 2017, Michal Hocko wrote:
> 
> > On Fri 19-05-17 19:43:23, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Fri, 19 May 2017, Michal Hocko wrote:
> > > 
> > > > On Thu 18-05-17 19:50:46, Junaid Shahid wrote:
> > > > > (Adding back the correct linux-mm email address and also adding linux-kernel.)
> > > > > 
> > > > > On Thursday, May 18, 2017 01:41:33 PM David Rientjes wrote:
> > > > [...]
> > > > > > Let's ask Mikulas, who changed this from PF_MEMALLOC to __GFP_HIGH, 
> > > > > > assuming there was a reason to do it in the first place in two different 
> > > > > > ways.
> > > > 
> > > > Hmm, the old PF_MEMALLOC used to have the following comment
> > > >         /*
> > > >          * Trying to avoid low memory issues when a device is
> > > >          * suspended. 
> > > >          */
> > > > 
> > > > I am not really sure what that means but __GFP_HIGH certainly have a
> > > > different semantic than PF_MEMALLOC. The later grants the full access to
> > > > the memory reserves while the prior on partial access. If this is _really_
> > > > needed then it deserves a comment explaining why.
> > > > -- 
> > > > Michal Hocko
> > > > SUSE Labs
> > > 
> > > Sometimes, I/O to a device mapper device is blocked until the userspace 
> > > daemon dmeventd does some action (for example, when dm-mirror leg fails, 
> > > dmeventd needs to mark the leg as failed in the lvm metadata and then 
> > > reload the device).
> > > 
> > > The dmeventd daemon mlocks itself in memory so that it doesn't generate 
> > > any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that 
> > > the ioctls issued by dmeventd have higher chance of succeeding if some I/O 
> > > is blocked, waiting for dmeventd action. It reduces the possibility of 
> > > low-memory-deadlock, though it doesn't eliminate it entirely.
> > 
> > So what happens if the memory reserves are depleted. Do we deadlock?
> 
> Yes, it will deadlock.

That would be more than unfortunate and begs for a different solution.
The thing is that __GFP_HIGH is not propagated to all allocations in the
vmalloc proper. E.g. page table allocations are hardcoded GFP_KERNEL.

> > Why is OOM killer insufficient to allow the further progress?
> 
> I don't know if the OOM killer will or won't be triggered in this 
> situation, it depends on the people who wrote the OOM killer.

I am not sure I understand. OOM killer is invoked for _all_ allocations
<= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the
OOM killer is not disabled (oom_killer_disable) and that only happens
from the PM suspend path which makes sure that no userspace is active at
the time. AFAIU this is a userspace triggered path and so the later
shouldn't apply to it and GFP_KERNEL should be therefore sufficient.
Relying to a portion of memory reserves to prevent from deadlock seems
fundamentaly broken  to me.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-22 12:09               ` Michal Hocko
@ 2017-05-22 14:52                 ` Mikulas Patocka
  2017-05-22 15:03                   ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Mikulas Patocka @ 2017-05-22 14:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel



On Mon, 22 May 2017, Michal Hocko wrote:

> On Mon 22-05-17 08:00:11, Mikulas Patocka wrote:
> > 
> > On Mon, 22 May 2017, Michal Hocko wrote:
> > 
> > > > Sometimes, I/O to a device mapper device is blocked until the userspace 
> > > > daemon dmeventd does some action (for example, when dm-mirror leg fails, 
> > > > dmeventd needs to mark the leg as failed in the lvm metadata and then 
> > > > reload the device).
> > > > 
> > > > The dmeventd daemon mlocks itself in memory so that it doesn't generate 
> > > > any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that 
> > > > the ioctls issued by dmeventd have higher chance of succeeding if some I/O 
> > > > is blocked, waiting for dmeventd action. It reduces the possibility of 
> > > > low-memory-deadlock, though it doesn't eliminate it entirely.
> > > 
> > > So what happens if the memory reserves are depleted. Do we deadlock?
> > 
> > Yes, it will deadlock.
> 
> That would be more than unfortunate and begs for a different solution.
> The thing is that __GFP_HIGH is not propagated to all allocations in the
> vmalloc proper. E.g. page table allocations are hardcoded GFP_KERNEL.

For a typical device mapper use, the ioctl area is smaller than 4k, so the 
vmalloc won't happen.

> > > Why is OOM killer insufficient to allow the further progress?
> > 
> > I don't know if the OOM killer will or won't be triggered in this 
> > situation, it depends on the people who wrote the OOM killer.
> 
> I am not sure I understand. OOM killer is invoked for _all_ allocations
> <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the
> OOM killer is not disabled (oom_killer_disable) and that only happens
> from the PM suspend path which makes sure that no userspace is active at
> the time. AFAIU this is a userspace triggered path and so the later
> shouldn't apply to it and GFP_KERNEL should be therefore sufficient.
> Relying to a portion of memory reserves to prevent from deadlock seems
> fundamentaly broken  to me.
> 
> -- 
> Michal Hocko
> SUSE Labs

The lvm2 was designed this way - it is broken, but there is not much that 
can be done about it - fixing this would mean major rewrite. The only 
thing we can do about it is to lower the deadlock probability with 
__GFP_HIGH (or PF_MEMALLOC that was used some times ago).

Mikulas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-22 14:52                 ` Mikulas Patocka
@ 2017-05-22 15:03                   ` Michal Hocko
  2017-05-22 18:04                     ` Mike Snitzer
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-05-22 15:03 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Junaid Shahid, David Rientjes, Alasdair Kergon, Mike Snitzer,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel

On Mon 22-05-17 10:52:44, Mikulas Patocka wrote:
> 
> 
> On Mon, 22 May 2017, Michal Hocko wrote:
[...] 
> > I am not sure I understand. OOM killer is invoked for _all_ allocations
> > <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the
> > OOM killer is not disabled (oom_killer_disable) and that only happens
> > from the PM suspend path which makes sure that no userspace is active at
> > the time. AFAIU this is a userspace triggered path and so the later
> > shouldn't apply to it and GFP_KERNEL should be therefore sufficient.
> > Relying to a portion of memory reserves to prevent from deadlock seems
> > fundamentaly broken  to me.
> > 
> 
> The lvm2 was designed this way - it is broken, but there is not much that 
> can be done about it - fixing this would mean major rewrite. The only 
> thing we can do about it is to lower the deadlock probability with 
> __GFP_HIGH (or PF_MEMALLOC that was used some times ago).

But let me repeat. GFP_KERNEL allocation for order-0 page will not fail.
If you need non-failing semantic then just make it clear by adding
__GFP_NOFAIL rather than __GFP_HIGH. Memory reserves are a scarce
resource and there are users which might really need it from atomic
contexts.

Anyway, this is not the code I am maintaining so I will not argue more
and won't nack the patch. But is smells like a pure cargo cult, to be
honest.

If you really insist, though, I would just ask to have a more detailed
explanation why it is _believed_ the flag is needed because the vague
"Use __GFP_HIGH to avoid low memory issues when a device is suspended
and the ioctl is needed to resume it." doesn't really clarify much to be
honest.

Thanks!
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-22 15:03                   ` Michal Hocko
@ 2017-05-22 18:04                     ` Mike Snitzer
  2017-05-22 20:35                       ` David Rientjes
  2017-05-23  6:49                       ` Michal Hocko
  0 siblings, 2 replies; 16+ messages in thread
From: Mike Snitzer @ 2017-05-22 18:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mikulas Patocka, Junaid Shahid, David Rientjes, Alasdair Kergon,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel

On Mon, May 22 2017 at 11:03am -0400,
Michal Hocko <mhocko@kernel.org> wrote:

> On Mon 22-05-17 10:52:44, Mikulas Patocka wrote:
> > 
> > 
> > On Mon, 22 May 2017, Michal Hocko wrote:
> [...] 
> > > I am not sure I understand. OOM killer is invoked for _all_ allocations
> > > <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the
> > > OOM killer is not disabled (oom_killer_disable) and that only happens
> > > from the PM suspend path which makes sure that no userspace is active at
> > > the time. AFAIU this is a userspace triggered path and so the later
> > > shouldn't apply to it and GFP_KERNEL should be therefore sufficient.
> > > Relying to a portion of memory reserves to prevent from deadlock seems
> > > fundamentaly broken  to me.
> > > 
> > 
> > The lvm2 was designed this way - it is broken, but there is not much that 
> > can be done about it - fixing this would mean major rewrite. The only 
> > thing we can do about it is to lower the deadlock probability with 
> > __GFP_HIGH (or PF_MEMALLOC that was used some times ago).

Yes, lvm2 was originally designed to to have access to memory reserves
to ensure forward progress.  But if the mm subsystem has improved to
allow for the required progress without lvm2 trying to stake a claim on
those reserves then we'll gladly avoid (ab)using them.

> But let me repeat. GFP_KERNEL allocation for order-0 page will not fail.

OK, but will it be serviced immediately?  Not failing isn't useful if it
never completes.

> If you need non-failing semantic then just make it clear by adding
> __GFP_NOFAIL rather than __GFP_HIGH. Memory reserves are a scarce
> resource and there are users which might really need it from atomic
> contexts.

While adding the __GFP_NOFAIL flag would serve to document expectations
I'm left unconvinced that the memory allocator will _not fail_ for an
order-0 page -- as Mikulas said most ioctls don't need more than 4K.
(Apologies if you've already covered _why_ we can have confidence in the
mm subsystem's ability to ensure forward progress for these allocations).

Mike

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-22 18:04                     ` Mike Snitzer
@ 2017-05-22 20:35                       ` David Rientjes
  2017-05-22 23:35                         ` Mike Snitzer
  2017-05-23  6:05                         ` Michal Hocko
  2017-05-23  6:49                       ` Michal Hocko
  1 sibling, 2 replies; 16+ messages in thread
From: David Rientjes @ 2017-05-22 20:35 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Michal Hocko, Mikulas Patocka, Junaid Shahid, Alasdair Kergon,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel

On Mon, 22 May 2017, Mike Snitzer wrote:

> > > The lvm2 was designed this way - it is broken, but there is not much that 
> > > can be done about it - fixing this would mean major rewrite. The only 
> > > thing we can do about it is to lower the deadlock probability with 
> > > __GFP_HIGH (or PF_MEMALLOC that was used some times ago).
> 
> Yes, lvm2 was originally designed to to have access to memory reserves
> to ensure forward progress.  But if the mm subsystem has improved to
> allow for the required progress without lvm2 trying to stake a claim on
> those reserves then we'll gladly avoid (ab)using them.
> 

There is no such improvement to the page allocator when allocating at 
runtime.  A persistent amount of memory in a mempool could be set aside as 
a preallocation and unavailable from the rest of the system forever as an 
alternative to dynamically allocating with memory reserves, but that has 
obvious downsides.  This patch is the exact right thing to do.

> > But let me repeat. GFP_KERNEL allocation for order-0 page will not fail.
> 
> OK, but will it be serviced immediately?  Not failing isn't useful if it
> never completes.
> 

No, and you can use __GFP_HIGH, which this patch does, to have a 
reasonable expectation of forward progress in the very near term.

> While adding the __GFP_NOFAIL flag would serve to document expectations
> I'm left unconvinced that the memory allocator will _not fail_ for an
> order-0 page -- as Mikulas said most ioctls don't need more than 4K.

__GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never 
fallback to vmalloc :)

I'm hoping this can get merged during the 4.12 window to fix the broken 
commit d224e9381897.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-22 20:35                       ` David Rientjes
@ 2017-05-22 23:35                         ` Mike Snitzer
  2017-05-23  6:05                         ` Michal Hocko
  1 sibling, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2017-05-22 23:35 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Mikulas Patocka, Junaid Shahid, Alasdair Kergon,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel

On Mon, May 22 2017 at  4:35pm -0400,
David Rientjes <rientjes@google.com> wrote:

> On Mon, 22 May 2017, Mike Snitzer wrote:
> 
> > > > The lvm2 was designed this way - it is broken, but there is not much that 
> > > > can be done about it - fixing this would mean major rewrite. The only 
> > > > thing we can do about it is to lower the deadlock probability with 
> > > > __GFP_HIGH (or PF_MEMALLOC that was used some times ago).
> > 
> > Yes, lvm2 was originally designed to to have access to memory reserves
> > to ensure forward progress.  But if the mm subsystem has improved to
> > allow for the required progress without lvm2 trying to stake a claim on
> > those reserves then we'll gladly avoid (ab)using them.
> > 
> 
> There is no such improvement to the page allocator when allocating at 
> runtime.  A persistent amount of memory in a mempool could be set aside as 
> a preallocation and unavailable from the rest of the system forever as an 
> alternative to dynamically allocating with memory reserves, but that has 
> obvious downsides.  This patch is the exact right thing to do.
> 
> > > But let me repeat. GFP_KERNEL allocation for order-0 page will not fail.
> > 
> > OK, but will it be serviced immediately?  Not failing isn't useful if it
> > never completes.
> > 
> 
> No, and you can use __GFP_HIGH, which this patch does, to have a 
> reasonable expectation of forward progress in the very near term.
> 
> > While adding the __GFP_NOFAIL flag would serve to document expectations
> > I'm left unconvinced that the memory allocator will _not fail_ for an
> > order-0 page -- as Mikulas said most ioctls don't need more than 4K.
> 
> __GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never 
> fallback to vmalloc :)
> 
> I'm hoping this can get merged during the 4.12 window to fix the broken 
> commit d224e9381897.

I've added your Acked-by and staged it for 4.12, please see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.12/dm&id=8c1e2162f27b319da913683143c0c6c09b083ebb

Not sure when I'll send it to Linus but certainly no later than for rc4
inclusion.

Thanks,
Mike

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-22 20:35                       ` David Rientjes
  2017-05-22 23:35                         ` Mike Snitzer
@ 2017-05-23  6:05                         ` Michal Hocko
  2017-05-23 16:44                           ` Mikulas Patocka
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-05-23  6:05 UTC (permalink / raw)
  To: David Rientjes
  Cc: Mike Snitzer, Mikulas Patocka, Junaid Shahid, Alasdair Kergon,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel

On Mon 22-05-17 13:35:41, David Rientjes wrote:
> On Mon, 22 May 2017, Mike Snitzer wrote:
[...]
> > While adding the __GFP_NOFAIL flag would serve to document expectations
> > I'm left unconvinced that the memory allocator will _not fail_ for an
> > order-0 page -- as Mikulas said most ioctls don't need more than 4K.
> 
> __GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never 
> fallback to vmalloc :)

Sorry, I could have been more specific. You would have to opencode
kvmalloc obviously. It is documented to not support this flag for the
reasons you have mentioned above.

> I'm hoping this can get merged during the 4.12 window to fix the broken 
> commit d224e9381897.

I obviously disagree. Relying on memory reserves for _correctness_ is
clearly broken by design, full stop. But it is dm code and you are going
it is responsibility of the respective maintainers to support this code.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-22 18:04                     ` Mike Snitzer
  2017-05-22 20:35                       ` David Rientjes
@ 2017-05-23  6:49                       ` Michal Hocko
  1 sibling, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-05-23  6:49 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Mikulas Patocka, Junaid Shahid, David Rientjes, Alasdair Kergon,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel

On Mon 22-05-17 14:04:15, Mike Snitzer wrote:
> On Mon, May 22 2017 at 11:03am -0400,
> Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Mon 22-05-17 10:52:44, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Mon, 22 May 2017, Michal Hocko wrote:
> > [...] 
> > > > I am not sure I understand. OOM killer is invoked for _all_ allocations
> > > > <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the
> > > > OOM killer is not disabled (oom_killer_disable) and that only happens
> > > > from the PM suspend path which makes sure that no userspace is active at
> > > > the time. AFAIU this is a userspace triggered path and so the later
> > > > shouldn't apply to it and GFP_KERNEL should be therefore sufficient.
> > > > Relying to a portion of memory reserves to prevent from deadlock seems
> > > > fundamentaly broken  to me.
> > > > 
> > > 
> > > The lvm2 was designed this way - it is broken, but there is not much that 
> > > can be done about it - fixing this would mean major rewrite. The only 
> > > thing we can do about it is to lower the deadlock probability with 
> > > __GFP_HIGH (or PF_MEMALLOC that was used some times ago).
> 
> Yes, lvm2 was originally designed to to have access to memory reserves
> to ensure forward progress.  But if the mm subsystem has improved to
> allow for the required progress without lvm2 trying to stake a claim on
> those reserves then we'll gladly avoid (ab)using them.
> 
> > But let me repeat. GFP_KERNEL allocation for order-0 page will not fail.
> 
> OK, but will it be serviced immediately?  Not failing isn't useful if it
> never completes.

Well, GFP_KERNEL will not guarantee an immediate success of course.
There is nothing like that. Nor __GFP_HIGH will guarantee that, though,
because reserves can get easily depleted by some workloads. You would
have to use a dedicated memory pool to accomplish what you really need.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-23  6:05                         ` Michal Hocko
@ 2017-05-23 16:44                           ` Mikulas Patocka
  2017-05-25  8:58                             ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Mikulas Patocka @ 2017-05-23 16:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Rientjes, Mike Snitzer, Junaid Shahid, Alasdair Kergon,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel



On Tue, 23 May 2017, Michal Hocko wrote:

> On Mon 22-05-17 13:35:41, David Rientjes wrote:
> > On Mon, 22 May 2017, Mike Snitzer wrote:
> [...]
> > > While adding the __GFP_NOFAIL flag would serve to document expectations
> > > I'm left unconvinced that the memory allocator will _not fail_ for an
> > > order-0 page -- as Mikulas said most ioctls don't need more than 4K.
> > 
> > __GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never 
> > fallback to vmalloc :)
> 
> Sorry, I could have been more specific. You would have to opencode
> kvmalloc obviously. It is documented to not support this flag for the
> reasons you have mentioned above.
> 
> > I'm hoping this can get merged during the 4.12 window to fix the broken 
> > commit d224e9381897.
> 
> I obviously disagree. Relying on memory reserves for _correctness_ is
> clearly broken by design, full stop. But it is dm code and you are going
> it is responsibility of the respective maintainers to support this code.

Block loop device is broken in the same way - it converts block requests 
to filesystem reads and writes and those FS reads and writes allocate 
memory.

Network block device needs an userspace daemon to perform I/O.

iSCSI also needs to allocate memory to perform I/O.

NFS and other networking filesystems are also broken in the same way (they 
need to receive a packet to acknowledge a write and packet reception needs 
to allocate memory).

So - what should these *broken* drivers do to reduce the possibility of 
the deadlock?

Mikulas

> -- 
> Michal Hocko
> SUSE Labs
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: dm ioctl: Restore __GFP_HIGH in copy_params()
  2017-05-23 16:44                           ` Mikulas Patocka
@ 2017-05-25  8:58                             ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-05-25  8:58 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: David Rientjes, Mike Snitzer, Junaid Shahid, Alasdair Kergon,
	Andrew Morton, linux-mm, andreslc, gthelen, vbabka, linux-kernel

On Tue 23-05-17 12:44:18, Mikulas Patocka wrote:
> 
> 
> On Tue, 23 May 2017, Michal Hocko wrote:
> 
> > On Mon 22-05-17 13:35:41, David Rientjes wrote:
> > > On Mon, 22 May 2017, Mike Snitzer wrote:
> > [...]
> > > > While adding the __GFP_NOFAIL flag would serve to document expectations
> > > > I'm left unconvinced that the memory allocator will _not fail_ for an
> > > > order-0 page -- as Mikulas said most ioctls don't need more than 4K.
> > > 
> > > __GFP_NOFAIL would make no sense in kvmalloc() calls, ever, it would never 
> > > fallback to vmalloc :)
> > 
> > Sorry, I could have been more specific. You would have to opencode
> > kvmalloc obviously. It is documented to not support this flag for the
> > reasons you have mentioned above.
> > 
> > > I'm hoping this can get merged during the 4.12 window to fix the broken 
> > > commit d224e9381897.
> > 
> > I obviously disagree. Relying on memory reserves for _correctness_ is
> > clearly broken by design, full stop. But it is dm code and you are going
> > it is responsibility of the respective maintainers to support this code.
> 
> Block loop device is broken in the same way - it converts block requests 
> to filesystem reads and writes and those FS reads and writes allocate 
> memory.

I do not see those would depend on the __GFP_HIGH. Also writes are throttled
so the memory shouldn't get full of dirty pages.

> Network block device needs an userspace daemon to perform I/O.

which makes it pretty much not reliable for any forward progress. AFAIR
swap over NBD access full memory reserves to overcome this. But that is
merely an exception

> iSCSI also needs to allocate memory to perform I/O.

Shouldn't it use mempools? I am sorry but I am not familiar with this
area at all.
 
> NFS and other networking filesystems are also broken in the same way (they 
> need to receive a packet to acknowledge a write and packet reception needs 
> to allocate memory).
> 
> So - what should these *broken* drivers do to reduce the possibility of 
> the deadlock?

the IO path has traditionally used mempools to guarantee a forward
progress. If this is not an option then the choice is not all that
great. We are throttling memory writers (or drop packets when the memory
is too low) and finally have the OOM killer to free up some memory. 
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-05-25  8:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20170518185040.108293-1-junaids@google.com>
2017-05-18 19:00 ` [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params() Junaid Shahid
     [not found] ` <20170518190406.GB2330@dhcp22.suse.cz>
     [not found]   ` <alpine.DEB.2.10.1705181338090.132717@chino.kir.corp.google.com>
2017-05-19  2:50     ` Junaid Shahid
2017-05-19  7:46       ` Michal Hocko
2017-05-19 23:43         ` Mikulas Patocka
2017-05-22  9:37           ` Michal Hocko
2017-05-22 12:00             ` Mikulas Patocka
2017-05-22 12:09               ` Michal Hocko
2017-05-22 14:52                 ` Mikulas Patocka
2017-05-22 15:03                   ` Michal Hocko
2017-05-22 18:04                     ` Mike Snitzer
2017-05-22 20:35                       ` David Rientjes
2017-05-22 23:35                         ` Mike Snitzer
2017-05-23  6:05                         ` Michal Hocko
2017-05-23 16:44                           ` Mikulas Patocka
2017-05-25  8:58                             ` Michal Hocko
2017-05-23  6:49                       ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).