linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] Clarify and cleanup some __GFP_NOFAIL usage
@ 2015-03-02 13:54 Michal Hocko
  2015-03-02 13:54 ` [RFC 1/4] mm: Clarify __GFP_NOFAIL deprecation status Michal Hocko
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Michal Hocko @ 2015-03-02 13:54 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, David Rientjes, Dave Chinner,
	Theodore Ts'o, Mel Gorman, Tetsuo Handa, David S. Miller,
	sparclinux, Vipul Pandya, netdev, LKML

Hi,
from the last discussion about __GFP_NOFAIL it turned out that the flag
cannot be deprecated as easily as MM people hoped for.
The current flag description leads people to inventing their own loops
around GFP_{KERNEL|NOFS} allocations without any fallback failure
policy. This makes the situation much worse for the MM layer because it
is not aware of the hard no-fail requirements.

First patch in this series tries to rephrase the documentation to be
more clear about our intention. __GFP_NOFAIL is generally discouraged
but users shouldn't lie to the allocator if they really cannot have any
failure strategy.

Second patch removes such an open coded retry loop in the jbd2 code
which was introduced exactly because of the deprecation wording. I am
not sure the patch is still required because Ted has mentioned something
about changing this code. If he was faster then we can simply drop
this one. I was hoping for more such opencoded paths but wasn't very
successful. The next plan is to abuse coccinelle to find such patterns.

The last two patches are attempts to get rid of __GFP_NOFAIL when
it seemed they are not needed because there are error paths handled
properly. I am not familiar with that code so I might be clearly
wrong here and I would appreciate double checking from the respective
maintainer.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC 1/4] mm: Clarify __GFP_NOFAIL deprecation status
  2015-03-02 13:54 [RFC PATCH 0/4] Clarify and cleanup some __GFP_NOFAIL usage Michal Hocko
@ 2015-03-02 13:54 ` Michal Hocko
  2015-03-02 20:34   ` David Rientjes
  2015-03-02 13:54 ` [RFC 2/4] jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL Michal Hocko
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2015-03-02 13:54 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, David Rientjes, Dave Chinner,
	Theodore Ts'o, Mel Gorman, Tetsuo Handa, David S. Miller,
	sparclinux, Vipul Pandya, netdev, LKML

__GFP_NOFAIL is documented as a deprecated flag since 478352e789f5 (mm:
add comment about deprecation of __GFP_NOFAIL). This has discouraged
people from using it but in some cases an opencoded endless loop around
allocator has been used instead. So the allocator is not aware of the
de facto __GFP_NOFAIL allocation because this information was not
communicated properly.

Let's make clear that if the allocation context really cannot effort
failure because there is no good failure policy then using __GFP_NOFAIL
is preferable to opencoding the loop outside of the allocator.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 include/linux/gfp.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 51bd1e72a917..0cf9c2772988 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -57,8 +57,10 @@ struct vm_area_struct;
  * _might_ fail.  This depends upon the particular VM implementation.
  *
  * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
- * cannot handle allocation failures.  This modifier is deprecated and no new
- * users should be added.
+ * cannot handle allocation failures. New users should be evaluated carefuly
+ * (and the flag should be used only when there is no reasonable failure policy)
+ * but it is definitely preferable to use the flag rather than opencode endless
+ * loop around allocator.
  *
  * __GFP_NORETRY: The VM implementation must not retry indefinitely.
  *
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 2/4] jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL
  2015-03-02 13:54 [RFC PATCH 0/4] Clarify and cleanup some __GFP_NOFAIL usage Michal Hocko
  2015-03-02 13:54 ` [RFC 1/4] mm: Clarify __GFP_NOFAIL deprecation status Michal Hocko
@ 2015-03-02 13:54 ` Michal Hocko
  2015-03-02 20:33   ` David Rientjes
  2015-03-02 13:54 ` [RFC 3/4] sparc: remove __GFP_NOFAIL reuquirement Michal Hocko
  2015-03-02 13:54 ` [RFC 4/4] cxgb4: drop " Michal Hocko
  3 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2015-03-02 13:54 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, David Rientjes, Dave Chinner,
	Theodore Ts'o, Mel Gorman, Tetsuo Handa, David S. Miller,
	sparclinux, Vipul Pandya, netdev, LKML

This basically reverts 47def82672b3 (jbd2: Remove __GFP_NOFAIL from jbd2
layer). The deprecation of __GFP_NOFAIL was a bad choice because it led
to open coding the endless loop around the allocator rather than
removing the dependency on the non failing allocation. So the
deprecation was a clear failure and the reality tells us that
__GFP_NOFAIL is not even close to go away.

It is still true that __GFP_NOFAIL allocations are generally discouraged
and new uses should be evaluated and an alternative (pre-allocations or
reservations) should be considered but it doesn't make any sense to lie
the allocator about the requirements. Allocator can take steps to help
making a progress if it knows the requirements.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 fs/jbd2/journal.c     | 11 +----------
 fs/jbd2/transaction.c | 20 +++++++-------------
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b96bd8076b70..0bc333b4a594 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -371,16 +371,7 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
 	 */
 	J_ASSERT_BH(bh_in, buffer_jbddirty(bh_in));
 
-retry_alloc:
-	new_bh = alloc_buffer_head(GFP_NOFS);
-	if (!new_bh) {
-		/*
-		 * Failure is not an option, but __GFP_NOFAIL is going
-		 * away; so we retry ourselves here.
-		 */
-		congestion_wait(BLK_RW_ASYNC, HZ/50);
-		goto retry_alloc;
-	}
+	new_bh = alloc_buffer_head(GFP_NOFS|__GFP_NOFAIL);
 
 	/* keep subsequent assertions sane */
 	atomic_set(&new_bh->b_count, 1);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 5f09370c90a8..dac4523fa142 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -278,22 +278,16 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
 
 alloc_transaction:
 	if (!journal->j_running_transaction) {
+		/*
+		 * If __GFP_FS is not present, then we may be being called from
+		 * inside the fs writeback layer, so we MUST NOT fail.
+		 */
+		if ((gfp_mask & __GFP_FS) == 0)
+			gfp_mask |= __GFP_NOFAIL;
 		new_transaction = kmem_cache_zalloc(transaction_cache,
 						    gfp_mask);
-		if (!new_transaction) {
-			/*
-			 * If __GFP_FS is not present, then we may be
-			 * being called from inside the fs writeback
-			 * layer, so we MUST NOT fail.  Since
-			 * __GFP_NOFAIL is going away, we will arrange
-			 * to retry the allocation ourselves.
-			 */
-			if ((gfp_mask & __GFP_FS) == 0) {
-				congestion_wait(BLK_RW_ASYNC, HZ/50);
-				goto alloc_transaction;
-			}
+		if (!new_transaction)
 			return -ENOMEM;
-		}
 	}
 
 	jbd_debug(3, "New handle %p going live.\n", handle);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 3/4] sparc: remove __GFP_NOFAIL reuquirement
  2015-03-02 13:54 [RFC PATCH 0/4] Clarify and cleanup some __GFP_NOFAIL usage Michal Hocko
  2015-03-02 13:54 ` [RFC 1/4] mm: Clarify __GFP_NOFAIL deprecation status Michal Hocko
  2015-03-02 13:54 ` [RFC 2/4] jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL Michal Hocko
@ 2015-03-02 13:54 ` Michal Hocko
  2015-03-02 20:04   ` David Miller
  2015-03-02 13:54 ` [RFC 4/4] cxgb4: drop " Michal Hocko
  3 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2015-03-02 13:54 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, David Rientjes, Dave Chinner,
	Theodore Ts'o, Mel Gorman, Tetsuo Handa, David S. Miller,
	sparclinux, Vipul Pandya, netdev, LKML

mdesc_kmalloc is currently requiring __GFP_NOFAIL allocation although it
seems that the allocation failure is handled by all callers (via
mdesc_alloc). __GFP_NOFAIL is a strong liability for the memory
allocator and so the users are discouraged to use the flag unless the
allocation failure is really a nogo. Drop the flag here as this doesn't
seem to be the case.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 arch/sparc/kernel/mdesc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
index 99632a87e697..6801bd778af2 100644
--- a/arch/sparc/kernel/mdesc.c
+++ b/arch/sparc/kernel/mdesc.c
@@ -136,7 +136,7 @@ static struct mdesc_handle *mdesc_kmalloc(unsigned int mdesc_size)
 		       sizeof(struct mdesc_hdr) +
 		       mdesc_size);
 
-	base = kmalloc(handle_size + 15, GFP_KERNEL | __GFP_NOFAIL);
+	base = kmalloc(handle_size + 15, GFP_KERNEL);
 	if (base) {
 		struct mdesc_handle *hp;
 		unsigned long addr;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC 4/4] cxgb4: drop __GFP_NOFAIL allocation
  2015-03-02 13:54 [RFC PATCH 0/4] Clarify and cleanup some __GFP_NOFAIL usage Michal Hocko
                   ` (2 preceding siblings ...)
  2015-03-02 13:54 ` [RFC 3/4] sparc: remove __GFP_NOFAIL reuquirement Michal Hocko
@ 2015-03-02 13:54 ` Michal Hocko
  2015-03-03 12:22   ` Tetsuo Handa
  3 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2015-03-02 13:54 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Johannes Weiner, David Rientjes, Dave Chinner,
	Theodore Ts'o, Mel Gorman, Tetsuo Handa, David S. Miller,
	sparclinux, Vipul Pandya, netdev, LKML

set_filter_wr is requesting __GFP_NOFAIL allocation although it can
return ENOMEM without any problems obviously (t4_l2t_set_switching does
that already). So the non-failing requirement is too strong without any
obvious reason. Drop __GFP_NOFAIL and reorganize the code to have the
failure paths easier.

The same applies to _c4iw_write_mem_dma_aligned which uses __GFP_NOFAIL
and then checks the return value and returns -ENOMEM on failure. This
doesn't make any sense what so ever. Either the allocation cannot fail
or it can.

del_filter_wr seems to be safe as well because the filter entry is not
marked as pending and the return value is propagated up the stack up to
c4iw_destroy_listen.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 drivers/infiniband/hw/cxgb4/mem.c               |  2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index cb43c2299ac0..81b028eaf1f5 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -73,7 +73,7 @@ static int _c4iw_write_mem_dma_aligned(struct c4iw_rdev *rdev, u32 addr,
 		c4iw_init_wr_wait(&wr_wait);
 	wr_len = roundup(sizeof(*req) + sizeof(*sgl), 16);
 
-	skb = alloc_skb(wr_len, GFP_KERNEL | __GFP_NOFAIL);
+	skb = alloc_skb(wr_len, GFP_KERNEL);
 	if (!skb)
 		return -ENOMEM;
 	set_wr_txq(skb, CPL_PRIORITY_CONTROL, 0);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index ccf3436024bc..f351920fc293 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -1220,6 +1220,10 @@ static int set_filter_wr(struct adapter *adapter, int fidx)
 	struct fw_filter_wr *fwr;
 	unsigned int ftid;
 
+	skb = alloc_skb(sizeof(*fwr), GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
 	/* If the new filter requires loopback Destination MAC and/or VLAN
 	 * rewriting then we need to allocate a Layer 2 Table (L2T) entry for
 	 * the filter.
@@ -1227,19 +1231,21 @@ static int set_filter_wr(struct adapter *adapter, int fidx)
 	if (f->fs.newdmac || f->fs.newvlan) {
 		/* allocate L2T entry for new filter */
 		f->l2t = t4_l2t_alloc_switching(adapter->l2t);
-		if (f->l2t == NULL)
+		if (f->l2t == NULL) {
+			kfree(skb);
 			return -EAGAIN;
+		}
 		if (t4_l2t_set_switching(adapter, f->l2t, f->fs.vlan,
 					f->fs.eport, f->fs.dmac)) {
 			cxgb4_l2t_release(f->l2t);
 			f->l2t = NULL;
+			kfree(skb);
 			return -ENOMEM;
 		}
 	}
 
 	ftid = adapter->tids.ftid_base + fidx;
 
-	skb = alloc_skb(sizeof(*fwr), GFP_KERNEL | __GFP_NOFAIL);
 	fwr = (struct fw_filter_wr *)__skb_put(skb, sizeof(*fwr));
 	memset(fwr, 0, sizeof(*fwr));
 
@@ -1337,7 +1343,10 @@ static int del_filter_wr(struct adapter *adapter, int fidx)
 	len = sizeof(*fwr);
 	ftid = adapter->tids.ftid_base + fidx;
 
-	skb = alloc_skb(len, GFP_KERNEL | __GFP_NOFAIL);
+	skb = alloc_skb(len, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
 	fwr = (struct fw_filter_wr *)__skb_put(skb, len);
 	t4_mk_filtdelwr(ftid, fwr, adapter->sge.fw_evtq.abs_id);
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC 3/4] sparc: remove __GFP_NOFAIL reuquirement
  2015-03-02 13:54 ` [RFC 3/4] sparc: remove __GFP_NOFAIL reuquirement Michal Hocko
@ 2015-03-02 20:04   ` David Miller
  2015-03-02 20:33     ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2015-03-02 20:04 UTC (permalink / raw)
  To: mhocko
  Cc: linux-mm, akpm, hannes, rientjes, david, tytso, mgorman,
	penguin-kernel, sparclinux, vipul, netdev, linux-kernel

From: Michal Hocko <mhocko@suse.cz>
Date: Mon,  2 Mar 2015 14:54:42 +0100

> mdesc_kmalloc is currently requiring __GFP_NOFAIL allocation although it
> seems that the allocation failure is handled by all callers (via
> mdesc_alloc). __GFP_NOFAIL is a strong liability for the memory
> allocator and so the users are discouraged to use the flag unless the
> allocation failure is really a nogo. Drop the flag here as this doesn't
> seem to be the case.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

It is a serious failure.

If we miss an MDESC update due to this allocation failure, the update
is not an event which gets retransmitted so we will lose the updated
machine description forever.

We really need this allocation to succeed.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 3/4] sparc: remove __GFP_NOFAIL reuquirement
  2015-03-02 20:04   ` David Miller
@ 2015-03-02 20:33     ` Michal Hocko
  2015-03-02 20:44       ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2015-03-02 20:33 UTC (permalink / raw)
  To: David Miller
  Cc: linux-mm, akpm, hannes, rientjes, david, tytso, mgorman,
	penguin-kernel, sparclinux, vipul, netdev, linux-kernel

On Mon 02-03-15 15:04:05, David S. Miller wrote:
> From: Michal Hocko <mhocko@suse.cz>
> Date: Mon,  2 Mar 2015 14:54:42 +0100
> 
> > mdesc_kmalloc is currently requiring __GFP_NOFAIL allocation although it
> > seems that the allocation failure is handled by all callers (via
> > mdesc_alloc). __GFP_NOFAIL is a strong liability for the memory
> > allocator and so the users are discouraged to use the flag unless the
> > allocation failure is really a nogo. Drop the flag here as this doesn't
> > seem to be the case.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> It is a serious failure.
> 
> If we miss an MDESC update due to this allocation failure, the update
> is not an event which gets retransmitted so we will lose the updated
> machine description forever.
> 
> We really need this allocation to succeed.

OK, thanks for the clarification. This wasn't clear from the commit
which has introduced this code. I will drop this patch. Would you
accept something like the following instead?
---
diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
index 99632a87e697..26c80e18d7b1 100644
--- a/arch/sparc/kernel/mdesc.c
+++ b/arch/sparc/kernel/mdesc.c
@@ -130,26 +130,26 @@ static struct mdesc_mem_ops memblock_mdesc_ops = {
 static struct mdesc_handle *mdesc_kmalloc(unsigned int mdesc_size)
 {
 	unsigned int handle_size;
+	struct mdesc_handle *hp;
+	unsigned long addr;
 	void *base;
 
 	handle_size = (sizeof(struct mdesc_handle) -
 		       sizeof(struct mdesc_hdr) +
 		       mdesc_size);
 
+	/*
+	 * Allocation has to succeed because mdesc update would be missed
+	 * and such events are not retransmitted.
+	 */
 	base = kmalloc(handle_size + 15, GFP_KERNEL | __GFP_NOFAIL);
-	if (base) {
-		struct mdesc_handle *hp;
-		unsigned long addr;
-
-		addr = (unsigned long)base;
-		addr = (addr + 15UL) & ~15UL;
-		hp = (struct mdesc_handle *) addr;
+	addr = (unsigned long)base;
+	addr = (addr + 15UL) & ~15UL;
+	hp = (struct mdesc_handle *) addr;
 
-		mdesc_handle_init(hp, handle_size, base);
-		return hp;
-	}
+	mdesc_handle_init(hp, handle_size, base);
 
-	return NULL;
+	return hp;
 }
 
 static void mdesc_kfree(struct mdesc_handle *hp)

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC 2/4] jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL
  2015-03-02 13:54 ` [RFC 2/4] jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL Michal Hocko
@ 2015-03-02 20:33   ` David Rientjes
  2015-03-02 21:42     ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: David Rientjes @ 2015-03-02 20:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Dave Chinner,
	Theodore Ts'o, Mel Gorman, Tetsuo Handa, David S. Miller,
	sparclinux, Vipul Pandya, netdev, LKML

On Mon, 2 Mar 2015, Michal Hocko wrote:

> This basically reverts 47def82672b3 (jbd2: Remove __GFP_NOFAIL from jbd2
> layer). The deprecation of __GFP_NOFAIL was a bad choice because it led
> to open coding the endless loop around the allocator rather than
> removing the dependency on the non failing allocation. So the
> deprecation was a clear failure and the reality tells us that
> __GFP_NOFAIL is not even close to go away.
> 
> It is still true that __GFP_NOFAIL allocations are generally discouraged
> and new uses should be evaluated and an alternative (pre-allocations or
> reservations) should be considered but it doesn't make any sense to lie
> the allocator about the requirements. Allocator can take steps to help
> making a progress if it knows the requirements.
> 

The changelog should state that this only changes the source code, there 
is no functional change since alloc_buffer_head() and 
kmem_cache_zalloc(transaction_cache) are already implicitly nofail due to 
the allocation order.  The failure code added by the commit you cite are 
never executed.

I agree that if the implementation of the page allocator were to change 
with respect to PAGE_ALLOC_COSTLY_ORDER that we'd need __GFP_NOFAIL and 
that such an allocation is better handled in the page allocator.

> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Acked-by: David Rientjes <rientjes@google.com>

GFP_NOFS|__GFP_NOFAIL is scary.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 1/4] mm: Clarify __GFP_NOFAIL deprecation status
  2015-03-02 13:54 ` [RFC 1/4] mm: Clarify __GFP_NOFAIL deprecation status Michal Hocko
@ 2015-03-02 20:34   ` David Rientjes
  0 siblings, 0 replies; 15+ messages in thread
From: David Rientjes @ 2015-03-02 20:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Dave Chinner,
	Theodore Ts'o, Mel Gorman, Tetsuo Handa, David S. Miller,
	sparclinux, Vipul Pandya, netdev, LKML

On Mon, 2 Mar 2015, Michal Hocko wrote:

> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 51bd1e72a917..0cf9c2772988 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -57,8 +57,10 @@ struct vm_area_struct;
>   * _might_ fail.  This depends upon the particular VM implementation.
>   *
>   * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
> - * cannot handle allocation failures.  This modifier is deprecated and no new
> - * users should be added.
> + * cannot handle allocation failures. New users should be evaluated carefuly
> + * (and the flag should be used only when there is no reasonable failure policy)
> + * but it is definitely preferable to use the flag rather than opencode endless
> + * loop around allocator.
>   *
>   * __GFP_NORETRY: The VM implementation must not retry indefinitely.
>   *

s/carefuly/carefully/

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 3/4] sparc: remove __GFP_NOFAIL reuquirement
  2015-03-02 20:33     ` Michal Hocko
@ 2015-03-02 20:44       ` David Miller
  2015-03-02 21:36         ` [PATCH] sparc: clarify __GFP_NOFAIL allocation Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2015-03-02 20:44 UTC (permalink / raw)
  To: mhocko
  Cc: linux-mm, akpm, hannes, rientjes, david, tytso, mgorman,
	penguin-kernel, sparclinux, vipul, netdev, linux-kernel

From: Michal Hocko <mhocko@suse.cz>
Date: Mon, 2 Mar 2015 21:33:04 +0100

> On Mon 02-03-15 15:04:05, David S. Miller wrote:
>> From: Michal Hocko <mhocko@suse.cz>
>> Date: Mon,  2 Mar 2015 14:54:42 +0100
>> 
>> > mdesc_kmalloc is currently requiring __GFP_NOFAIL allocation although it
>> > seems that the allocation failure is handled by all callers (via
>> > mdesc_alloc). __GFP_NOFAIL is a strong liability for the memory
>> > allocator and so the users are discouraged to use the flag unless the
>> > allocation failure is really a nogo. Drop the flag here as this doesn't
>> > seem to be the case.
>> > 
>> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
>> 
>> It is a serious failure.
>> 
>> If we miss an MDESC update due to this allocation failure, the update
>> is not an event which gets retransmitted so we will lose the updated
>> machine description forever.
>> 
>> We really need this allocation to succeed.
> 
> OK, thanks for the clarification. This wasn't clear from the commit
> which has introduced this code. I will drop this patch. Would you
> accept something like the following instead?

Sure.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] sparc: clarify __GFP_NOFAIL allocation
  2015-03-02 20:44       ` David Miller
@ 2015-03-02 21:36         ` Michal Hocko
  2015-03-02 21:45           ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2015-03-02 21:36 UTC (permalink / raw)
  To: David Miller
  Cc: linux-mm, akpm, hannes, rientjes, david, tytso, mgorman,
	penguin-kernel, sparclinux, vipul, netdev, linux-kernel

On Mon 02-03-15 15:44:24, David S. Miller wrote:
[...]
> > OK, thanks for the clarification. This wasn't clear from the commit
> > which has introduced this code. I will drop this patch. Would you
> > accept something like the following instead?
> 
> Sure.

Thanks!

---
>From dac5829e3a1d44ba7759b4188de01f15ddb77b8b Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Mon, 2 Mar 2015 22:27:02 +0100
Subject: [PATCH] sparc: clarify __GFP_NOFAIL allocation

920c3ed74134 ([SPARC64]: Add basic infrastructure for MD add/remove
notification.) has added __GFP_NOFAIL for the allocation request but
it hasn't mentioned why is this strict requirement really needed.
The code was handling an allocation failure and propagated it properly
up the callchain so it is not clear why it is needed.

Dave has clarified the intention when I tried to remove the flag as not
being necessary:
"
It is a serious failure.

If we miss an MDESC update due to this allocation failure, the update
is not an event which gets retransmitted so we will lose the updated
machine description forever.

We really need this allocation to succeed.
"

So add a comment to clarify the nofail flag and get rid of the failure
check because __GFP_NOFAIL allocation doesn't fail.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 arch/sparc/kernel/mdesc.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
index 99632a87e697..26c80e18d7b1 100644
--- a/arch/sparc/kernel/mdesc.c
+++ b/arch/sparc/kernel/mdesc.c
@@ -130,26 +130,26 @@ static struct mdesc_mem_ops memblock_mdesc_ops = {
 static struct mdesc_handle *mdesc_kmalloc(unsigned int mdesc_size)
 {
 	unsigned int handle_size;
+	struct mdesc_handle *hp;
+	unsigned long addr;
 	void *base;
 
 	handle_size = (sizeof(struct mdesc_handle) -
 		       sizeof(struct mdesc_hdr) +
 		       mdesc_size);
 
+	/*
+	 * Allocation has to succeed because mdesc update would be missed
+	 * and such events are not retransmitted.
+	 */
 	base = kmalloc(handle_size + 15, GFP_KERNEL | __GFP_NOFAIL);
-	if (base) {
-		struct mdesc_handle *hp;
-		unsigned long addr;
-
-		addr = (unsigned long)base;
-		addr = (addr + 15UL) & ~15UL;
-		hp = (struct mdesc_handle *) addr;
+	addr = (unsigned long)base;
+	addr = (addr + 15UL) & ~15UL;
+	hp = (struct mdesc_handle *) addr;
 
-		mdesc_handle_init(hp, handle_size, base);
-		return hp;
-	}
+	mdesc_handle_init(hp, handle_size, base);
 
-	return NULL;
+	return hp;
 }
 
 static void mdesc_kfree(struct mdesc_handle *hp)
-- 
2.1.4


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC 2/4] jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL
  2015-03-02 20:33   ` David Rientjes
@ 2015-03-02 21:42     ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2015-03-02 21:42 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, Andrew Morton, Johannes Weiner, Dave Chinner,
	Theodore Ts'o, Mel Gorman, Tetsuo Handa, David S. Miller,
	sparclinux, Vipul Pandya, netdev, LKML

On Mon 02-03-15 12:33:21, David Rientjes wrote:
> On Mon, 2 Mar 2015, Michal Hocko wrote:
> 
> > This basically reverts 47def82672b3 (jbd2: Remove __GFP_NOFAIL from jbd2
> > layer). The deprecation of __GFP_NOFAIL was a bad choice because it led
> > to open coding the endless loop around the allocator rather than
> > removing the dependency on the non failing allocation. So the
> > deprecation was a clear failure and the reality tells us that
> > __GFP_NOFAIL is not even close to go away.
> > 
> > It is still true that __GFP_NOFAIL allocations are generally discouraged
> > and new uses should be evaluated and an alternative (pre-allocations or
> > reservations) should be considered but it doesn't make any sense to lie
> > the allocator about the requirements. Allocator can take steps to help
> > making a progress if it knows the requirements.
> > 
> 
> The changelog should state that this only changes the source code, there 
> is no functional change since alloc_buffer_head() and 
> kmem_cache_zalloc(transaction_cache) are already implicitly nofail due to 
> the allocation order.  The failure code added by the commit you cite are 
> never executed.

Well, even when those allocation would fail the resulting behavior is
basically the same (modulo congestion_wait which imho doesn't make much
difference). So I would prefer not getting that way and simply stay with
the external loop vs. looping within the allocator.

> I agree that if the implementation of the page allocator were to change 
> with respect to PAGE_ALLOC_COSTLY_ORDER that we'd need __GFP_NOFAIL and 
> that such an allocation is better handled in the page allocator.
> 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> Acked-by: David Rientjes <rientjes@google.com>

Thanks

> GFP_NOFS|__GFP_NOFAIL is scary.

Yes it is but as I've learned nothing unusual in the fs land and the
situation should be improved a lot if we go reservation way suggested by
David. Then __GFP_NOFAIL would consume the pre-reserved memory rather
than trigger OOM killer.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] sparc: clarify __GFP_NOFAIL allocation
  2015-03-02 21:36         ` [PATCH] sparc: clarify __GFP_NOFAIL allocation Michal Hocko
@ 2015-03-02 21:45           ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2015-03-02 21:45 UTC (permalink / raw)
  To: mhocko
  Cc: linux-mm, akpm, hannes, rientjes, david, tytso, mgorman,
	penguin-kernel, sparclinux, vipul, netdev, linux-kernel

From: Michal Hocko <mhocko@suse.cz>
Date: Mon, 2 Mar 2015 22:36:10 +0100

> 920c3ed74134 ([SPARC64]: Add basic infrastructure for MD add/remove
> notification.) has added __GFP_NOFAIL for the allocation request but
> it hasn't mentioned why is this strict requirement really needed.
> The code was handling an allocation failure and propagated it properly
> up the callchain so it is not clear why it is needed.
> 
> Dave has clarified the intention when I tried to remove the flag as not
> being necessary:
> "
> It is a serious failure.
> 
> If we miss an MDESC update due to this allocation failure, the update
> is not an event which gets retransmitted so we will lose the updated
> machine description forever.
> 
> We really need this allocation to succeed.
> "
> 
> So add a comment to clarify the nofail flag and get rid of the failure
> check because __GFP_NOFAIL allocation doesn't fail.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 4/4] cxgb4: drop __GFP_NOFAIL allocation
  2015-03-02 13:54 ` [RFC 4/4] cxgb4: drop " Michal Hocko
@ 2015-03-03 12:22   ` Tetsuo Handa
  2015-03-03 13:18     ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2015-03-03 12:22 UTC (permalink / raw)
  To: mhocko, davem
  Cc: linux-mm, akpm, hannes, rientjes, david, tytso, mgorman,
	sparclinux, vipul, netdev, linux-kernel

Michal Hocko wrote:
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index ccf3436024bc..f351920fc293 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -1220,6 +1220,10 @@ static int set_filter_wr(struct adapter *adapter, int fidx)
>  	struct fw_filter_wr *fwr;
>  	unsigned int ftid;
>  
> +	skb = alloc_skb(sizeof(*fwr), GFP_KERNEL);
> +	if (!skb)
> +		return -ENOMEM;
> +
>  	/* If the new filter requires loopback Destination MAC and/or VLAN
>  	 * rewriting then we need to allocate a Layer 2 Table (L2T) entry for
>  	 * the filter.
> @@ -1227,19 +1231,21 @@ static int set_filter_wr(struct adapter *adapter, int fidx)
>  	if (f->fs.newdmac || f->fs.newvlan) {
>  		/* allocate L2T entry for new filter */
>  		f->l2t = t4_l2t_alloc_switching(adapter->l2t);
> -		if (f->l2t == NULL)
> +		if (f->l2t == NULL) {
> +			kfree(skb);

I think we need to use kfree_skb() than kfree() for memory allocated by alloc_skb().

>  			return -EAGAIN;
> +		}
>  		if (t4_l2t_set_switching(adapter, f->l2t, f->fs.vlan,
>  					f->fs.eport, f->fs.dmac)) {
>  			cxgb4_l2t_release(f->l2t);
>  			f->l2t = NULL;
> +			kfree(skb);

Ditto.

>  			return -ENOMEM;
>  		}
>  	}
>  
>  	ftid = adapter->tids.ftid_base + fidx;
>  
> -	skb = alloc_skb(sizeof(*fwr), GFP_KERNEL | __GFP_NOFAIL);
>  	fwr = (struct fw_filter_wr *)__skb_put(skb, sizeof(*fwr));
>  	memset(fwr, 0, sizeof(*fwr));
>  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC 4/4] cxgb4: drop __GFP_NOFAIL allocation
  2015-03-03 12:22   ` Tetsuo Handa
@ 2015-03-03 13:18     ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2015-03-03 13:18 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: davem, linux-mm, akpm, hannes, rientjes, david, tytso, mgorman,
	sparclinux, vipul, netdev, linux-kernel

On Tue 03-03-15 21:22:22, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > index ccf3436024bc..f351920fc293 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > @@ -1220,6 +1220,10 @@ static int set_filter_wr(struct adapter *adapter, int fidx)
> >  	struct fw_filter_wr *fwr;
> >  	unsigned int ftid;
> >  
> > +	skb = alloc_skb(sizeof(*fwr), GFP_KERNEL);
> > +	if (!skb)
> > +		return -ENOMEM;
> > +
> >  	/* If the new filter requires loopback Destination MAC and/or VLAN
> >  	 * rewriting then we need to allocate a Layer 2 Table (L2T) entry for
> >  	 * the filter.
> > @@ -1227,19 +1231,21 @@ static int set_filter_wr(struct adapter *adapter, int fidx)
> >  	if (f->fs.newdmac || f->fs.newvlan) {
> >  		/* allocate L2T entry for new filter */
> >  		f->l2t = t4_l2t_alloc_switching(adapter->l2t);
> > -		if (f->l2t == NULL)
> > +		if (f->l2t == NULL) {
> > +			kfree(skb);
> 
> I think we need to use kfree_skb() than kfree() for memory allocated by alloc_skb().

Definitely! Good point, thanks!

Andrew, I've noticed you have picked up the patch. Should I resend or
the below incremental one is good enough?
---
>From eee05867dd55efa236ba065a3e7b7eb6e6ea511d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Tue, 3 Mar 2015 14:16:30 +0100
Subject: [PATCH] mmotm: cxgb4-drop-__gfp_nofail-allocation-fix

Use kfree_skb instead of kfree because the allocation is done by
alloc_skb.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index f351920fc293..e1e5028a714c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -1232,14 +1232,14 @@ static int set_filter_wr(struct adapter *adapter, int fidx)
 		/* allocate L2T entry for new filter */
 		f->l2t = t4_l2t_alloc_switching(adapter->l2t);
 		if (f->l2t == NULL) {
-			kfree(skb);
+			kfree_skb(skb);
 			return -EAGAIN;
 		}
 		if (t4_l2t_set_switching(adapter, f->l2t, f->fs.vlan,
 					f->fs.eport, f->fs.dmac)) {
 			cxgb4_l2t_release(f->l2t);
 			f->l2t = NULL;
-			kfree(skb);
+			kfree_skb(skb);
 			return -ENOMEM;
 		}
 	}
-- 
2.1.4


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-03-03 13:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-02 13:54 [RFC PATCH 0/4] Clarify and cleanup some __GFP_NOFAIL usage Michal Hocko
2015-03-02 13:54 ` [RFC 1/4] mm: Clarify __GFP_NOFAIL deprecation status Michal Hocko
2015-03-02 20:34   ` David Rientjes
2015-03-02 13:54 ` [RFC 2/4] jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL Michal Hocko
2015-03-02 20:33   ` David Rientjes
2015-03-02 21:42     ` Michal Hocko
2015-03-02 13:54 ` [RFC 3/4] sparc: remove __GFP_NOFAIL reuquirement Michal Hocko
2015-03-02 20:04   ` David Miller
2015-03-02 20:33     ` Michal Hocko
2015-03-02 20:44       ` David Miller
2015-03-02 21:36         ` [PATCH] sparc: clarify __GFP_NOFAIL allocation Michal Hocko
2015-03-02 21:45           ` David Miller
2015-03-02 13:54 ` [RFC 4/4] cxgb4: drop " Michal Hocko
2015-03-03 12:22   ` Tetsuo Handa
2015-03-03 13:18     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).