* [RFC 0/8] Allow GFP_NOFS allocation to fail
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

Hi,
small GFP_NOFS allocations, like GFP_KERNEL ones, have traditionally
not failed even though their reclaim capabilities are restricted
because the VM code cannot recurse into filesystems to clean dirty
pages. At the same time these allocation requests are not allowed to
trigger the OOM killer because that would lead to premature OOM killing
during heavy fs metadata workloads.

This leaves the VM code in an unfortunate situation where GFP_NOFS
requests loop inside the allocator, relying on somebody else to
make progress on their behalf. This is prone to deadlocks when the
request is holding resources which are necessary for other tasks to make
progress and release memory (e.g. the OOM victim is blocked on a lock
held by the NOFS request). Another drawback is that the caller of
the allocator cannot define any fallback strategy because the request
doesn't fail.
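
To illustrate the last point: once the request is allowed to fail, the
caller can decide what to do next. The following is a minimal user-space
sketch, not kernel code. struct pool, try_alloc(), struct txn and
txn_get_buffer() are hypothetical names invented for this example, and
the "budget" only models an allocator that gives up instead of looping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: `budget` stands in for memory the allocator can
 * still hand out without recursing into the filesystem. */
struct pool {
	size_t budget;
};

/* Models a NOFS-like request that fails instead of looping forever. */
static void *try_alloc(struct pool *p, size_t size)
{
	if (size > p->budget)
		return NULL;
	p->budget -= size;
	return malloc(size);
}

/* A caller-side context with a small buffer set aside in advance. */
struct txn {
	char emergency[64];
	int emergency_in_use;
};

/* Try the normal allocation first, then fall back to the preallocated
 * buffer, and only then report failure so the caller can back out. */
static void *txn_get_buffer(struct txn *t, struct pool *p, size_t size)
{
	void *buf = try_alloc(p, size);

	if (buf)
		return buf;
	if (size <= sizeof(t->emergency) && !t->emergency_in_use) {
		t->emergency_in_use = 1;
		return t->emergency;
	}
	return NULL;
}
```

None of this is possible while the allocator loops internally, because
the fallback branch in txn_get_buffer() would never be reached.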

As the VM cannot do much about these requests we should face reality
and allow those allocations to fail. Johannes has already posted a
patch which does that (http://marc.info/?l=linux-mm&m=142726428514236&w=2)
but the discussion died pretty quickly.

I was playing with this patch and xfs, ext[34] and btrfs for a while to
see what the effect is under heavy memory pressure. As expected this led
to some fallout.
My test consisted of a simple memory hog which allocates a lot of
anonymous memory and writes to a fs, mainly to trigger fs activity on
exit. In parallel there is a fs metadata load (multiple tasks
creating thousands of empty files and directories). Everything runs
in a VM with a small amount of memory to emulate an underprovisioned
system. The metadata load is sufficient to invoke direct reclaim even
without the memory hog. The memory hog forks several
tasks sharing the VM and the OOM killer manages to kill it without locking
up the system (this was based on the test case from Tetsuo Handa -
http://www.spinics.net/lists/linux-fsdevel/msg82958.html - I just didn't
want to kill my machine ;)).
With all the patches applied none of the 4 filesystems gets aborted
transactions or a RO remount (well, xfs didn't need any special
treatment). This is obviously not sufficient to claim that failing
GFP_NOFS is OK now but I think it is a good start for further
discussion. I would be grateful if FS people could have a look at those
patches. I have simply used __GFP_NOFAIL in the critical paths. This
might not be the best strategy but it sounds like a good first step.

The first patch in the series also allows __GFP_NOFAIL allocations to
access memory reserves when the system is OOM, which should help those
requests to make forward progress - especially in combination with
GFP_NOFS.

The second patch tries to address a potential premature OOM killer
invocation from the page fault path. I have posted it separately but it
didn't get much traction.

The third patch allows GFP_NOFS to fail and I believe it should see much
more testing coverage. It would be really great if it could sit in the
mmotm tree for a few release cycles so that we can catch more fallout.

The rest are FS specific patches to fortify allocation
requests which are really needed to finish transactions without RO
remounts. There might be more needed but my test case survives with
these in place.
They would obviously need some rewording if they are going to be applied
even without Patch3 and I will do that if the respective maintainers
take them. Ext3 and JBD are going away soon so they might be dropped but
they were in the tree while I was testing so I've kept them.

Thoughts? Opinions?


* [RFC 1/8] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

__GFP_NOFAIL is a big hammer used to ensure that the allocation
request can never fail. This is a strong requirement and as such
it also deserves special treatment when the system is OOM. The
primary problem here is that the allocation request might have
come with some locks held and the oom victim might be blocked
on the same locks. This is basically an OOM deadlock situation.

This patch tries to reduce the risk of such deadlocks by giving
__GFP_NOFAIL allocations special treatment and letting them dive into
memory reserves after oom killer invocation. This should help them
to make progress and release the resources they are holding. The OOM
victim should compensate for the reserves consumption.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1f9ffbb087cb..ee69c338ca2a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2732,8 +2732,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	}
 	/* Exhausted what can be done so it's blamo time */
 	if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false)
-			|| WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
+			|| WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL)) {
 		*did_some_progress = 1;
+
+		if (gfp_mask & __GFP_NOFAIL) {
+			page = get_page_from_freelist(gfp_mask, order,
+					ALLOC_NO_WATERMARKS|ALLOC_CPUSET, ac);
+			WARN_ONCE(!page, "Unable to fullfil gfp_nofail allocation."
+				    " Consider increasing min_free_kbytes.\n");
+		}
+	}
 out:
 	mutex_unlock(&oom_lock);
 	return page;
-- 
2.5.0
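
The effect of retrying with ALLOC_NO_WATERMARKS can be illustrated with a
small user-space model. This is only a sketch under simplifying
assumptions: struct zone_model, MODEL_ALLOC_NO_WATERMARKS, model_alloc()
and model_alloc_may_oom() are hypothetical names, and the real watermark
check in the page allocator also accounts for allocation order, lowmem
reserves and more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag for this model only; not the kernel's value. */
#define MODEL_ALLOC_NO_WATERMARKS 0x1u

struct zone_model {
	unsigned long free_pages;
	unsigned long min_watermark;	/* pages held back as the reserve */
};

/* Take `nr` pages unless that would drop the zone below its min
 * watermark. With MODEL_ALLOC_NO_WATERMARKS the floor is zero, so the
 * request may consume the reserve as well. */
static bool model_alloc(struct zone_model *z, unsigned long nr,
			unsigned int flags)
{
	unsigned long floor =
		(flags & MODEL_ALLOC_NO_WATERMARKS) ? 0 : z->min_watermark;

	if (z->free_pages < nr || z->free_pages - nr < floor)
		return false;
	z->free_pages -= nr;
	return true;
}

/* Mirrors the shape of the change: a normal attempt first, and only a
 * nofail request gets a second attempt that ignores the watermark
 * (in the kernel this happens after the OOM killer has been invoked). */
static bool model_alloc_may_oom(struct zone_model *z, unsigned long nr,
				bool nofail)
{
	if (model_alloc(z, nr, 0))
		return true;
	if (nofail)
		return model_alloc(z, nr, MODEL_ALLOC_NO_WATERMARKS);
	return false;
}
```

In this model a zone with 8 free pages and a min watermark of 8 refuses
an ordinary one-page request but satisfies a nofail one, leaving 7 free
pages. The reserve is finite, which is why the patch still warns when
even the watermark-less attempt fails.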



* [RFC 2/8] mm: Allow GFP_IOFS for page_cache_read page cache allocation
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

page_cache_read has historically been using page_cache_alloc_cold to
allocate a new page. This means that mapping_gfp_mask is used as the
base for the gfp_mask. Many filesystems set this mask to
GFP_NOFS to prevent fs recursion issues. page_cache_read is,
however, not called from the fs layer directly so it doesn't need this
protection normally.

ceph and ocfs2, which call filemap_fault from their fault handlers,
seem to be OK because they are not taking any fs lock before invoking
the generic implementation. xfs, which takes XFS_MMAPLOCK_SHARED, is safe
from the reclaim recursion POV because this lock serializes truncate
and punch hole with the page faults and it doesn't get involved in
reclaim.

The GFP_NOFS protection might even be harmful. There is a push to fail
GFP_NOFS allocations rather than loop within the allocator indefinitely
with a very limited reclaim ability. Once we start failing those requests
the OOM killer might be triggered prematurely because the page cache
allocation failure is propagated up the page fault path and ends up in
pagefault_out_of_memory.

We cannot play with mapping_gfp_mask directly because that would be racy
wrt. parallel page faults and it might interfere with other users who
really rely on NOFS semantics from the stored gfp_mask. The mask is also
inode proper so it would even be a layering violation. What we can do
instead is to push the gfp_mask into struct vm_fault and allow the fs
layer to overwrite it should the callback need to be called with a
different allocation context.

Initialize the default to (mapping_gfp_mask | GFP_IOFS) because this
should be safe from the page fault path normally. Why do we care
about mapping_gfp_mask at all then? Because it holds not only
reclaim protection flags but might also contain zone and movability
restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we have to respect
those.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/mm.h |  4 ++++
 mm/filemap.c       |  9 ++++-----
 mm/memory.c        | 17 +++++++++++++++++
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7f471789781a..962e37c7cd6a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -220,10 +220,14 @@ extern pgprot_t protection_map[16];
  * ->fault function. The vma's ->fault is responsible for returning a bitmask
  * of VM_FAULT_xxx flags that give details about how the fault was handled.
  *
+ * MM layer fills up gfp_mask for page allocations but fault handler might
+ * alter it if its implementation requires a different allocation context.
+ *
  * pgoff should be used in favour of virtual_address, if possible.
  */
 struct vm_fault {
 	unsigned int flags;		/* FAULT_FLAG_xxx flags */
+	gfp_t gfp_mask;			/* gfp mask to be used for allocations */
 	pgoff_t pgoff;			/* Logical page offset based on vma */
 	void __user *virtual_address;	/* Faulting virtual address */
 
diff --git a/mm/filemap.c b/mm/filemap.c
index b63fb81df336..8a16a07bbe02 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1774,19 +1774,18 @@ EXPORT_SYMBOL(generic_file_read_iter);
  * This adds the requested page to the page cache if it isn't already there,
  * and schedules an I/O to read in its contents from disk.
  */
-static int page_cache_read(struct file *file, pgoff_t offset)
+static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *page;
 	int ret;
 
 	do {
-		page = page_cache_alloc_cold(mapping);
+		page = __page_cache_alloc(gfp_mask|__GFP_COLD);
 		if (!page)
 			return -ENOMEM;
 
-		ret = add_to_page_cache_lru(page, mapping, offset,
-				GFP_KERNEL & mapping_gfp_mask(mapping));
+		ret = add_to_page_cache_lru(page, mapping, offset, GFP_KERNEL & gfp_mask);
 		if (ret == 0)
 			ret = mapping->a_ops->readpage(file, page);
 		else if (ret == -EEXIST)
@@ -1969,7 +1968,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	 * We're only likely to ever get here if MADV_RANDOM is in
 	 * effect.
 	 */
-	error = page_cache_read(file, offset);
+	error = page_cache_read(file, offset, vmf->gfp_mask);
 
 	/*
 	 * The page we want has now been added to the page cache.
diff --git a/mm/memory.c b/mm/memory.c
index 8a2fc9945b46..25ab29560dca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1949,6 +1949,20 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
 		copy_user_highpage(dst, src, va, vma);
 }
 
+static gfp_t __get_fault_gfp_mask(struct vm_area_struct *vma)
+{
+	struct file *vm_file = vma->vm_file;
+
+	if (vm_file)
+		return mapping_gfp_mask(vm_file->f_mapping) | GFP_IOFS;
+
+	/*
+	 * Special mappings (e.g. VDSO) do not have any file so fake
+	 * a default GFP_KERNEL for them.
+	 */
+	return GFP_KERNEL;
+}
+
 /*
  * Notify the address space that the page is about to become writable so that
  * it can prohibit this or wait for the page to get into an appropriate state.
@@ -1964,6 +1978,7 @@ static int do_page_mkwrite(struct vm_area_struct *vma, struct page *page,
 	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
 	vmf.pgoff = page->index;
 	vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
+	vmf.gfp_mask = __get_fault_gfp_mask(vma);
 	vmf.page = page;
 	vmf.cow_page = NULL;
 
@@ -2763,6 +2778,7 @@ static int __do_fault(struct vm_area_struct *vma, unsigned long address,
 	vmf.pgoff = pgoff;
 	vmf.flags = flags;
 	vmf.page = NULL;
+	vmf.gfp_mask = __get_fault_gfp_mask(vma);
 	vmf.cow_page = cow_page;
 
 	ret = vma->vm_ops->fault(vma, &vmf);
@@ -2929,6 +2945,7 @@ static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
 	vmf.pgoff = pgoff;
 	vmf.max_pgoff = max_pgoff;
 	vmf.flags = flags;
+	vmf.gfp_mask = __get_fault_gfp_mask(vma);
 	vma->vm_ops->map_pages(vma, &vmf);
 }
 
-- 
2.5.0
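
The mask arithmetic behind __get_fault_gfp_mask() can be checked in
isolation. The sketch below is user-space C with made-up names
(model_gfp_t, the M_GFP_* constants, model_fault_gfp_mask()); the bit
values are illustrative assumptions and should not be read as the
definitions from include/linux/gfp.h. It only shows that ORing in the
IO and FS bits lifts the NOFS restriction while zone bits from the
mapping survive.

```c
#include <assert.h>

typedef unsigned int model_gfp_t;

/* Illustrative bit values only (assumption for this sketch). */
#define M_GFP_DMA32	0x04u	/* zone restriction */
#define M_GFP_WAIT	0x10u	/* may sleep and reclaim */
#define M_GFP_IO	0x40u	/* may start physical IO */
#define M_GFP_FS	0x80u	/* may call into the filesystem */

#define M_GFP_IOFS	(M_GFP_IO | M_GFP_FS)
#define M_GFP_NOFS	(M_GFP_WAIT | M_GFP_IO)
#define M_GFP_KERNEL	(M_GFP_WAIT | M_GFP_IO | M_GFP_FS)

/* Same decision as the helper in the patch: file-backed mappings keep
 * their mapping mask plus IO|FS, anything else gets a plain KERNEL
 * mask. `has_file` replaces the vma->vm_file check. */
static model_gfp_t model_fault_gfp_mask(int has_file,
					model_gfp_t mapping_mask)
{
	if (has_file)
		return mapping_mask | M_GFP_IOFS;
	return M_GFP_KERNEL;
}
```

For a mapping mask of M_GFP_NOFS | M_GFP_DMA32 the result equals
M_GFP_KERNEL | M_GFP_DMA32: the FS bit is restored and the zone
restriction is still respected.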


* [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM
  2015-08-05  9:51 ` mhocko
  (?)
@ 2015-08-05  9:51   ` mhocko
  -1 siblings, 0 replies; 82+ messages in thread
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

From: Johannes Weiner <hannes@cmpxchg.org>

GFP_NOFS allocations are not allowed to invoke the OOM killer since
their reclaim abilities are severely diminished.  However, without the
OOM killer available there is no hope of progress once the reclaimable
pages have been exhausted.

Don't risk hanging these allocations.  Leave it to the allocation site
to implement the fallback policy for failing allocations.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee69c338ca2a..024d45d51700 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2715,15 +2715,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		if (ac->high_zoneidx < ZONE_NORMAL)
 			goto out;
 		/* The OOM killer does not compensate for IO-less reclaim */
-		if (!(gfp_mask & __GFP_FS)) {
-			/*
-			 * XXX: Page reclaim didn't yield anything,
-			 * and the OOM killer can't be invoked, but
-			 * keep looping as per tradition.
-			 */
-			*did_some_progress = 1;
+		if (!(gfp_mask & __GFP_FS))
 			goto out;
-		}
 		if (pm_suspended_storage())
 			goto out;
 		/* The OOM killer may not free memory on a specific node */
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure
  2015-08-05  9:51 ` mhocko
  (?)
@ 2015-08-05  9:51   ` mhocko
  -1 siblings, 0 replies; 82+ messages in thread
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

A journal transaction might fail prematurely because the frozen_buffer
is allocated by a GFP_NOFS request:
[   72.440013] do_get_write_access: OOM for frozen_buffer
[   72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[   72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
(...snipped....)
[   72.495559] do_get_write_access: OOM for frozen_buffer
[   72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[   72.496839] do_get_write_access: OOM for frozen_buffer
[   72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[   72.505766] Aborting journal on device sda1-8.
[   72.505851] EXT4-fs (sda1): Remounting filesystem read-only

This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
allocations upon OOM" because small GFP_NOFS allocations never failed.
This allocation seems essential for the journal and GFP_NOFS is too
restrictive for the memory allocator, so let's use __GFP_NOFAIL here to
emulate the previous behavior.

The jbd code has the very same issue so let's do the same there as well.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/jbd/transaction.c  | 11 +----------
 fs/jbd2/transaction.c | 14 +++-----------
 2 files changed, 4 insertions(+), 21 deletions(-)

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 1695ba8334a2..bf7474deda2f 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -673,16 +673,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 				jbd_unlock_bh_state(bh);
 				frozen_buffer =
 					jbd_alloc(jh2bh(jh)->b_size,
-							 GFP_NOFS);
-				if (!frozen_buffer) {
-					printk(KERN_ERR
-					       "%s: OOM for frozen_buffer\n",
-					       __func__);
-					JBUFFER_TRACE(jh, "oom!");
-					error = -ENOMEM;
-					jbd_lock_bh_state(bh);
-					goto done;
-				}
+							 GFP_NOFS|__GFP_NOFAIL);
 				goto repeat;
 			}
 			jh->b_frozen_data = frozen_buffer;
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index ff2f2e6ad311..bff071e21553 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -923,16 +923,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 				jbd_unlock_bh_state(bh);
 				frozen_buffer =
 					jbd2_alloc(jh2bh(jh)->b_size,
-							 GFP_NOFS);
-				if (!frozen_buffer) {
-					printk(KERN_ERR
-					       "%s: OOM for frozen_buffer\n",
-					       __func__);
-					JBUFFER_TRACE(jh, "oom!");
-					error = -ENOMEM;
-					jbd_lock_bh_state(bh);
-					goto done;
-				}
+							 GFP_NOFS|__GFP_NOFAIL);
 				goto repeat;
 			}
 			jh->b_frozen_data = frozen_buffer;
@@ -1157,7 +1148,8 @@ int jbd2_journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
 
 repeat:
 	if (!jh->b_committed_data) {
-		committed_data = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS);
+		committed_data = jbd2_alloc(jh2bh(jh)->b_size,
+					    GFP_NOFS|__GFP_NOFAIL);
 		if (!committed_data) {
 			printk(KERN_ERR "%s: No memory for committed data\n",
 				__func__);
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC 5/8] ext4: Do not fail journal due to block allocator
  2015-08-05  9:51 ` mhocko
  (?)
@ 2015-08-05  9:51   ` mhocko
  -1 siblings, 0 replies; 82+ messages in thread
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
the memory allocator no longer loops endlessly to satisfy low-order
allocations and instead fails them so that callers can handle the
failure gracefully.

Some of the callers are not yet prepared for this behavior though. The
ext4 block allocator relies solely on GFP_NOFS allocation requests, and
allocation failures lead to aborting the journal too easily:

[  345.028333] oom-trash: page allocation failure: order:0, mode:0x50
[  345.028336] CPU: 1 PID: 8334 Comm: oom-trash Tainted: G        W       4.0.0-nofs3-00006-gdfe9931f5f68 #588
[  345.028337] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150428_134905-gandalf 04/01/2014
[  345.028339]  0000000000000000 ffff880005a17708 ffffffff81538a54 ffffffff8107a40f
[  345.028341]  0000000000000050 ffff880005a17798 ffffffff810fe854 0000000180000000
[  345.028342]  0000000000000046 0000000000000000 ffffffff81a52100 0000000000000246
[  345.028343] Call Trace:
[  345.028348]  [<ffffffff81538a54>] dump_stack+0x4f/0x7b
[  345.028370]  [<ffffffff810fe854>] warn_alloc_failed+0x12a/0x13f
[  345.028373]  [<ffffffff81101bd2>] __alloc_pages_nodemask+0x7f3/0x8aa
[  345.028375]  [<ffffffff810f9933>] pagecache_get_page+0x12a/0x1c9
[  345.028390]  [<ffffffffa005bc64>] ext4_mb_load_buddy+0x220/0x367 [ext4]
[  345.028414]  [<ffffffffa006014f>] ext4_free_blocks+0x522/0xa4c [ext4]
[  345.028425]  [<ffffffffa0054e14>] ext4_ext_remove_space+0x833/0xf22 [ext4]
[  345.028434]  [<ffffffffa005677e>] ext4_ext_truncate+0x8c/0xb0 [ext4]
[  345.028441]  [<ffffffffa00342bf>] ext4_truncate+0x20b/0x38d [ext4]
[  345.028462]  [<ffffffffa003573c>] ext4_evict_inode+0x32b/0x4c1 [ext4]
[  345.028464]  [<ffffffff8116d04f>] evict+0xa0/0x148
[  345.028466]  [<ffffffff8116dca8>] iput+0x1a1/0x1f0
[  345.028468]  [<ffffffff811697b4>] __dentry_kill+0x136/0x1a6
[  345.028470]  [<ffffffff81169a3e>] dput+0x21a/0x243
[  345.028472]  [<ffffffff81157cda>] __fput+0x184/0x19b
[  345.028473]  [<ffffffff81157d29>] ____fput+0xe/0x10
[  345.028475]  [<ffffffff8105a05f>] task_work_run+0x8a/0xa1
[  345.028477]  [<ffffffff810452f0>] do_exit+0x3c6/0x8dc
[  345.028482]  [<ffffffff8104588a>] do_group_exit+0x4d/0xb2
[  345.028483]  [<ffffffff8104eeeb>] get_signal+0x5b1/0x5f5
[  345.028488]  [<ffffffff81002202>] do_signal+0x28/0x5d0
[...]
[  345.028624] EXT4-fs error (device hdb1) in ext4_free_blocks:4879: Out of memory
[  345.033097] Aborting journal on device hdb1-8.
[  345.036339] EXT4-fs (hdb1): Remounting filesystem read-only
[  345.036344] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.036766] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.038583] EXT4-fs error (device hdb1) in ext4_ext_remove_space:3048: Journal has aborted
[  345.049115] EXT4-fs error (device hdb1) in ext4_ext_truncate:4669: Journal has aborted
[  345.050434] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053064] EXT4-fs error (device hdb1) in ext4_truncate:3668: Journal has aborted
[  345.053582] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053946] EXT4-fs error (device hdb1) in ext4_orphan_del:2686: Journal has aborted
[  345.055367] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted

The failure is really premature because the GFP_NOFS allocation context
is very restricted, especially under fs metadata heavy loads. Before we
go with a more sophisticated solution, let's simply imitate the previous
behavior of non-failing NOFS allocations and use __GFP_NOFAIL for the
buddy block allocator. I wasn't able to trigger the issue with this
patch anymore.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/ext4/mballoc.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5b1613a54307..e6361622bfd5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -992,7 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 	block = group * 2;
 	pnum = block / blocks_per_page;
 	poff = block % blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1006,7 +1007,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 
 	block++;
 	pnum = block / blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1158,7 +1160,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 			 * wait for it to initialize.
 			 */
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {
@@ -1194,7 +1197,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 	if (page == NULL || !PageUptodate(page)) {
 		if (page)
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC 5/8] ext4: Do not fail journal due to block allocator
@ 2015-08-05  9:51   ` mhocko
  0 siblings, 0 replies; 82+ messages in thread
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
memory allocator doesn't endlessly loop to satisfy low-order allocations
and instead fails them to allow callers to handle them gracefully.

Some of the callers are not yet prepared for this behavior though. ext4
block allocator relies solely on GFP_NOFS allocation requests and
allocation failures lead to aborting yournal too easily:

[  345.028333] oom-trash: page allocation failure: order:0, mode:0x50
[  345.028336] CPU: 1 PID: 8334 Comm: oom-trash Tainted: G        W       4.0.0-nofs3-00006-gdfe9931f5f68 #588
[  345.028337] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150428_134905-gandalf 04/01/2014
[  345.028339]  0000000000000000 ffff880005a17708 ffffffff81538a54 ffffffff8107a40f
[  345.028341]  0000000000000050 ffff880005a17798 ffffffff810fe854 0000000180000000
[  345.028342]  0000000000000046 0000000000000000 ffffffff81a52100 0000000000000246
[  345.028343] Call Trace:
[  345.028348]  [<ffffffff81538a54>] dump_stack+0x4f/0x7b
[  345.028370]  [<ffffffff810fe854>] warn_alloc_failed+0x12a/0x13f
[  345.028373]  [<ffffffff81101bd2>] __alloc_pages_nodemask+0x7f3/0x8aa
[  345.028375]  [<ffffffff810f9933>] pagecache_get_page+0x12a/0x1c9
[  345.028390]  [<ffffffffa005bc64>] ext4_mb_load_buddy+0x220/0x367 [ext4]
[  345.028414]  [<ffffffffa006014f>] ext4_free_blocks+0x522/0xa4c [ext4]
[  345.028425]  [<ffffffffa0054e14>] ext4_ext_remove_space+0x833/0xf22 [ext4]
[  345.028434]  [<ffffffffa005677e>] ext4_ext_truncate+0x8c/0xb0 [ext4]
[  345.028441]  [<ffffffffa00342bf>] ext4_truncate+0x20b/0x38d [ext4]
[  345.028462]  [<ffffffffa003573c>] ext4_evict_inode+0x32b/0x4c1 [ext4]
[  345.028464]  [<ffffffff8116d04f>] evict+0xa0/0x148
[  345.028466]  [<ffffffff8116dca8>] iput+0x1a1/0x1f0
[  345.028468]  [<ffffffff811697b4>] __dentry_kill+0x136/0x1a6
[  345.028470]  [<ffffffff81169a3e>] dput+0x21a/0x243
[  345.028472]  [<ffffffff81157cda>] __fput+0x184/0x19b
[  345.028473]  [<ffffffff81157d29>] ____fput+0xe/0x10
[  345.028475]  [<ffffffff8105a05f>] task_work_run+0x8a/0xa1
[  345.028477]  [<ffffffff810452f0>] do_exit+0x3c6/0x8dc
[  345.028482]  [<ffffffff8104588a>] do_group_exit+0x4d/0xb2
[  345.028483]  [<ffffffff8104eeeb>] get_signal+0x5b1/0x5f5
[  345.028488]  [<ffffffff81002202>] do_signal+0x28/0x5d0
[...]
[  345.028624] EXT4-fs error (device hdb1) in ext4_free_blocks:4879: Out of memory
[  345.033097] Aborting journal on device hdb1-8.
[  345.036339] EXT4-fs (hdb1): Remounting filesystem read-only
[  345.036344] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.036766] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.038583] EXT4-fs error (device hdb1) in ext4_ext_remove_space:3048: Journal has aborted
[  345.049115] EXT4-fs error (device hdb1) in ext4_ext_truncate:4669: Journal has aborted
[  345.050434] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053064] EXT4-fs error (device hdb1) in ext4_truncate:3668: Journal has aborted
[  345.053582] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053946] EXT4-fs error (device hdb1) in ext4_orphan_del:2686: Journal has aborted
[  345.055367] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted

The failure is really premature because the GFP_NOFS allocation context
is very restricted, especially under fs metadata heavy loads. Before we
go with a more sophisticated solution, let's simply imitate the previous
behavior of non-failing NOFS allocations and use __GFP_NOFAIL for the
buddy block allocator. I wasn't able to trigger the issue anymore with
this patch applied.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/ext4/mballoc.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5b1613a54307..e6361622bfd5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -992,7 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 	block = group * 2;
 	pnum = block / blocks_per_page;
 	poff = block % blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1006,7 +1007,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 
 	block++;
 	pnum = block / blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	if (!page)
 		return -ENOMEM;
 	BUG_ON(page->mapping != inode->i_mapping);
@@ -1158,7 +1160,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 			 * wait for it to initialize.
 			 */
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {
@@ -1194,7 +1197,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 	if (page == NULL || !PageUptodate(page)) {
 		if (page)
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
 		if (page) {
 			BUG_ON(page->mapping != inode->i_mapping);
 			if (!PageUptodate(page)) {
-- 
2.5.0
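
A note on reading the "mode:0x50" value in the allocation failure report
above. The following is a minimal userspace sketch, not kernel code. It
assumes the flag bit values of include/linux/gfp.h as of the v4.0 era
(later kernels renumbered them), and the SK_* names, the table and
sk_gfp_decode() are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: bit values as in include/linux/gfp.h around v4.0.
 * Do not apply this table to logs from newer kernels.
 */
#define SK_GFP_WAIT   0x10u	/* caller may sleep, direct reclaim allowed */
#define SK_GFP_HIGH   0x20u	/* may dip into emergency reserves */
#define SK_GFP_IO     0x40u	/* may start physical I/O */
#define SK_GFP_FS     0x80u	/* may call down into the filesystem */
#define SK_GFP_NOFAIL 0x800u	/* retry instead of returning NULL */

#define SK_GFP_NOFS   (SK_GFP_WAIT | SK_GFP_IO)
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

struct sk_gfp_name {
	unsigned int bit;
	const char *name;
};

static const struct sk_gfp_name sk_gfp_names[] = {
	{ SK_GFP_WAIT,   "__GFP_WAIT" },
	{ SK_GFP_HIGH,   "__GFP_HIGH" },
	{ SK_GFP_IO,     "__GFP_IO" },
	{ SK_GFP_FS,     "__GFP_FS" },
	{ SK_GFP_NOFAIL, "__GFP_NOFAIL" },
};

/*
 * Hypothetical helper: write the names of the known bits set in mask
 * into buf, separated by '|'. Bits missing from the table are ignored.
 */
static const char *sk_gfp_decode(unsigned int mask, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(sk_gfp_names) / sizeof(sk_gfp_names[0]); i++) {
		const char *name = sk_gfp_names[i].name;
		size_t need;

		if (!(mask & sk_gfp_names[i].bit))
			continue;
		need = strlen(name) + (used ? 1 : 0);
		if (used + need >= len)
			break;	/* no room left; buf stays NUL-terminated */
		if (used)
			buf[used++] = '|';
		memcpy(buf + used, name, strlen(name) + 1);
		used += strlen(name);
	}
	return buf;
}
```

Under these assumptions sk_gfp_decode(0x50, buf, sizeof(buf)) gives
"__GFP_WAIT|__GFP_IO", which is GFP_NOFS: the request may sleep and do
I/O but must not recurse into the filesystem. The same request with the
flag added by this patch would show up as 0x850.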

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC 6/8] ext3: Do not abort journal prematurely
  2015-08-05  9:51 ` mhocko
  (?)
@ 2015-08-05  9:51   ` mhocko
  -1 siblings, 0 replies; 82+ messages in thread
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

journal_get_undo_access relies on a GFP_NOFS allocation which is
essential for the journal transaction. When that allocation fails the
journal gets aborted:

[   83.256914] journal_get_undo_access: No memory for committed data
[   83.258022] EXT3-fs: ext3_free_blocks_sb: aborting transaction: Out of memory in __ext3_journal_get_undo_access
[   83.259785] EXT3-fs (hdb1): error in ext3_free_blocks_sb: Out of memory
[   83.267130] Aborting journal on device hdb1.
[   83.292308] EXT3-fs (hdb1): error: ext3_journal_start_sb: Detected aborted journal
[   83.293630] EXT3-fs (hdb1): error: remounting filesystem read-only

Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
these allocation requests are allowed to fail so we need to use
__GFP_NOFAIL to imitate the previous behavior.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/jbd/transaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index bf7474deda2f..6c60376a29bc 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -887,7 +887,7 @@ int journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
 
 repeat:
 	if (!jh->b_committed_data) {
-		committed_data = jbd_alloc(jh2bh(jh)->b_size, GFP_NOFS);
+		committed_data = jbd_alloc(jh2bh(jh)->b_size, GFP_NOFS | __GFP_NOFAIL);
 		if (!committed_data) {
 			printk(KERN_ERR "%s: No memory for committed data\n",
 				__func__);
-- 
2.5.0
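
The difference between a failable request and a __GFP_NOFAIL one can be
modelled in userspace roughly as follows. This is only a sketch under
stated assumptions: struct sk_pressure, sk_try_alloc() and sk_alloc()
are hypothetical names, memory pressure is faked by a counter, and the
real page allocator reclaims and waits between attempts instead of
spinning.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_GFP_NOFS   0x50u	/* assumed v4.0-era value */
#define SK_GFP_NOFAIL 0x800u	/* assumed v4.0-era value */

/* Hypothetical stand-in for memory pressure: the next fail_count
 * attempts fail, later ones succeed. attempts counts every try. */
struct sk_pressure {
	unsigned int fail_count;
	unsigned int attempts;
};

/* A single allocation attempt that may fail under "pressure". */
static void *sk_try_alloc(struct sk_pressure *p, size_t size)
{
	assert(p != NULL);
	p->attempts++;
	if (p->fail_count > 0) {
		p->fail_count--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Without SK_GFP_NOFAIL the first failure is reported to the caller.
 * With it the request is retried until it succeeds, so the caller
 * never sees NULL (and never returns if memory never frees up).
 */
static void *sk_alloc(struct sk_pressure *p, size_t size, unsigned int flags)
{
	for (;;) {
		void *mem = sk_try_alloc(p, size);

		if (mem || !(flags & SK_GFP_NOFAIL))
			return mem;
		/* the kernel would try reclaim or wait here before retrying */
	}
}
```

In this model a plain SK_GFP_NOFS request returns NULL on the first
failed attempt, which corresponds to the "No memory for committed data"
path in the log. With SK_GFP_NOFAIL added, the caller's NULL check is
not expected to trigger anymore, at the price of looping for as long as
the pressure lasts.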


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC 7/8] btrfs: Prevent from early transaction abort
  2015-08-05  9:51 ` mhocko
  (?)
@ 2015-08-05  9:51   ` mhocko
  -1 siblings, 0 replies; 82+ messages in thread
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Btrfs relies on GFP_NOFS allocations when committing the transaction but
since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
those allocations are allowed to fail, which can lead to a premature
transaction abort:

[   55.328093] Call Trace:
[   55.328890]  [<ffffffff8154e6f0>] dump_stack+0x4f/0x7b
[   55.330518]  [<ffffffff8108fa28>] ? console_unlock+0x334/0x363
[   55.332738]  [<ffffffff8110873e>] __alloc_pages_nodemask+0x81d/0x8d4
[   55.334910]  [<ffffffff81100752>] pagecache_get_page+0x10e/0x20c
[   55.336844]  [<ffffffffa007d916>] alloc_extent_buffer+0xd0/0x350 [btrfs]
[   55.338973]  [<ffffffffa0059d8c>] btrfs_find_create_tree_block+0x15/0x17 [btrfs]
[   55.341329]  [<ffffffffa004f728>] btrfs_alloc_tree_block+0x18c/0x405 [btrfs]
[   55.343566]  [<ffffffffa003fa34>] split_leaf+0x1e4/0x6a6 [btrfs]
[   55.345577]  [<ffffffffa0040567>] btrfs_search_slot+0x671/0x831 [btrfs]
[   55.347679]  [<ffffffff810682d7>] ? get_parent_ip+0xe/0x3e
[   55.349434]  [<ffffffffa0041cb2>] btrfs_insert_empty_items+0x5d/0xa8 [btrfs]
[   55.351681]  [<ffffffffa004ecfb>] __btrfs_run_delayed_refs+0x7a6/0xf35 [btrfs]
[   55.353979]  [<ffffffffa00512ea>] btrfs_run_delayed_refs+0x6e/0x226 [btrfs]
[   55.356212]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.358378]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.360626]  [<ffffffffa0060221>] btrfs_commit_transaction+0x4c/0xaba [btrfs]
[   55.362894]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.365221]  [<ffffffffa0073428>] btrfs_sync_file+0x29c/0x310 [btrfs]
[   55.367273]  [<ffffffff81186808>] vfs_fsync_range+0x8f/0x9e
[   55.369047]  [<ffffffff81186833>] vfs_fsync+0x1c/0x1e
[   55.370654]  [<ffffffff81186869>] do_fsync+0x34/0x4e
[   55.372246]  [<ffffffff81186ab3>] SyS_fsync+0x10/0x14
[   55.373851]  [<ffffffff81554f97>] system_call_fastpath+0x12/0x6f
[   55.381070] BTRFS: error (device hdb1) in btrfs_run_delayed_refs:2821: errno=-12 Out of memory
[   55.382431] BTRFS warning (device hdb1): Skipping commit of aborted transaction.
[   55.382433] BTRFS warning (device hdb1): cleanup_transaction:1692: Aborting unused transaction(IO failure).
[   55.384280] ------------[ cut here ]------------
[   55.384312] WARNING: CPU: 0 PID: 3010 at fs/btrfs/delayed-ref.c:438 btrfs_select_ref_head+0xd9/0xfe [btrfs]()
[...]
[   55.384337] Call Trace:
[   55.384353]  [<ffffffff8154e6f0>] dump_stack+0x4f/0x7b
[   55.384357]  [<ffffffff8107f717>] ? down_trylock+0x2d/0x37
[   55.384359]  [<ffffffff81046977>] warn_slowpath_common+0xa1/0xbb
[   55.384398]  [<ffffffffa00a1d6b>] ? btrfs_select_ref_head+0xd9/0xfe [btrfs]
[   55.384400]  [<ffffffff81046a34>] warn_slowpath_null+0x1a/0x1c
[   55.384423]  [<ffffffffa00a1d6b>] btrfs_select_ref_head+0xd9/0xfe [btrfs]
[   55.384446]  [<ffffffffa004e5f7>] ? __btrfs_run_delayed_refs+0xa2/0xf35 [btrfs]
[   55.384455]  [<ffffffffa004e600>] __btrfs_run_delayed_refs+0xab/0xf35 [btrfs]
[   55.384476]  [<ffffffffa00512ea>] btrfs_run_delayed_refs+0x6e/0x226 [btrfs]
[   55.384499]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.384521]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.384543]  [<ffffffffa0060221>] btrfs_commit_transaction+0x4c/0xaba [btrfs]
[   55.384565]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.384588]  [<ffffffffa0073428>] btrfs_sync_file+0x29c/0x310 [btrfs]
[   55.384591]  [<ffffffff81186808>] vfs_fsync_range+0x8f/0x9e
[   55.384592]  [<ffffffff81186833>] vfs_fsync+0x1c/0x1e
[   55.384593]  [<ffffffff81186869>] do_fsync+0x34/0x4e
[   55.384594]  [<ffffffff81186ab3>] SyS_fsync+0x10/0x14
[   55.384595]  [<ffffffff81554f97>] system_call_fastpath+0x12/0x6f
[...]
[   55.384608] ---[ end trace c29799da1d4dd621 ]---
[   55.437323] BTRFS info (device hdb1): forced readonly
[   55.438815] BTRFS info (device hdb1): delayed_refs has NO entry

Fix this by restoring the no-fail behavior of this allocation path
with an explicit __GFP_NOFAIL.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/btrfs/extent_io.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c374e1e71e5f..88fad7051e38 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4607,7 +4607,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 {
 	struct extent_buffer *eb = NULL;
 
-	eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS);
+	eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS|__GFP_NOFAIL);
 	if (eb == NULL)
 		return NULL;
 	eb->start = start;
@@ -4867,7 +4867,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		return NULL;
 
 	for (i = 0; i < num_pages; i++, index++) {
-		p = find_or_create_page(mapping, index, GFP_NOFS);
+		p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
 		if (!p)
 			goto free_eb;
 
-- 
2.5.0
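
For context, the "goto free_eb" path in the second hunk is the usual
unwind-on-failure pattern: if any page of a multi-page buffer cannot be
allocated, everything acquired so far is released and NULL is returned,
and in the trace above that NULL ends up as -ENOMEM in
btrfs_run_delayed_refs. The following userspace sketch shows the pattern
only. All sk_* names are hypothetical, the budget counter fakes memory
pressure, and the real alloc_extent_buffer does considerably more work.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_MAX_PAGES 16
#define SK_PAGE_SIZE 4096

/* Hypothetical, heavily simplified stand-in for an extent buffer. */
struct sk_extent_buffer {
	size_t nr_pages;		/* pages successfully attached */
	void *pages[SK_MAX_PAGES];
};

/* Hypothetical page source: fails once *budget is used up. */
static void *sk_get_page(int *budget)
{
	if (*budget <= 0)
		return NULL;
	(*budget)--;
	return calloc(1, SK_PAGE_SIZE);
}

/* Release the buffer together with every page attached so far. */
static void sk_free_eb(struct sk_extent_buffer *eb)
{
	size_t i;

	if (!eb)
		return;
	for (i = 0; i < eb->nr_pages; i++)
		free(eb->pages[i]);
	free(eb);
}

/* Allocate a buffer backed by num_pages pages, or nothing at all. */
static struct sk_extent_buffer *sk_alloc_eb(size_t num_pages, int *budget)
{
	struct sk_extent_buffer *eb;
	size_t i;

	assert(num_pages <= SK_MAX_PAGES);
	eb = calloc(1, sizeof(*eb));
	if (!eb)
		return NULL;

	for (i = 0; i < num_pages; i++) {
		void *p = sk_get_page(budget);

		if (!p)
			goto free_eb;	/* partial allocation: unwind */
		eb->pages[i] = p;
		eb->nr_pages = i + 1;
	}
	return eb;

free_eb:
	sk_free_eb(eb);
	return NULL;
}
```

With a budget of 2 and a request for 4 pages, sk_alloc_eb() frees the
two pages it got and returns NULL. The unwind itself is clean; the
problem described in this patch is what the callers further up the
stack do with the resulting error.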


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC 8/8] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio
  2015-08-05  9:51 ` mhocko
  (?)
@ 2015-08-05  9:51   ` mhocko
  -1 siblings, 0 replies; 82+ messages in thread
From: mhocko @ 2015-08-05  9:51 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

alloc_btrfs_bio relies on GFP_NOFS to allocate a bio, but since "mm:
page_alloc: do not lock up GFP_NOFS allocations upon OOM" this allocation
is allowed to fail, which can lead to
[   37.928625] kernel BUG at fs/btrfs/extent_io.c:4045

This is clearly undesirable, and the nofail behavior should be explicit
if the allocation failure cannot be tolerated.
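
For illustration only, and not part of this patch: a minimal userspace
sketch of the difference between a request that may fail and one that is
explicitly marked nofail. Every model_* name and the MODEL_NOFAIL value
are hypothetical stand-ins, not kernel APIs or real flag values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bit for this model; not the kernel's __GFP_NOFAIL value. */
#define MODEL_NOFAIL 0x1u

/* Test knob: how many upcoming attempts fail before memory is available. */
static int model_failures_left;

/* One allocation attempt; fails while the knob above is non-zero. */
static void *model_try_alloc(size_t size)
{
	if (model_failures_left > 0) {
		model_failures_left--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Without MODEL_NOFAIL the first failed attempt is reported to the caller,
 * which must then have its own fallback. With MODEL_NOFAIL the request
 * keeps retrying until an attempt succeeds.
 */
static void *model_alloc(size_t size, unsigned int flags)
{
	void *p;

	assert(size > 0);
	for (;;) {
		p = model_try_alloc(size);
		if (p || !(flags & MODEL_NOFAIL))
			return p;
	}
}
```

In this model a caller passing 0 has to check for NULL and unwind, while
a caller passing MODEL_NOFAIL never sees NULL but may loop for as long as
attempts keep failing.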

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 53af23f2c087..57a99d19533d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4914,7 +4914,7 @@ static struct btrfs_bio *alloc_btrfs_bio(int total_stripes, int real_stripes)
 		 * and the stripes
 		 */
 		sizeof(u64) * (total_stripes),
-		GFP_NOFS);
+		GFP_NOFS|__GFP_NOFAIL);
 	if (!bbio)
 		return NULL;
 
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure
  2015-08-05  9:51   ` mhocko
@ 2015-08-05 11:42     ` Jan Kara
  -1 siblings, 0 replies; 82+ messages in thread
From: Jan Kara @ 2015-08-05 11:42 UTC (permalink / raw)
  To: mhocko
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

On Wed 05-08-15 11:51:20, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Journal transaction might fail prematurely because the frozen_buffer
> is allocated by GFP_NOFS request:
> [   72.440013] do_get_write_access: OOM for frozen_buffer
> [   72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> [   72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
> (...snipped....)
> [   72.495559] do_get_write_access: OOM for frozen_buffer
> [   72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> [   72.496839] do_get_write_access: OOM for frozen_buffer
> [   72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> [   72.505766] Aborting journal on device sda1-8.
> [   72.505851] EXT4-fs (sda1): Remounting filesystem read-only
> 
> This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
> allocations upon OOM" because small GFP_NOFS allocations never failed.
> This allocation seems essential for the journal and GFP_NOFS is too
> restrictive for the memory allocator, so let's use __GFP_NOFAIL here to
> emulate the previous behavior.
> 
> jbd code has the very same issue so let's do the same there as well.

The patch looks good. Btw, the patch 6 can be folded into this patch since
it fixes the issue you fix for jbd2 here... But jbd parts will be dropped
in the next merge window anyway so it doesn't really matter.

You can add:

Reviewed-by: Jan Kara <jack@suse.com>

								Honza
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/jbd/transaction.c  | 11 +----------
>  fs/jbd2/transaction.c | 14 +++-----------
>  2 files changed, 4 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> index 1695ba8334a2..bf7474deda2f 100644
> --- a/fs/jbd/transaction.c
> +++ b/fs/jbd/transaction.c
> @@ -673,16 +673,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
>  				jbd_unlock_bh_state(bh);
>  				frozen_buffer =
>  					jbd_alloc(jh2bh(jh)->b_size,
> -							 GFP_NOFS);
> -				if (!frozen_buffer) {
> -					printk(KERN_ERR
> -					       "%s: OOM for frozen_buffer\n",
> -					       __func__);
> -					JBUFFER_TRACE(jh, "oom!");
> -					error = -ENOMEM;
> -					jbd_lock_bh_state(bh);
> -					goto done;
> -				}
> +							 GFP_NOFS|__GFP_NOFAIL);
>  				goto repeat;
>  			}
>  			jh->b_frozen_data = frozen_buffer;
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index ff2f2e6ad311..bff071e21553 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -923,16 +923,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
>  				jbd_unlock_bh_state(bh);
>  				frozen_buffer =
>  					jbd2_alloc(jh2bh(jh)->b_size,
> -							 GFP_NOFS);
> -				if (!frozen_buffer) {
> -					printk(KERN_ERR
> -					       "%s: OOM for frozen_buffer\n",
> -					       __func__);
> -					JBUFFER_TRACE(jh, "oom!");
> -					error = -ENOMEM;
> -					jbd_lock_bh_state(bh);
> -					goto done;
> -				}
> +							 GFP_NOFS|__GFP_NOFAIL);
>  				goto repeat;
>  			}
>  			jh->b_frozen_data = frozen_buffer;
> @@ -1157,7 +1148,8 @@ int jbd2_journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
>  
>  repeat:
>  	if (!jh->b_committed_data) {
> -		committed_data = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS);
> +		committed_data = jbd2_alloc(jh2bh(jh)->b_size,
> +					    GFP_NOFS|__GFP_NOFAIL);
>  		if (!committed_data) {
>  			printk(KERN_ERR "%s: No memory for committed data\n",
>  				__func__);
> -- 
> 2.5.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 5/8] ext4: Do not fail journal due to block allocator
  2015-08-05  9:51   ` mhocko
@ 2015-08-05 11:43     ` Jan Kara
  -1 siblings, 0 replies; 82+ messages in thread
From: Jan Kara @ 2015-08-05 11:43 UTC (permalink / raw)
  To: mhocko
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

On Wed 05-08-15 11:51:21, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
> the memory allocator doesn't endlessly loop to satisfy low-order
> allocations and instead fails them to allow callers to handle them
> gracefully.
> 
> Some of the callers are not yet prepared for this behavior though. The
> ext4 block allocator relies solely on GFP_NOFS allocation requests and
> allocation failures lead to aborting the journal too easily:
> 
> [  345.028333] oom-trash: page allocation failure: order:0, mode:0x50
> [  345.028336] CPU: 1 PID: 8334 Comm: oom-trash Tainted: G        W       4.0.0-nofs3-00006-gdfe9931f5f68 #588
> [  345.028337] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150428_134905-gandalf 04/01/2014
> [  345.028339]  0000000000000000 ffff880005a17708 ffffffff81538a54 ffffffff8107a40f
> [  345.028341]  0000000000000050 ffff880005a17798 ffffffff810fe854 0000000180000000
> [  345.028342]  0000000000000046 0000000000000000 ffffffff81a52100 0000000000000246
> [  345.028343] Call Trace:
> [  345.028348]  [<ffffffff81538a54>] dump_stack+0x4f/0x7b
> [  345.028370]  [<ffffffff810fe854>] warn_alloc_failed+0x12a/0x13f
> [  345.028373]  [<ffffffff81101bd2>] __alloc_pages_nodemask+0x7f3/0x8aa
> [  345.028375]  [<ffffffff810f9933>] pagecache_get_page+0x12a/0x1c9
> [  345.028390]  [<ffffffffa005bc64>] ext4_mb_load_buddy+0x220/0x367 [ext4]
> [  345.028414]  [<ffffffffa006014f>] ext4_free_blocks+0x522/0xa4c [ext4]
> [  345.028425]  [<ffffffffa0054e14>] ext4_ext_remove_space+0x833/0xf22 [ext4]
> [  345.028434]  [<ffffffffa005677e>] ext4_ext_truncate+0x8c/0xb0 [ext4]
> [  345.028441]  [<ffffffffa00342bf>] ext4_truncate+0x20b/0x38d [ext4]
> [  345.028462]  [<ffffffffa003573c>] ext4_evict_inode+0x32b/0x4c1 [ext4]
> [  345.028464]  [<ffffffff8116d04f>] evict+0xa0/0x148
> [  345.028466]  [<ffffffff8116dca8>] iput+0x1a1/0x1f0
> [  345.028468]  [<ffffffff811697b4>] __dentry_kill+0x136/0x1a6
> [  345.028470]  [<ffffffff81169a3e>] dput+0x21a/0x243
> [  345.028472]  [<ffffffff81157cda>] __fput+0x184/0x19b
> [  345.028473]  [<ffffffff81157d29>] ____fput+0xe/0x10
> [  345.028475]  [<ffffffff8105a05f>] task_work_run+0x8a/0xa1
> [  345.028477]  [<ffffffff810452f0>] do_exit+0x3c6/0x8dc
> [  345.028482]  [<ffffffff8104588a>] do_group_exit+0x4d/0xb2
> [  345.028483]  [<ffffffff8104eeeb>] get_signal+0x5b1/0x5f5
> [  345.028488]  [<ffffffff81002202>] do_signal+0x28/0x5d0
> [...]
> [  345.028624] EXT4-fs error (device hdb1) in ext4_free_blocks:4879: Out of memory
> [  345.033097] Aborting journal on device hdb1-8.
> [  345.036339] EXT4-fs (hdb1): Remounting filesystem read-only
> [  345.036344] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
> [  345.036766] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
> [  345.038583] EXT4-fs error (device hdb1) in ext4_ext_remove_space:3048: Journal has aborted
> [  345.049115] EXT4-fs error (device hdb1) in ext4_ext_truncate:4669: Journal has aborted
> [  345.050434] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
> [  345.053064] EXT4-fs error (device hdb1) in ext4_truncate:3668: Journal has aborted
> [  345.053582] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
> [  345.053946] EXT4-fs error (device hdb1) in ext4_orphan_del:2686: Journal has aborted
> [  345.055367] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
> 
> The failure is really premature because the GFP_NOFS allocation context
> is very restricted, especially under fs metadata heavy loads. Before we
> go with a more sophisticated solution, let's simply imitate the previous
> behavior of non-failing NOFS allocations and use __GFP_NOFAIL for the
> buddy block allocator. I wasn't able to trigger the issue with this
> patch anymore.
 
The patch looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.com>

								Honza

> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/ext4/mballoc.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 5b1613a54307..e6361622bfd5 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -992,7 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
>  	block = group * 2;
>  	pnum = block / blocks_per_page;
>  	poff = block % blocks_per_page;
> -	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
> +	page = find_or_create_page(inode->i_mapping, pnum,
> +				   GFP_NOFS|__GFP_NOFAIL);
>  	if (!page)
>  		return -ENOMEM;
>  	BUG_ON(page->mapping != inode->i_mapping);
> @@ -1006,7 +1007,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
>  
>  	block++;
>  	pnum = block / blocks_per_page;
> -	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
> +	page = find_or_create_page(inode->i_mapping, pnum,
> +				   GFP_NOFS|__GFP_NOFAIL);
>  	if (!page)
>  		return -ENOMEM;
>  	BUG_ON(page->mapping != inode->i_mapping);
> @@ -1158,7 +1160,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>  			 * wait for it to initialize.
>  			 */
>  			page_cache_release(page);
> -		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
> +		page = find_or_create_page(inode->i_mapping, pnum,
> +					   GFP_NOFS|__GFP_NOFAIL);
>  		if (page) {
>  			BUG_ON(page->mapping != inode->i_mapping);
>  			if (!PageUptodate(page)) {
> @@ -1194,7 +1197,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>  	if (page == NULL || !PageUptodate(page)) {
>  		if (page)
>  			page_cache_release(page);
> -		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
> +		page = find_or_create_page(inode->i_mapping, pnum,
> +					   GFP_NOFS|__GFP_NOFAIL);
>  		if (page) {
>  			BUG_ON(page->mapping != inode->i_mapping);
>  			if (!PageUptodate(page)) {
> -- 
> 2.5.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM
  2015-08-05  9:51   ` mhocko
  (?)
  (?)
@ 2015-08-05 12:28   ` Tetsuo Handa
  2015-08-05 14:02     ` Michal Hocko
  -1 siblings, 1 reply; 82+ messages in thread
From: Tetsuo Handa @ 2015-08-05 12:28 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, hannes, mhocko

Reduced to only linux-mm.

> From: Johannes Weiner <hannes@cmpxchg.org>
> 
> GFP_NOFS allocations are not allowed to invoke the OOM killer since
> their reclaim abilities are severely diminished.  However, without the
> OOM killer available there is no hope of progress once the reclaimable
> pages have been exhausted.

Excuse me, but I still cannot understand. Why are !__GFP_FS allocations
considered as "their reclaim abilities are severely diminished"?

It seems to me that not only GFP_NOFS allocation requests but also
almost all types of memory allocation requests do not include
__GFP_NO_KSWAPD flag.

Therefore, while a thread which called __alloc_pages_slowpath(GFP_NOFS)
cannot reclaim FS memory, I assume that kswapd kernel threads which are
woken up by the thread via wakeup_kswapd() via wake_all_kswapds() can
reclaim FS memory by calling balance_pgdat(). Is this assumption correct?

If the assumption is correct, when kswapd kernel threads returned from
balance_pgdat() or got stuck inside reclaiming functions (e.g. blocked at
mutex_lock() inside slab's shrinker functions), I think that the thread
which called __alloc_pages_slowpath(GFP_NOFS) has reclaimed FS memory
as if the thread called __alloc_pages_slowpath(GFP_KERNEL), and therefore
the thread qualifies calling out_of_memory() as with __GFP_FS allocations.

> 
> Don't risk hanging these allocations.  Leave it to the allocation site
> to implement the fallback policy for failing allocations.

Are there memory pages which kswapd kernel threads cannot reclaim
but __alloc_pages_slowpath(GFP_KERNEL) allocations can reclaim
when __alloc_pages_slowpath(GFP_NOFS) allocations are hanging?

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM
  2015-08-05 12:28   ` Tetsuo Handa
@ 2015-08-05 14:02     ` Michal Hocko
  2015-08-06 11:50       ` Tetsuo Handa
  0 siblings, 1 reply; 82+ messages in thread
From: Michal Hocko @ 2015-08-05 14:02 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, hannes

On Wed 05-08-15 21:28:39, Tetsuo Handa wrote:
> Reduced to only linux-mm.
> 
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > 
> > GFP_NOFS allocations are not allowed to invoke the OOM killer since
> > their reclaim abilities are severely diminished.  However, without the
> > OOM killer available there is no hope of progress once the reclaimable
> > pages have been exhausted.
> 
> Excuse me, but I still cannot understand. Why are !__GFP_FS allocations
> considered as "their reclaim abilities are severely diminished"?
> 
> It seems to me that not only GFP_NOFS allocation requests but also
> almost all types of memory allocation requests do not include
> __GFP_NO_KSWAPD flag.

__GFP_NO_KSWAPD is not to be used outside of very specific cases.

> Therefore, while a thread which called __alloc_pages_slowpath(GFP_NOFS)
> cannot reclaim FS memory, I assume that kswapd kernel threads which are
> woken up by the thread via wakeup_kswapd() via wake_all_kswapds() can
> reclaim FS memory by calling balance_pgdat(). Is this assumption correct?

yes.

> If the assumption is correct, when kswapd kernel threads returned from
> balance_pgdat() or got stuck inside reclaiming functions (e.g. blocked at
> mutex_lock() inside slab's shrinker functions), I think that the thread
> which called __alloc_pages_slowpath(GFP_NOFS) has reclaimed FS memory
> as if the thread called __alloc_pages_slowpath(GFP_KERNEL), and therefore
> the thread qualifies calling out_of_memory() as with __GFP_FS allocations.

You are missing an important point. We are talking about an OOM situation
here, which means that neither the background reclaim nor the direct
reclaim is able to make sufficient progress. GFP_IOFS requests are
allowed to trigger (V)FS activity which _might_ help, while GFP_NOFS
requests by definition are not. That is why this reclaim context is less
capable. To be more precise, we do not perform IO (other than swapout)
from the direct reclaim context because of the stack restrictions, so
even GFP_IOFS is not _that_ strong, but shrinkers are still free to do
metadata specific actions.
 
> > Don't risk hanging these allocations.  Leave it to the allocation site
> > to implement the fallback policy for failing allocations.
> 
> Are there memory pages which kswapd kernel threads cannot reclaim
> but __alloc_pages_slowpath(GFP_KERNEL) allocations can reclaim
> when __alloc_pages_slowpath(GFP_NOFS) allocations are hanging?

See above and have a look at the code of particular shrinkers (e.g.
super_cache_scan).

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 7/8] btrfs: Prevent from early transaction abort
  2015-08-05  9:51   ` mhocko
@ 2015-08-05 16:31     ` David Sterba
  -1 siblings, 0 replies; 82+ messages in thread
From: David Sterba @ 2015-08-05 16:31 UTC (permalink / raw)
  To: mhocko
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

On Wed, Aug 05, 2015 at 11:51:23AM +0200, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
... 
> Fix this by reintroducing the no-fail behavior of this allocation path
> with the explicit __GFP_NOFAIL.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 8/8] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio
  2015-08-05  9:51   ` mhocko
@ 2015-08-05 16:32     ` David Sterba
  -1 siblings, 0 replies; 82+ messages in thread
From: David Sterba @ 2015-08-05 16:32 UTC (permalink / raw)
  To: mhocko
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko

On Wed, Aug 05, 2015 at 11:51:24AM +0200, mhocko@kernel.org wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> alloc_btrfs_bio is relying on GFP_NOFS to allocate a bio but since "mm:
> page_alloc: do not lock up GFP_NOFS allocations upon OOM" this is
> allowed to fail which can lead to
> [   37.928625] kernel BUG at fs/btrfs/extent_io.c:4045
> 
> This is clearly undesirable and the nofail behavior should be explicit
> if the allocation failure cannot be tolerated.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure
  2015-08-05  9:51   ` mhocko
@ 2015-08-05 16:49     ` Greg Thelen
  -1 siblings, 0 replies; 82+ messages in thread
From: Greg Thelen @ 2015-08-05 16:49 UTC (permalink / raw)
  To: mhocko
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara, Michal Hocko


mhocko@kernel.org wrote:

> From: Michal Hocko <mhocko@suse.com>
>
> Journal transaction might fail prematurely because the frozen_buffer
> is allocated by GFP_NOFS request:
> [   72.440013] do_get_write_access: OOM for frozen_buffer
> [   72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> [   72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
> (...snipped....)
> [   72.495559] do_get_write_access: OOM for frozen_buffer
> [   72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> [   72.496839] do_get_write_access: OOM for frozen_buffer
> [   72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> [   72.505766] Aborting journal on device sda1-8.
> [   72.505851] EXT4-fs (sda1): Remounting filesystem read-only
>
> This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
> allocations upon OOM" because small GFP_NOFS allocations never failed.
> This allocation seems essential for the journal and GFP_NOFS is too
> restrictive to the memory allocator so let's use __GFP_NOFAIL here to
> emulate the previous behavior.
>
> jbd code has the very same issue so let's do the same there as well.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/jbd/transaction.c  | 11 +----------
>  fs/jbd2/transaction.c | 14 +++-----------
>  2 files changed, 4 insertions(+), 21 deletions(-)
>
> diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> index 1695ba8334a2..bf7474deda2f 100644
> --- a/fs/jbd/transaction.c
> +++ b/fs/jbd/transaction.c
> @@ -673,16 +673,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
>  				jbd_unlock_bh_state(bh);
>  				frozen_buffer =
>  					jbd_alloc(jh2bh(jh)->b_size,
> -							 GFP_NOFS);
> -				if (!frozen_buffer) {
> -					printk(KERN_ERR
> -					       "%s: OOM for frozen_buffer\n",
> -					       __func__);
> -					JBUFFER_TRACE(jh, "oom!");
> -					error = -ENOMEM;
> -					jbd_lock_bh_state(bh);
> -					goto done;
> -				}
> +							 GFP_NOFS|__GFP_NOFAIL);
>  				goto repeat;
>  			}
>  			jh->b_frozen_data = frozen_buffer;
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index ff2f2e6ad311..bff071e21553 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -923,16 +923,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
>  				jbd_unlock_bh_state(bh);
>  				frozen_buffer =
>  					jbd2_alloc(jh2bh(jh)->b_size,
> -							 GFP_NOFS);
> -				if (!frozen_buffer) {
> -					printk(KERN_ERR
> -					       "%s: OOM for frozen_buffer\n",
> -					       __func__);
> -					JBUFFER_TRACE(jh, "oom!");
> -					error = -ENOMEM;
> -					jbd_lock_bh_state(bh);
> -					goto done;
> -				}
> +							 GFP_NOFS|__GFP_NOFAIL);
>  				goto repeat;
>  			}
>  			jh->b_frozen_data = frozen_buffer;
> @@ -1157,7 +1148,8 @@ int jbd2_journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
>  
>  repeat:
>  	if (!jh->b_committed_data) {
> -		committed_data = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS);
> +		committed_data = jbd2_alloc(jh2bh(jh)->b_size,
> +					    GFP_NOFS|__GFP_NOFAIL);
>  		if (!committed_data) {
>  			printk(KERN_ERR "%s: No memory for committed data\n",
>  				__func__);

Is this "if (!committed_data) {" check now dead code?

I also see other similar suspected dead sites in the rest of the series.
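
To illustrate the question, here is a minimal userspace sketch (not the
jbd2 code) of why a NULL check placed after a no-fail allocation cannot be
reached. Everything in it is an assumption made for illustration:
toy_alloc(), toy_try_alloc(), the TOY_GFP_* bits and the simulated failure
counter are made-up names, and the retry loop only models the documented
contract of __GFP_NOFAIL (keep trying instead of returning NULL).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define TOY_GFP_NOFS   0x1u
#define TOY_GFP_NOFAIL 0x2u

/* Number of upcoming attempts that pretend to hit memory pressure. */
static int toy_failures_left;

/* One allocation attempt; returns NULL while simulated pressure lasts. */
static void *toy_try_alloc(size_t size)
{
	if (toy_failures_left > 0) {
		toy_failures_left--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Without TOY_GFP_NOFAIL a failed attempt is reported to the caller.
 * With it the allocator keeps retrying, so the caller never sees NULL
 * and any "if (!ptr)" branch after the call is unreachable.
 */
static void *toy_alloc(size_t size, unsigned int flags)
{
	void *p = toy_try_alloc(size);

	while (!p && (flags & TOY_GFP_NOFAIL))
		p = toy_try_alloc(size);
	return p;
}
```

In this model only the plain TOY_GFP_NOFS caller needs an error path; the
TOY_GFP_NOFS | TOY_GFP_NOFAIL caller pays for that guarantee by possibly
looping for a long time.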

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 0/8] Allow GFP_NOFS allocation to fail
  2015-08-05  9:51 ` mhocko
@ 2015-08-05 19:58   ` Andreas Dilger
  -1 siblings, 0 replies; 82+ messages in thread
From: Andreas Dilger @ 2015-08-05 19:58 UTC (permalink / raw)
  To: mhocko
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

On Aug 5, 2015, at 3:51 AM, mhocko@kernel.org wrote:
> Hi,
> small GFP_NOFS, like GFP_KERNEL, allocations have not been not failing
> traditionally even though their reclaim capabilities are restricted
> because the VM code cannot recurse into filesystems to clean dirty
> pages. At the same time these allocation requests do not allow to
> trigger the OOM killer because that would lead to pre-mature OOM killing
> during heavy fs metadata workloads.
> 
> This leaves the VM code in an unfortunate situation where GFP_NOFS
> requests is looping inside the allocator relying on somebody else to
> make a progress on its behalf. This is prone to deadlocks when the
> request is holding resources which are necessary for other task to make
> a progress and release memory (e.g. OOM victim is blocked on the lock
> held by the NONFS request). Another drawback is that the caller of
> the allocator cannot define any fallback strategy because the request
> doesn't fail.
> 
> As the VM cannot do much about these requests we should face the reality
> and allow those allocations to fail. Johannes has already posted the
> patch which does that (http://marc.info/?l=linux-mm&m=142726428514236&w=2)
> but the discussion died pretty quickly.
> 
> I was playing with this patch and xfs, ext[34] and btrfs for a while
> to see what is the effect under heavy memory pressure. As expected
> this led to some fallouts.
> 
> My test consisted of a simple memory hog which allocates a lot of
> anonymous memory and writes to a fs mainly to trigger a fs activity on
> exit. In parallel there is a parallel fs metadata load (multiple tasks
> creating thousands of empty files and directories). All is running
> in a VM with small amount of memory to emulate an under provisioned
> system. The metadata load is triggering a sufficient load to invoke
> the direct reclaim even without the memory hog. The memory hog forks
> several tasks sharing the VM and OOM killer manages to kill it without 
> locking up the system (this was based on the test case from Tetsuo
> Handa - http://www.spinics.net/lists/linux-fsdevel/msg82958.html -
> I just didn't want to kill my machine ;)).
> 
> With all the patches applied none of the 4 filesystems gets aborted
> transactions and RO remount (well xfs didn't need any special
> treatment). This is obviously not sufficient to claim that failing
> GFP_NOFS is OK now but I think it is a good start for the further
> discussion. I would be grateful if FS people could have a look at
> those patches.  I have simply used __GFP_NOFAIL in the critical paths. 
> This might be not the best strategy but it sounds like a good first
> step.
> 
> The first patch in the series also allows __GFP_NOFAIL allocations to
> access memory reserves when the system is OOM which should help those
> requests to make a forward progress - especially in combination with
> GFP_NOFS.
> 
> The second patch tries to address a potential pre-mature OOM killer
> from the page fault path. I have posted it separately but it didn't
> get much traction.
> 
> The third patch allows GFP_NOFS to fail and I believe it should see
> much more testing coverage. It would be really great if it could sit
> in the mmotm tree for few release cycles so that we can catch more
> fallouts.
> 
> The rest are the FS specific patches to fortify allocations
> requests which are really needed to finish transactions without RO
> remounts. There might be more needed but my test case survives with
> these in place.

Wouldn't it make more sense to order the fs-specific patches _before_
the "GFP_NOFS can fail" patch (#3), so that once that patch is applied
all known failures have already been fixed?  Otherwise it could show
test failures during bisection that would be confusing.

Cheers, Andreas

> They would obviously need some rewording if they are going to be
> applied even without Patch3 and I will do that if respective
> maintainers will take them. Ext3 and JBD are going away soon so they
> might be dropped but they have been in the tree while I was testing
> so I've kept them.
> 
> Thoughts? Opinions?
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM
  2015-08-05 14:02     ` Michal Hocko
@ 2015-08-06 11:50       ` Tetsuo Handa
  2015-08-12  9:11         ` Michal Hocko
  0 siblings, 1 reply; 82+ messages in thread
From: Tetsuo Handa @ 2015-08-06 11:50 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, hannes

Michal Hocko wrote:
> On Wed 05-08-15 21:28:39, Tetsuo Handa wrote:
> > Reduced to only linux-mm.
> > 
> > > From: Johannes Weiner <hannes@cmpxchg.org>
> > > 
> > > GFP_NOFS allocations are not allowed to invoke the OOM killer since
> > > their reclaim abilities are severely diminished.  However, without the
> > > OOM killer available there is no hope of progress once the reclaimable
> > > pages have been exhausted.
> > 
> > Excuse me, but I still cannot understand. Why are !__GFP_FS allocations
> > considered as "their reclaim abilities are severely diminished"?
> > 
> > It seems to me that not only GFP_NOFS allocation requests but also
> > almost all types of memory allocation requests do not include
> > __GFP_NO_KSWAPD flag.
> 
> __GFP_NO_KSWAPD is not to be used outside of very specific cases.
> 
> > Therefore, while a thread which called __alloc_pages_slowpath(GFP_NOFS)
> > cannot reclaim FS memory, I assume that kswapd kernel threads which are
> > woken up by the thread via wakeup_kswapd() via wake_all_kswapds() can
> > reclaim FS memory by calling balance_pgdat(). Is this assumption correct?
> 
> yes.
> 
OK. Then it sounds to me that

  GFP_NOFS allocations' reclaim abilities are severely diminished when
  they reach __alloc_pages_may_oom() for the first time. But as time goes
  by, kswapd, which has full reclaim abilities, will reclaim memory which
  GFP_NOFS cannot reclaim. Thus, GFP_NOFS allocations' reclaim abilities
  are nearly equal to those of GFP_KERNEL if they wait long enough.
  Therefore, GFP_NOFS allocations are allowed to invoke the OOM killer
  once they have waited long enough.

and the problem is that we don't have a trigger that tells them "You have
waited long enough but memory is still tight. Therefore, you can
invoke the OOM killer."

> > If the assumption is correct, when kswapd kernel threads returned from
> > balance_pgdat() or got stuck inside reclaiming functions (e.g. blocked at
> > mutex_lock() inside slab's shrinker functions), I think that the thread
> > which called __alloc_pages_slowpath(GFP_NOFS) has reclaimed FS memory
> > as if the thread called __alloc_pages_slowpath(GFP_KERNEL), and therefore
> > the thread qualifies calling out_of_memory() as with __GFP_FS allocations.
> 
> You are missing an important point. We are talking about OOM situation
> here. Which means that the background reclaim is not able to make
> sufficient progress and neither is the direct reclaim.

My worry here is about a nearly-OOM situation.

Generally, __GFP_WAIT allocations are more likely to succeed than
!__GFP_WAIT allocations. Therefore, GFP_ATOMIC allocations include
__GFP_HIGH in order to pass __zone_watermark_ok() when !__GFP_HIGH
allocations fail.

GFP_NOFS allocations include __GFP_WAIT but do not include __GFP_HIGH, so
they can fail __zone_watermark_ok() where GFP_ATOMIC allocations pass.
Thus, retrying forever unless TIF_MEMDIE is set is what GFP_NOFS
allocations rely on for their likelihood of success (except for the
deadlock problem).
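
The watermark argument can be sketched in userspace. This is a simplified
order-0 model, loosely based on how the kernel's watermark check lowers
the threshold for privileged callers. The names toy_watermark_ok() and
TOY_ALLOC_* are hypothetical, and the halving and quartering of the
threshold are assumptions of this sketch rather than a copy of
__zone_watermark_ok().

```c
#include <stdbool.h>

/* Hypothetical allocation-context bits for this model only. */
#define TOY_ALLOC_HIGH   0x1u	/* stands in for __GFP_HIGH callers */
#define TOY_ALLOC_HARDER 0x2u	/* stands in for atomic (!__GFP_WAIT) callers */

/*
 * Return true when an order-0 request may be served from a zone that has
 * free_pages free and whose watermark is mark. Privileged callers are
 * allowed to dip further below the watermark than ordinary ones.
 */
static bool toy_watermark_ok(long free_pages, long mark,
			     unsigned int alloc_flags)
{
	long min = mark;

	if (alloc_flags & TOY_ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & TOY_ALLOC_HARDER)
		min -= min / 4;
	return free_pages > min;
}
```

With a watermark of 1000 pages and 600 pages free, a caller with no
privilege bits (the GFP_NOFS case in this model) is refused, while a
caller with both bits set (the GFP_ATOMIC case) is served, which is the
asymmetry described above.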

This patch changes !__GFP_FS allocations not to retry unless __GFP_NOFAIL is
set. I worry that we are going to make !__GFP_FS allocations less reliable
than GFP_ATOMIC allocations because the former are "close to !__GFP_WAIT" and
!__GFP_HIGH whereas the latter are "indeed !__GFP_WAIT" and __GFP_HIGH.

Therefore, I worry that GFP_NOFS allocations will start failing prematurely
under a nearly-OOM condition in which waiting a few seconds for the kswapd
kernel threads would reclaim enough FS memory for the !__GFP_FS allocations
to succeed. The toehold (reliability by __GFP_WAIT) is almost gone.

Therefore, I'm tempted to add __GFP_NOFAIL to GFP_NOFS/GFP_NOIO allocations.
If __GFP_NOFAIL is added, they will start calling out_of_memory() even under
a nearly-OOM condition in which waiting a few seconds for the kswapd kernel
threads would reclaim enough memory for the GFP_NOFS/GFP_NOIO allocations to
succeed. The downside is that out_of_memory() would be called needlessly and
more frequently than now, and I worry that the OOM deadlock problem or
depletion of memory reserves would become more likely than now due to a lot
of __GFP_NOFAIL allocations.
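
The policy being discussed can be written down as a small decision table.
This is only a sketch of the behaviour as described in this thread for
small allocations whose reclaim made no progress; it is not the allocator
code. toy_no_progress_action(), the TOY_GFP_* bits and the "patched"
switch are hypothetical names, and memory reserves and TIF_MEMDIE are left
out of the model.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define TOY_GFP_FS     0x1u
#define TOY_GFP_NOFAIL 0x2u

enum toy_action {
	TOY_RETRY,			/* loop again without OOM killing */
	TOY_OOM_KILL_THEN_RETRY,	/* call the OOM killer, then loop */
	TOY_FAIL,			/* return NULL to the caller */
};

/*
 * What the slow path does in this model once reclaim made no progress.
 * patched selects the behaviour with the "GFP_NOFS can fail" change.
 */
static enum toy_action toy_no_progress_action(unsigned int flags,
					      bool patched)
{
	/* __GFP_FS and __GFP_NOFAIL requests may invoke the OOM killer. */
	if (flags & (TOY_GFP_FS | TOY_GFP_NOFAIL))
		return TOY_OOM_KILL_THEN_RETRY;

	/* Plain !__GFP_FS request: loops forever before, fails after. */
	return patched ? TOY_FAIL : TOY_RETRY;
}
```

Read this way, tagging every GFP_NOFS/GFP_NOIO site with __GFP_NOFAIL
moves those sites from the last line of the function to the first branch,
which is where the concern about needless out_of_memory() calls comes
from.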

Or maybe I'm tempted to replace GFP_NOFS/GFP_NOIO allocations with GFP_ATOMIC
allocations ( http://marc.info/?l=linux-xfs&m=142520873721204&w=2 ).

>                                                        While the
> GFP_IOFS requests are allowed to make a (V)FS activity which _might_
> help GFP_NOFS is not by definition. And that is why this reclaim context
> is less capable. Well to be more precise we do not perform IO (other
> than the swapout) from the direct reclaim context because of the stack
> restrictions so even GFP_IOFS is not _that_ strong but shrinkers are
> still free to do metadata specific actions.
>  
> > > Don't risk hanging these allocations.  Leave it to the allocation site
> > > to implement the fallback policy for failing allocations.
> > 
> > Are there memory pages which kswapd kernel threads cannot reclaim
> > but __alloc_pages_slowpath(GFP_KERNEL) allocations can reclaim
> > when __alloc_pages_slowpath(GFP_NOFS) allocations are hanging?
> 
> See above and have a look at the particular shrinkers code (e.g.
> super_cache_scan).

super_cache_scan() checks for __GFP_FS upon entry. If kswapd kernel threads
can call super_cache_scan() with GFP_KERNEL context, kswapd kernel threads
can reclaim. Thus, the answer to this question is "no" because I assume that
kswapd kernel threads can call super_cache_scan() with GFP_KERNEL context.
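
The entry check mentioned above can be modelled in a few lines of
userspace C. This is a sketch under stated assumptions, not the
fs/super.c code: toy_cache_scan(), struct toy_shrink_control, struct
toy_sb_cache and the TOY_* constants are hypothetical stand-ins, and the
"cache" is just a counter of reclaimable objects.

```c
/* Hypothetical flag bits for this model only; not the kernel's values. */
#define TOY_GFP_IO     0x1u
#define TOY_GFP_FS     0x2u
#define TOY_GFP_NOFS   (TOY_GFP_IO)
#define TOY_GFP_KERNEL (TOY_GFP_IO | TOY_GFP_FS)

/* Return value meaning "this shrinker cannot run in this context". */
#define TOY_SHRINK_STOP (~0UL)

struct toy_shrink_control {
	unsigned int gfp_mask;		/* context of the reclaiming task */
	unsigned long nr_to_scan;	/* how many objects to try to free */
};

struct toy_sb_cache {
	unsigned long nr_cached;	/* reclaimable objects in the cache */
};

/*
 * Free up to nr_to_scan objects, but only when the caller's context may
 * recurse into the filesystem. A caller without TOY_GFP_FS gets nothing.
 */
static unsigned long toy_cache_scan(struct toy_sb_cache *cache,
				    const struct toy_shrink_control *sc)
{
	unsigned long freed;

	if (!(sc->gfp_mask & TOY_GFP_FS))
		return TOY_SHRINK_STOP;

	freed = sc->nr_to_scan < cache->nr_cached ?
		sc->nr_to_scan : cache->nr_cached;
	cache->nr_cached -= freed;
	return freed;
}
```

In this model a scan issued with TOY_GFP_NOFS leaves the cache untouched,
while the same scan issued with TOY_GFP_KERNEL (the context assumed for
kswapd above) shrinks it.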

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 0/8] Allow GFP_NOFS allocation to fail
  2015-08-05  9:51 ` mhocko
  (?)
@ 2015-08-06 14:34     ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-06 14:34 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Johannes Weiner, Dave Chinner, Tetsuo Handa, linux-mm,
	Andrew Morton, Theodore Ts'o, Jan Kara, linux-btrfs,
	linux-ext4, linux-fsdevel, LKML

On Wed 05-08-15 20:58:25, Andreas Dilger wrote:
> On Aug 5, 2015, at 3:51 AM, mhocko@kernel.org wrote:
[...]
> > The rest are the FS specific patches to fortify allocations
> > requests which are really needed to finish transactions without RO
> > remounts. There might be more needed but my test case survives with
> > these in place.
> 
> Wouldn't it make more sense to order the fs-specific patches _before_
> the "GFP_NOFS can fail" patch (#3), so that once that patch is applied
> all known failures have already been fixed?  Otherwise it could show
> test failures during bisection that would be confusing.

As I wrote below: if maintainers consider them useful even when GFP_NOFS
doesn't fail, I will reword and resend them. But you cannot fix the world
without breaking it first in this case ;)
 
> > They would obviously need some rewording if they are going to be
> > applied even without Patch3 and I will do that if respective
> > maintainers will take them. Ext3 and JBD are going away soon so they
> > might be dropped but they have been in the tree while I was testing
> > so I've kept them.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM
  2015-08-06 11:50       ` Tetsuo Handa
@ 2015-08-12  9:11         ` Michal Hocko
  2015-08-16 14:04           ` Tetsuo Handa
  0 siblings, 1 reply; 82+ messages in thread
From: Michal Hocko @ 2015-08-12  9:11 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, hannes

On Thu 06-08-15 20:50:27, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 05-08-15 21:28:39, Tetsuo Handa wrote:
> > > Reduced to only linux-mm.
> > > 
> > > > From: Johannes Weiner <hannes@cmpxchg.org>
> > > > 
> > > > GFP_NOFS allocations are not allowed to invoke the OOM killer since
> > > > their reclaim abilities are severely diminished.  However, without the
> > > > OOM killer available there is no hope of progress once the reclaimable
> > > > pages have been exhausted.
> > > 
> > > Excuse me, but I still cannot understand. Why are !__GFP_FS allocations
> > > considered as "their reclaim abilities are severely diminished"?
> > > 
> > > It seems to me that not only GFP_NOFS allocation requests but also
> > > almost all types of memory allocation requests do not include
> > > __GFP_NO_KSWAPD flag.
> > 
> > __GFP_NO_KSWAPD is not to be used outside of very specific cases.
> > 
> > > Therefore, while a thread which called __alloc_pages_slowpath(GFP_NOFS)
> > > cannot reclaim FS memory, I assume that kswapd kernel threads which are
> > > woken up by the thread via wakeup_kswapd() via wake_all_kswapds() can
> > > reclaim FS memory by calling balance_pgdat(). Is this assumption correct?
> > 
> > yes.
> > 
> OK. Then, it sounds to me that
> 
>   GFP_NOFS allocations' reclaim abilities are severely diminished as of
>   reaching __alloc_pages_may_oom() for the first time of their allocation.
>   But as time goes by, kswapd which has full reclaim abilities will reclaim
>   memory which GFP_NOFS cannot reclaim. Thus, GFP_NOFS allocations' reclaim
>   abilities are nearly equal to GFP_KERNEL if they waited for enough time.
>   Therefore, GFP_NOFS allocations are allowed to invoke the OOM killer
>   if they waited for enough time.
> 
> and the problem is that we don't have a trigger to teach that "You have
> waited for enough duration but memory is still tight. Therefore, you can
> invoke the OOM killer."

No, the problem is that we do not know whether a GFP_IOFS request would
be able to make progress in the same context. If we knew this we
could trigger the OOM killer because the full reclaim wouldn't make any
progress anyway.

> > > If the assumption is correct, when kswapd kernel threads returned from
> > > balance_pgdat() or got stuck inside reclaiming functions (e.g. blocked at
> > > mutex_lock() inside slab's shrinker functions), I think that the thread
> > > which called __alloc_pages_slowpath(GFP_NOFS) has reclaimed FS memory
> > > as if the thread called __alloc_pages_slowpath(GFP_KERNEL), and therefore
> > > the thread qualifies calling out_of_memory() as with __GFP_FS allocations.
> > 
> > You are missing an important point. We are talking about OOM situation
> > here. Which means that the background reclaim is not able to make
> > sufficient progress and neither is the direct reclaim.
> 
> My worry here is about nearly OOM situation.
> 
> Generally, __GFP_WAIT allocations are more likely to succeed than
> !__GFP_WAIT allocations. Therefore, GFP_ATOMIC allocations include
> __GFP_HIGH in order to pass __zone_watermark_ok() when !__GFP_HIGH
> allocations fail.
> 
> GFP_NOFS allocations include __GFP_WAIT but do not include __GFP_HIGH.
> GFP_NOFS allocations will fail __zone_watermark_ok() when GFP_ATOMIC
> allocations will pass. Thus, GFP_NOFS allocations retrying forever unless
> TIF_MEMDIE is set is the toehold of likeliness of succeeding memory
> allocation (except for the deadlock problem).
> 
> This patch changes !__GFP_FS allocations not to retry unless __GFP_NOFAIL is
> set. I worry that we are going to make !__GFP_FS allocations less reliable
> than GFP_ATOMIC allocations because the former is "close to !__GFP_WAIT" and
> !__GFP_HIGH whereas the latter is "indeed !__GFP_WAIT" and __GFP_HIGH.

I am sorry but this doesn't make much sense to me.

> Therefore, I worry that, under nearly OOM condition where waiting for kswapd
> kernel threads for a few seconds will reclaim FS memory which will be enough
> to succeed the !__GFP_FS allocations, GFP_NOFS allocations start failing
> prematurely. The toehold (reliability by __GFP_WAIT) is almost gone.

GFP_NOFS had to go through the full reclaim process to end up in the oom
path. All that without making _any_ progress. kswapd should be running
in the background so talking about waiting for a few seconds doesn't solve
much once we have hit the oom path. You can be lucky under some very
specific conditions but in general we _are_ OOM.
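
The retry policy under discussion can be sketched in user space as a small
decision function. This is a deliberately simplified model, not the code in
__alloc_pages_slowpath(): the SK_* names are hypothetical, the bit values are
assumed from 4.0-era include/linux/gfp.h, and the "patched" parameter stands in
for whether "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM" is
applied.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values assumed from 4.0-era include/linux/gfp.h; illustration only. */
#define SK_BIT_WAIT   0x10u	/* stands in for __GFP_WAIT   */
#define SK_BIT_IO     0x40u	/* stands in for __GFP_IO     */
#define SK_BIT_FS     0x80u	/* stands in for __GFP_FS     */
#define SK_BIT_NOFAIL 0x800u	/* stands in for __GFP_NOFAIL */

#define SK_GFP_NOFS   (SK_BIT_WAIT | SK_BIT_IO)
#define SK_GFP_KERNEL (SK_BIT_WAIT | SK_BIT_IO | SK_BIT_FS)

enum sk_action {
	SK_OOM_KILL_AND_RETRY,	/* invoke the OOM killer, then try again */
	SK_RETRY,		/* keep looping without the OOM killer   */
	SK_FAIL,		/* return NULL to the caller             */
};

/*
 * What a small-order request does once direct reclaim has made no
 * progress. Only sleeping requests get this far, hence the assertion.
 */
enum sk_action sk_after_failed_reclaim(unsigned int flags, bool patched)
{
	assert(flags & SK_BIT_WAIT);

	if (flags & SK_BIT_FS)
		return SK_OOM_KILL_AND_RETRY;
	if (flags & SK_BIT_NOFAIL)
		return SK_RETRY;

	/* !__GFP_FS and !__GFP_NOFAIL: the case the patch changes. */
	return patched ? SK_FAIL : SK_RETRY;
}
```

In this model a plain GFP_NOFS request goes from SK_RETRY to SK_FAIL when the
patch is applied, while GFP_NOFS|__GFP_NOFAIL keeps retrying either way.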

> Therefore, I'm tempted to add __GFP_NOFAIL to GFP_NOFS/GFP_NOIO allocations.

No, __GFP_NOFAIL is a strong requirement and should be used only when
the allocation failure is really not acceptable.

> If __GFP_NOFAIL is added, they will start calling out_of_memory() even under
> nearly OOM condition where waiting for kswapd kernel threads for a few seconds
> will reclaim memory which will be enough to succeed the GFP_NOFS/GFP_NOIO
> allocations. The bad end is that out_of_memory() is called needlessly/frequently
> than now, and I worry that OOM deadlock problem or depletion of memory reserves
> occurs more likely than now due to a lot of __GFP_NOFAIL allocations.
> 
> Maybe, I'm tempted to replace GFP_NOFS/GFP_NOIO allocations with GFP_ATOMIC
> allocations ( http://marc.info/?l=linux-xfs&m=142520873721204&w=2 ).

This doesn't make any sense. GFP_NOFS allocations can sleep so there is
no reason to make them NOWAIT. If some of those allocations benefit from
memory reserves because they would free more memory in return then they
are free to add __GFP_HIGH.
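
The difference between switching to GFP_ATOMIC and adding __GFP_HIGH can be
illustrated with a few bit tests. This is a user-space sketch only: the SK_*
names are hypothetical and the bit values are assumed from 4.0-era
include/linux/gfp.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit values assumed from 4.0-era include/linux/gfp.h; illustration only. */
#define SK_BIT_WAIT 0x10u	/* __GFP_WAIT: may sleep and reclaim directly */
#define SK_BIT_HIGH 0x20u	/* __GFP_HIGH: may dip into memory reserves   */
#define SK_BIT_IO   0x40u	/* __GFP_IO:   reclaim may start disk I/O     */
#define SK_BIT_FS   0x80u	/* __GFP_FS:   reclaim may call into the fs   */

#define SK_GFP_ATOMIC (SK_BIT_HIGH)
#define SK_GFP_NOFS   (SK_BIT_WAIT | SK_BIT_IO)
#define SK_GFP_KERNEL (SK_BIT_WAIT | SK_BIT_IO | SK_BIT_FS)

bool sk_may_sleep(unsigned int flags)
{
	return (flags & SK_BIT_WAIT) != 0;
}

bool sk_may_use_reserves(unsigned int flags)
{
	return (flags & SK_BIT_HIGH) != 0;
}

bool sk_may_enter_fs(unsigned int flags)
{
	return (flags & SK_BIT_FS) != 0;
}

/* Add access to reserves without giving up the ability to sleep. */
unsigned int sk_with_reserves(unsigned int flags)
{
	return flags | SK_BIT_HIGH;
}

/* Print one line summarising what a flag combination permits. */
void sk_describe(const char *name, unsigned int flags)
{
	assert(name != NULL);
	printf("%-20s 0x%02x sleep=%d reserves=%d fs=%d\n", name, flags,
	       sk_may_sleep(flags), sk_may_use_reserves(flags),
	       sk_may_enter_fs(flags));
}
```

With these assumed values SK_GFP_NOFS evaluates to 0x50, the same mode value
that appears in the ext4 page allocation failure report quoted later in this
thread. sk_with_reserves(SK_GFP_NOFS) keeps the sleep bit, whereas
SK_GFP_ATOMIC drops it.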

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure
  2015-08-05 16:49     ` Greg Thelen
@ 2015-08-12  9:14       ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-12  9:14 UTC (permalink / raw)
  To: Greg Thelen
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

On Wed 05-08-15 09:49:24, Greg Thelen wrote:
> 
> mhocko@kernel.org wrote:
> 
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Journal transaction might fail prematurely because the frozen_buffer
> > is allocated by GFP_NOFS request:
> > [   72.440013] do_get_write_access: OOM for frozen_buffer
> > [   72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> > [   72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
> > (...snipped....)
> > [   72.495559] do_get_write_access: OOM for frozen_buffer
> > [   72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> > [   72.496839] do_get_write_access: OOM for frozen_buffer
> > [   72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> > [   72.505766] Aborting journal on device sda1-8.
> > [   72.505851] EXT4-fs (sda1): Remounting filesystem read-only
> >
> > This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
> > allocations upon OOM" because small GFP_NOFS allocations never failed.
> > This allocation seems essential for the journal and GFP_NOFS is too
> > restrictive to the memory allocator so let's use __GFP_NOFAIL here to
> > emulate the previous behavior.
> >
> > jbd code has the very same issue so let's do the same there as well.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  fs/jbd/transaction.c  | 11 +----------
> >  fs/jbd2/transaction.c | 14 +++-----------
> >  2 files changed, 4 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
> > index 1695ba8334a2..bf7474deda2f 100644
> > --- a/fs/jbd/transaction.c
> > +++ b/fs/jbd/transaction.c
> > @@ -673,16 +673,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
> >  				jbd_unlock_bh_state(bh);
> >  				frozen_buffer =
> >  					jbd_alloc(jh2bh(jh)->b_size,
> > -							 GFP_NOFS);
> > -				if (!frozen_buffer) {
> > -					printk(KERN_ERR
> > -					       "%s: OOM for frozen_buffer\n",
> > -					       __func__);
> > -					JBUFFER_TRACE(jh, "oom!");
> > -					error = -ENOMEM;
> > -					jbd_lock_bh_state(bh);
> > -					goto done;
> > -				}
> > +							 GFP_NOFS|__GFP_NOFAIL);
> >  				goto repeat;
> >  			}
> >  			jh->b_frozen_data = frozen_buffer;
> > diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> > index ff2f2e6ad311..bff071e21553 100644
> > --- a/fs/jbd2/transaction.c
> > +++ b/fs/jbd2/transaction.c
> > @@ -923,16 +923,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
> >  				jbd_unlock_bh_state(bh);
> >  				frozen_buffer =
> >  					jbd2_alloc(jh2bh(jh)->b_size,
> > -							 GFP_NOFS);
> > -				if (!frozen_buffer) {
> > -					printk(KERN_ERR
> > -					       "%s: OOM for frozen_buffer\n",
> > -					       __func__);
> > -					JBUFFER_TRACE(jh, "oom!");
> > -					error = -ENOMEM;
> > -					jbd_lock_bh_state(bh);
> > -					goto done;
> > -				}
> > +							 GFP_NOFS|__GFP_NOFAIL);
> >  				goto repeat;
> >  			}
> >  			jh->b_frozen_data = frozen_buffer;
> > @@ -1157,7 +1148,8 @@ int jbd2_journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
> >  
> >  repeat:
> >  	if (!jh->b_committed_data) {
> > -		committed_data = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS);
> > +		committed_data = jbd2_alloc(jh2bh(jh)->b_size,
> > +					    GFP_NOFS|__GFP_NOFAIL);
> >  		if (!committed_data) {
> >  			printk(KERN_ERR "%s: No memory for committed data\n",
> >  				__func__);
> 
> Is this "if (!committed_data) {" check now dead code?
> 
> I also see other similar suspected dead sites in the rest of the series.

You are absolutely right. I have updated the patches.
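
Why the NULL check becomes dead code can be shown with a user-space mock. This
is a sketch under stated assumptions: sk_alloc(), sk_inject_failures() and
SK_BIT_NOFAIL are hypothetical stand-ins (not jbd2_alloc() or the page
allocator), and the injected failures only imitate memory pressure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_BIT_NOFAIL 0x800u	/* stands in for __GFP_NOFAIL */

/* Hypothetical fault injector: the next sk_fail_budget attempts fail. */
static int sk_fail_budget;

void sk_inject_failures(int n)
{
	sk_fail_budget = n;
}

/* One allocation attempt; may fail while the failure budget lasts. */
static void *sk_try_alloc(size_t size)
{
	if (sk_fail_budget > 0) {
		sk_fail_budget--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Without SK_BIT_NOFAIL a single failed attempt is reported to the
 * caller. With SK_BIT_NOFAIL the loop only exits with a valid pointer,
 * so a NULL check at the call site can never trigger.
 */
void *sk_alloc(size_t size, unsigned int flags)
{
	void *p;

	assert(size > 0);
	do {
		p = sk_try_alloc(size);
	} while (!p && (flags & SK_BIT_NOFAIL));

	return p;
}
```

A caller passing SK_BIT_NOFAIL either gets memory or never returns, which is
why an "if (!committed_data)" branch after such a call is unreachable.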

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure
  2015-08-12  9:14       ` Michal Hocko
@ 2015-08-15 13:54         ` Theodore Ts'o
  -1 siblings, 0 replies; 82+ messages in thread
From: Theodore Ts'o @ 2015-08-15 13:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg Thelen, LKML, linux-mm, linux-fsdevel, Andrew Morton,
	Johannes Weiner, Tetsuo Handa, Dave Chinner, linux-btrfs,
	linux-ext4, Jan Kara

On Wed, Aug 12, 2015 at 11:14:11AM +0200, Michal Hocko wrote:
> > Is this "if (!committed_data) {" check now dead code?
> > 
> > I also see other similar suspected dead sites in the rest of the series.
> 
> You are absolutely right. I have updated the patches.

Have you sent out an updated version of these patches?  Maybe I missed
it, but I don't think I saw them.

Thanks,

						- Ted

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM
  2015-08-12  9:11         ` Michal Hocko
@ 2015-08-16 14:04           ` Tetsuo Handa
  0 siblings, 0 replies; 82+ messages in thread
From: Tetsuo Handa @ 2015-08-16 14:04 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, hannes

Michal Hocko wrote:
> > Therefore, I worry that, under nearly OOM condition where waiting for kswapd
> > kernel threads for a few seconds will reclaim FS memory which will be enough
> > to succeed the !__GFP_FS allocations, GFP_NOFS allocations start failing
> > prematurely. The toehold (reliability by __GFP_WAIT) is almost gone.
> 
> GFP_NOFS had to go through the full reclaim process to end up in the oom
> path. All that without making _any_ progress. kswapd should be running
> in the background so talking about waiting for few seconds doesn't solve
> much once we have hit the oom path. You can be lucky under some very
> specific conditions but in general we _are_ OOM.

As a GFP_NOFS user from syscalls rather than from filesystem writeback (some
of the LSM hooks are called with fs locks held), I'm happy to give up upon
SIGKILL but I'm not happy to return -ENOMEM without retrying hard. Returning
-ENOMEM to user space is nearly equal to terminating that process because what
user space programs likely do upon an unexpected -ENOMEM is to call exit().
Therefore, I prefer OOM killing some memory hog process, a choice which can be
controlled via /proc/pid/oom_score_adj, to potentially terminating important
processes.

As troubleshooting staff, I wish that we had a mechanism for proving that
silent hangups (hangups without OOM killer messages) are not caused by the mm
subsystem's behavior. How can we tell whether memory allocation requests got
stuck before reaching the oom path (e.g. inside shrinker functions or
shrink_inactive_list())? I want to use something like khungtaskd.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure
  2015-08-15 13:54         ` Theodore Ts'o
@ 2015-08-18 10:36           ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-18 10:36 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Greg Thelen, LKML, linux-mm, linux-fsdevel, Andrew Morton,
	Johannes Weiner, Tetsuo Handa, Dave Chinner, linux-btrfs,
	linux-ext4, Jan Kara

On Sat 15-08-15 09:54:22, Theodore Ts'o wrote:
> On Wed, Aug 12, 2015 at 11:14:11AM +0200, Michal Hocko wrote:
> > > Is this "if (!committed_data) {" check now dead code?
> > > 
> > > I also see other similar suspected dead sites in the rest of the series.
> > 
> > You are absolutely right. I have updated the patches.
> 
> Have you sent out an updated version of these patches?  Maybe I missed
> it, but I don't think I saw them.

I haven't yet. I was waiting for more feedback and didn't want to spam
the mailing list too much. I will post them now.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [RFC -v2 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure
  2015-08-05  9:51   ` mhocko
@ 2015-08-18 10:38     ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-18 10:38 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

From: Michal Hocko <mhocko@suse.com>

A journal transaction might fail prematurely because the frozen_buffer
is allocated by a GFP_NOFS request:
[   72.440013] do_get_write_access: OOM for frozen_buffer
[   72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[   72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
(...snipped....)
[   72.495559] do_get_write_access: OOM for frozen_buffer
[   72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[   72.496839] do_get_write_access: OOM for frozen_buffer
[   72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[   72.505766] Aborting journal on device sda1-8.
[   72.505851] EXT4-fs (sda1): Remounting filesystem read-only

This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
allocations upon OOM" because small GFP_NOFS allocations never failed.
This allocation seems essential for the journal and GFP_NOFS is too
restrictive to the memory allocator so let's use __GFP_NOFAIL here to
emulate the previous behavior.

jbd code has the very same issue so let's do the same there as well.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/jbd/transaction.c  | 11 +----------
 fs/jbd2/transaction.c | 23 ++++-------------------
 2 files changed, 5 insertions(+), 29 deletions(-)

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 1695ba8334a2..bf7474deda2f 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -673,16 +673,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 				jbd_unlock_bh_state(bh);
 				frozen_buffer =
 					jbd_alloc(jh2bh(jh)->b_size,
-							 GFP_NOFS);
-				if (!frozen_buffer) {
-					printk(KERN_ERR
-					       "%s: OOM for frozen_buffer\n",
-					       __func__);
-					JBUFFER_TRACE(jh, "oom!");
-					error = -ENOMEM;
-					jbd_lock_bh_state(bh);
-					goto done;
-				}
+							 GFP_NOFS|__GFP_NOFAIL);
 				goto repeat;
 			}
 			jh->b_frozen_data = frozen_buffer;
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index ff2f2e6ad311..4d63c5911afa 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -923,16 +923,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 				jbd_unlock_bh_state(bh);
 				frozen_buffer =
 					jbd2_alloc(jh2bh(jh)->b_size,
-							 GFP_NOFS);
-				if (!frozen_buffer) {
-					printk(KERN_ERR
-					       "%s: OOM for frozen_buffer\n",
-					       __func__);
-					JBUFFER_TRACE(jh, "oom!");
-					error = -ENOMEM;
-					jbd_lock_bh_state(bh);
-					goto done;
-				}
+							 GFP_NOFS|__GFP_NOFAIL);
 				goto repeat;
 			}
 			jh->b_frozen_data = frozen_buffer;
@@ -1156,15 +1147,9 @@ int jbd2_journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
 		goto out;
 
 repeat:
-	if (!jh->b_committed_data) {
-		committed_data = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS);
-		if (!committed_data) {
-			printk(KERN_ERR "%s: No memory for committed data\n",
-				__func__);
-			err = -ENOMEM;
-			goto out;
-		}
-	}
+	if (!jh->b_committed_data)
+		committed_data = jbd2_alloc(jh2bh(jh)->b_size,
+					    GFP_NOFS|__GFP_NOFAIL);
 
 	jbd_lock_bh_state(bh);
 	if (!jh->b_committed_data) {
-- 
2.5.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC -v2 5/8] ext4: Do not fail journal due to block allocator
  2015-08-05  9:51   ` mhocko
@ 2015-08-18 10:39     ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-18 10:39 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

From: Michal Hocko <mhocko@suse.com>

Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
the memory allocator no longer loops endlessly to satisfy low-order
allocations and instead fails them so that callers can handle the
failure gracefully.

Some of the callers are not yet prepared for this behavior though. The
ext4 block allocator relies solely on GFP_NOFS allocation requests, and
allocation failures lead to aborting the journal too easily:

[  345.028333] oom-trash: page allocation failure: order:0, mode:0x50
[  345.028336] CPU: 1 PID: 8334 Comm: oom-trash Tainted: G        W       4.0.0-nofs3-00006-gdfe9931f5f68 #588
[  345.028337] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150428_134905-gandalf 04/01/2014
[  345.028339]  0000000000000000 ffff880005a17708 ffffffff81538a54 ffffffff8107a40f
[  345.028341]  0000000000000050 ffff880005a17798 ffffffff810fe854 0000000180000000
[  345.028342]  0000000000000046 0000000000000000 ffffffff81a52100 0000000000000246
[  345.028343] Call Trace:
[  345.028348]  [<ffffffff81538a54>] dump_stack+0x4f/0x7b
[  345.028370]  [<ffffffff810fe854>] warn_alloc_failed+0x12a/0x13f
[  345.028373]  [<ffffffff81101bd2>] __alloc_pages_nodemask+0x7f3/0x8aa
[  345.028375]  [<ffffffff810f9933>] pagecache_get_page+0x12a/0x1c9
[  345.028390]  [<ffffffffa005bc64>] ext4_mb_load_buddy+0x220/0x367 [ext4]
[  345.028414]  [<ffffffffa006014f>] ext4_free_blocks+0x522/0xa4c [ext4]
[  345.028425]  [<ffffffffa0054e14>] ext4_ext_remove_space+0x833/0xf22 [ext4]
[  345.028434]  [<ffffffffa005677e>] ext4_ext_truncate+0x8c/0xb0 [ext4]
[  345.028441]  [<ffffffffa00342bf>] ext4_truncate+0x20b/0x38d [ext4]
[  345.028462]  [<ffffffffa003573c>] ext4_evict_inode+0x32b/0x4c1 [ext4]
[  345.028464]  [<ffffffff8116d04f>] evict+0xa0/0x148
[  345.028466]  [<ffffffff8116dca8>] iput+0x1a1/0x1f0
[  345.028468]  [<ffffffff811697b4>] __dentry_kill+0x136/0x1a6
[  345.028470]  [<ffffffff81169a3e>] dput+0x21a/0x243
[  345.028472]  [<ffffffff81157cda>] __fput+0x184/0x19b
[  345.028473]  [<ffffffff81157d29>] ____fput+0xe/0x10
[  345.028475]  [<ffffffff8105a05f>] task_work_run+0x8a/0xa1
[  345.028477]  [<ffffffff810452f0>] do_exit+0x3c6/0x8dc
[  345.028482]  [<ffffffff8104588a>] do_group_exit+0x4d/0xb2
[  345.028483]  [<ffffffff8104eeeb>] get_signal+0x5b1/0x5f5
[  345.028488]  [<ffffffff81002202>] do_signal+0x28/0x5d0
[...]
[  345.028624] EXT4-fs error (device hdb1) in ext4_free_blocks:4879: Out of memory
[  345.033097] Aborting journal on device hdb1-8.
[  345.036339] EXT4-fs (hdb1): Remounting filesystem read-only
[  345.036344] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.036766] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.038583] EXT4-fs error (device hdb1) in ext4_ext_remove_space:3048: Journal has aborted
[  345.049115] EXT4-fs error (device hdb1) in ext4_ext_truncate:4669: Journal has aborted
[  345.050434] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053064] EXT4-fs error (device hdb1) in ext4_truncate:3668: Journal has aborted
[  345.053582] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
[  345.053946] EXT4-fs error (device hdb1) in ext4_orphan_del:2686: Journal has aborted
[  345.055367] EXT4-fs error (device hdb1) in ext4_reserve_inode_write:4834: Journal has aborted
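
A note on reading the first line of the trace: mode:0x50 is the raw gfp
mask of the failed request. Assuming the bit values of the 4.0-era
include/linux/gfp.h (__GFP_WAIT 0x10, __GFP_IO 0x40, __GFP_FS 0x80,
__GFP_NOFAIL 0x800; please check them against the tree you are reading),
it decodes to plain GFP_NOFS. The userspace sketch below only
illustrates that decoding; the SK_ names and both helpers are made up
for this example and are not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed 4.0-era bit values; verify against include/linux/gfp.h. */
#define SK_GFP_WAIT   0x10u   /* caller may sleep and enter direct reclaim */
#define SK_GFP_IO     0x40u   /* reclaim may start physical I/O */
#define SK_GFP_FS     0x80u   /* reclaim may call back into the filesystem */
#define SK_GFP_NOFAIL 0x800u  /* retry instead of returning failure */

/* Hypothetical helper: non-zero if the mask lets reclaim enter the fs. */
static int sk_may_enter_fs(unsigned int mask)
{
	return (mask & SK_GFP_FS) != 0;
}

/* Hypothetical helper: write the names of the known set bits, '|' separated. */
static void sk_decode_gfp(unsigned int mask, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} tbl[] = {
		{ SK_GFP_WAIT, "WAIT" },
		{ SK_GFP_IO, "IO" },
		{ SK_GFP_FS, "FS" },
		{ SK_GFP_NOFAIL, "NOFAIL" },
	};
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (!(mask & tbl[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
}
```

Under those assumed values, sk_decode_gfp(0x50, ...) produces "WAIT|IO"
and sk_may_enter_fs(0x50) is false, which matches a GFP_NOFS request
that may sleep and do I/O but must not recurse into the filesystem.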

The failure is really premature because the GFP_NOFS allocation context
is very restricted, especially under fs metadata heavy loads. Before we
go with a more sophisticated solution, let's simply imitate the previous
behavior of non-failing NOFS allocations and use __GFP_NOFAIL for the
buddy block allocator. I wasn't able to trigger the issue with this
patch anymore.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/ext4/mballoc.c | 52 ++++++++++++++++++++++++----------------------------
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5b1613a54307..0360ea32c30f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -992,9 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 	block = group * 2;
 	pnum = block / blocks_per_page;
 	poff = block % blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
-	if (!page)
-		return -ENOMEM;
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	BUG_ON(page->mapping != inode->i_mapping);
 	e4b->bd_bitmap_page = page;
 	e4b->bd_bitmap = page_address(page) + (poff * sb->s_blocksize);
@@ -1006,9 +1005,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
 
 	block++;
 	pnum = block / blocks_per_page;
-	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
-	if (!page)
-		return -ENOMEM;
+	page = find_or_create_page(inode->i_mapping, pnum,
+				   GFP_NOFS|__GFP_NOFAIL);
 	BUG_ON(page->mapping != inode->i_mapping);
 	e4b->bd_buddy_page = page;
 	return 0;
@@ -1158,20 +1156,19 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 			 * wait for it to initialize.
 			 */
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
-		if (page) {
-			BUG_ON(page->mapping != inode->i_mapping);
-			if (!PageUptodate(page)) {
-				ret = ext4_mb_init_cache(page, NULL);
-				if (ret) {
-					unlock_page(page);
-					goto err;
-				}
-				mb_cmp_bitmaps(e4b, page_address(page) +
-					       (poff * sb->s_blocksize));
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
+		BUG_ON(page->mapping != inode->i_mapping);
+		if (!PageUptodate(page)) {
+			ret = ext4_mb_init_cache(page, NULL);
+			if (ret) {
+				unlock_page(page);
+				goto err;
 			}
-			unlock_page(page);
+			mb_cmp_bitmaps(e4b, page_address(page) +
+				       (poff * sb->s_blocksize));
 		}
+		unlock_page(page);
 	}
 	if (page == NULL) {
 		ret = -ENOMEM;
@@ -1194,18 +1191,17 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 	if (page == NULL || !PageUptodate(page)) {
 		if (page)
 			page_cache_release(page);
-		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
-		if (page) {
-			BUG_ON(page->mapping != inode->i_mapping);
-			if (!PageUptodate(page)) {
-				ret = ext4_mb_init_cache(page, e4b->bd_bitmap);
-				if (ret) {
-					unlock_page(page);
-					goto err;
-				}
+		page = find_or_create_page(inode->i_mapping, pnum,
+					   GFP_NOFS|__GFP_NOFAIL);
+		BUG_ON(page->mapping != inode->i_mapping);
+		if (!PageUptodate(page)) {
+			ret = ext4_mb_init_cache(page, e4b->bd_bitmap);
+			if (ret) {
+				unlock_page(page);
+				goto err;
 			}
-			unlock_page(page);
 		}
+		unlock_page(page);
 	}
 	if (page == NULL) {
 		ret = -ENOMEM;
-- 
2.5.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC -v2 6/8] ext3: Do not abort journal prematurely
  2015-08-05  9:51   ` mhocko
@ 2015-08-18 10:39     ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-18 10:39 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

From: Michal Hocko <mhocko@suse.com>

journal_get_undo_access relies on a GFP_NOFS allocation, yet that
allocation is essential for the journal transaction:

[   83.256914] journal_get_undo_access: No memory for committed data
[   83.258022] EXT3-fs: ext3_free_blocks_sb: aborting transaction: Out of memory in __ext3_journal_get_undo_access
[   83.259785] EXT3-fs (hdb1): error in ext3_free_blocks_sb: Out of memory
[   83.267130] Aborting journal on device hdb1.
[   83.292308] EXT3-fs (hdb1): error: ext3_journal_start_sb: Detected aborted journal
[   83.293630] EXT3-fs (hdb1): error: remounting filesystem read-only

Since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
these allocation requests are allowed to fail so we need to use
__GFP_NOFAIL to imitate the previous behavior.
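
To make "imitate the previous behavior" concrete, here is a small
userspace model of the two contracts. It is only a sketch under stated
assumptions: SK_NOFAIL stands in for __GFP_NOFAIL, struct sk_allocator
and both functions are hypothetical, and the real page allocator
reclaims and waits between attempts rather than spinning like this.

```c
#include <stddef.h>
#include <stdlib.h>

#define SK_NOFAIL 0x1u	/* hypothetical stand-in for __GFP_NOFAIL */

/* Hypothetical allocator state: the next fail_budget attempts fail. */
struct sk_allocator {
	unsigned int fail_budget;
	unsigned int attempts;
};

/* A single allocation attempt; NULL while the failure budget lasts. */
static void *sk_try_alloc(struct sk_allocator *a, size_t size)
{
	a->attempts++;
	if (a->fail_budget > 0) {
		a->fail_budget--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Without SK_NOFAIL the first failed attempt is reported to the caller,
 * who must then cope with NULL. With SK_NOFAIL the allocator keeps
 * retrying, so the caller never sees NULL, at the price of blocking for
 * as long as memory stays unavailable.
 */
static void *sk_alloc(struct sk_allocator *a, size_t size, unsigned int flags)
{
	void *p;

	do {
		p = sk_try_alloc(a, size);
	} while (!p && (flags & SK_NOFAIL));

	return p;
}
```

With a failure budget of 3, sk_alloc(&a, 64, 0) returns NULL after one
attempt, while sk_alloc(&a, 64, SK_NOFAIL) returns a buffer on the
fourth attempt. The caller-side NULL check only becomes dead code in
the second case, which is why the patch can drop it.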

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/jbd/transaction.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index bf7474deda2f..2151b80276c3 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -886,15 +886,8 @@ int journal_get_undo_access(handle_t *handle, struct buffer_head *bh)
 		goto out;
 
 repeat:
-	if (!jh->b_committed_data) {
-		committed_data = jbd_alloc(jh2bh(jh)->b_size, GFP_NOFS);
-		if (!committed_data) {
-			printk(KERN_ERR "%s: No memory for committed data\n",
-				__func__);
-			err = -ENOMEM;
-			goto out;
-		}
-	}
+	if (!jh->b_committed_data)
+		committed_data = jbd_alloc(jh2bh(jh)->b_size, GFP_NOFS | __GFP_NOFAIL);
 
 	jbd_lock_bh_state(bh);
 	if (!jh->b_committed_data) {
-- 
2.5.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC -v2 7/8] btrfs: Prevent from early transaction abort
  2015-08-05  9:51   ` mhocko
@ 2015-08-18 10:40     ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-18 10:40 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

From: Michal Hocko <mhocko@suse.com>

Btrfs relies on GFP_NOFS allocations when committing the transaction,
but since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
those allocations are allowed to fail, which can lead to a premature
transaction abort:

[   55.328093] Call Trace:
[   55.328890]  [<ffffffff8154e6f0>] dump_stack+0x4f/0x7b
[   55.330518]  [<ffffffff8108fa28>] ? console_unlock+0x334/0x363
[   55.332738]  [<ffffffff8110873e>] __alloc_pages_nodemask+0x81d/0x8d4
[   55.334910]  [<ffffffff81100752>] pagecache_get_page+0x10e/0x20c
[   55.336844]  [<ffffffffa007d916>] alloc_extent_buffer+0xd0/0x350 [btrfs]
[   55.338973]  [<ffffffffa0059d8c>] btrfs_find_create_tree_block+0x15/0x17 [btrfs]
[   55.341329]  [<ffffffffa004f728>] btrfs_alloc_tree_block+0x18c/0x405 [btrfs]
[   55.343566]  [<ffffffffa003fa34>] split_leaf+0x1e4/0x6a6 [btrfs]
[   55.345577]  [<ffffffffa0040567>] btrfs_search_slot+0x671/0x831 [btrfs]
[   55.347679]  [<ffffffff810682d7>] ? get_parent_ip+0xe/0x3e
[   55.349434]  [<ffffffffa0041cb2>] btrfs_insert_empty_items+0x5d/0xa8 [btrfs]
[   55.351681]  [<ffffffffa004ecfb>] __btrfs_run_delayed_refs+0x7a6/0xf35 [btrfs]
[   55.353979]  [<ffffffffa00512ea>] btrfs_run_delayed_refs+0x6e/0x226 [btrfs]
[   55.356212]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.358378]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.360626]  [<ffffffffa0060221>] btrfs_commit_transaction+0x4c/0xaba [btrfs]
[   55.362894]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.365221]  [<ffffffffa0073428>] btrfs_sync_file+0x29c/0x310 [btrfs]
[   55.367273]  [<ffffffff81186808>] vfs_fsync_range+0x8f/0x9e
[   55.369047]  [<ffffffff81186833>] vfs_fsync+0x1c/0x1e
[   55.370654]  [<ffffffff81186869>] do_fsync+0x34/0x4e
[   55.372246]  [<ffffffff81186ab3>] SyS_fsync+0x10/0x14
[   55.373851]  [<ffffffff81554f97>] system_call_fastpath+0x12/0x6f
[   55.381070] BTRFS: error (device hdb1) in btrfs_run_delayed_refs:2821: errno=-12 Out of memory
[   55.382431] BTRFS warning (device hdb1): Skipping commit of aborted transaction.
[   55.382433] BTRFS warning (device hdb1): cleanup_transaction:1692: Aborting unused transaction(IO failure).
[   55.384280] ------------[ cut here ]------------
[   55.384312] WARNING: CPU: 0 PID: 3010 at fs/btrfs/delayed-ref.c:438 btrfs_select_ref_head+0xd9/0xfe [btrfs]()
[...]
[   55.384337] Call Trace:
[   55.384353]  [<ffffffff8154e6f0>] dump_stack+0x4f/0x7b
[   55.384357]  [<ffffffff8107f717>] ? down_trylock+0x2d/0x37
[   55.384359]  [<ffffffff81046977>] warn_slowpath_common+0xa1/0xbb
[   55.384398]  [<ffffffffa00a1d6b>] ? btrfs_select_ref_head+0xd9/0xfe [btrfs]
[   55.384400]  [<ffffffff81046a34>] warn_slowpath_null+0x1a/0x1c
[   55.384423]  [<ffffffffa00a1d6b>] btrfs_select_ref_head+0xd9/0xfe [btrfs]
[   55.384446]  [<ffffffffa004e5f7>] ? __btrfs_run_delayed_refs+0xa2/0xf35 [btrfs]
[   55.384455]  [<ffffffffa004e600>] __btrfs_run_delayed_refs+0xab/0xf35 [btrfs]
[   55.384476]  [<ffffffffa00512ea>] btrfs_run_delayed_refs+0x6e/0x226 [btrfs]
[   55.384499]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.384521]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.384543]  [<ffffffffa0060221>] btrfs_commit_transaction+0x4c/0xaba [btrfs]
[   55.384565]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.384588]  [<ffffffffa0073428>] btrfs_sync_file+0x29c/0x310 [btrfs]
[   55.384591]  [<ffffffff81186808>] vfs_fsync_range+0x8f/0x9e
[   55.384592]  [<ffffffff81186833>] vfs_fsync+0x1c/0x1e
[   55.384593]  [<ffffffff81186869>] do_fsync+0x34/0x4e
[   55.384594]  [<ffffffff81186ab3>] SyS_fsync+0x10/0x14
[   55.384595]  [<ffffffff81554f97>] system_call_fastpath+0x12/0x6f
[...]
[   55.384608] ---[ end trace c29799da1d4dd621 ]---
[   55.437323] BTRFS info (device hdb1): forced readonly
[   55.438815] BTRFS info (device hdb1): delayed_refs has NO entry

Fix this by reintroducing the no-fail behavior of this allocation path
with the explicit __GFP_NOFAIL.
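
For contrast, a fallible multi-page path has to undo its partial work
whenever one allocation fails, and every caller up the chain has to
propagate that failure. The userspace sketch below shows the usual
unwind idiom in isolation. It is not btrfs code: struct sk_buffer, the
allocation hook and the helper names are all hypothetical, and malloc
stands in for the page cache.

```c
#include <stddef.h>
#include <stdlib.h>

#define SK_MAX_PAGES 16

/* Hypothetical hook so that a failure can be injected at a chosen index. */
typedef void *(*sk_page_alloc_fn)(size_t index);

struct sk_buffer {
	size_t nr_pages;
	void *pages[SK_MAX_PAGES];
};

/* Returns 0 on success, -1 after releasing every page acquired so far. */
static int sk_buffer_fill(struct sk_buffer *b, size_t nr, sk_page_alloc_fn alloc)
{
	size_t i;

	if (nr > SK_MAX_PAGES)
		return -1;

	b->nr_pages = 0;
	for (i = 0; i < nr; i++) {
		b->pages[i] = alloc(i);
		if (!b->pages[i])
			goto unwind;
		b->nr_pages++;
	}
	return 0;

unwind:
	/* Drop the pages in reverse order so nothing is leaked. */
	while (b->nr_pages > 0) {
		b->nr_pages--;
		free(b->pages[b->nr_pages]);
		b->pages[b->nr_pages] = NULL;
	}
	return -1;
}

/* Example hooks: one always succeeds, one fails for index 2. */
static void *sk_alloc_ok(size_t index)
{
	(void)index;
	return malloc(4096);
}

static void *sk_alloc_fail_at_2(size_t index)
{
	return index == 2 ? NULL : malloc(4096);
}
```

Once the allocation cannot fail, the unwind label and the error return
disappear from this function, but the cost moves into the allocator,
which may then block for as long as memory is unavailable.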

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/btrfs/extent_io.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c374e1e71e5f..d855ddffd5fe 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4607,9 +4607,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 {
 	struct extent_buffer *eb = NULL;
 
-	eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS);
-	if (eb == NULL)
-		return NULL;
+	eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS|__GFP_NOFAIL);
 	eb->start = start;
 	eb->len = len;
 	eb->fs_info = fs_info;
@@ -4867,9 +4865,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		return NULL;
 
 	for (i = 0; i < num_pages; i++, index++) {
-		p = find_or_create_page(mapping, index, GFP_NOFS);
-		if (!p)
-			goto free_eb;
+		p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
 
 		spin_lock(&mapping->private_lock);
 		if (PagePrivate(p)) {
-- 
2.5.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RFC -v2 8/8] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio
  2015-08-05  9:51   ` mhocko
@ 2015-08-18 10:41     ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-18 10:41 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

From: Michal Hocko <mhocko@suse.com>

alloc_btrfs_bio relies on GFP_NOFS to allocate a bio, but since "mm:
page_alloc: do not lock up GFP_NOFS allocations upon OOM" this
allocation is allowed to fail, which can lead to:
[   37.928625] kernel BUG at fs/btrfs/extent_io.c:4045

This is clearly undesirable and the nofail behavior should be explicit
if the allocation failure cannot be tolerated.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/btrfs/volumes.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 53af23f2c087..42b9949dd71d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4914,9 +4914,7 @@ static struct btrfs_bio *alloc_btrfs_bio(int total_stripes, int real_stripes)
 		 * and the stripes
 		 */
 		sizeof(u64) * (total_stripes),
-		GFP_NOFS);
-	if (!bbio)
-		return NULL;
+		GFP_NOFS|__GFP_NOFAIL);
 
 	atomic_set(&bbio->error, 0);
 	atomic_set(&bbio->refs, 1);
-- 
2.5.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [RFC -v2 5/8] ext4: Do not fail journal due to block allocator
  2015-08-18 10:39     ` Michal Hocko
@ 2015-08-18 10:55       ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-18 10:55 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

On Tue 18-08-15 12:39:03, Michal Hocko wrote:
[...]
> @@ -992,9 +992,8 @@ static int ext4_mb_get_buddy_page_lock(struct super_block *sb,
>  	block = group * 2;
>  	pnum = block / blocks_per_page;
>  	poff = block % blocks_per_page;
> -	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
> -	if (!page)
> -		return -ENOMEM;
> +	page = find_or_create_page(inode->i_mapping, pnum,
> +				   GFP_NOFS|__GFP_NOFAIL);

Scratch this one. find_or_create_page is allowed to return NULL, so the
patch is bogus. I was overly eager to remove the return value checks
everywhere.

-- 
Michal Hocko
SUSE Labs
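
One way to see why a NULL check can still be needed: a lookup-or-create
helper does more than allocate, so a later step can fail even when the
allocation itself cannot. The following userspace sketch illustrates that
shape only. All names (sim_mapping, sim_find_or_create, CACHE_SLOTS) are
hypothetical, and the "table is full" failure merely stands in for whatever
can go wrong after the allocation; it is not a description of how the
kernel's find_or_create_page is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 4

struct sim_page {
	unsigned long index;
};

/* A tiny fixed-size "mapping"; a full table models an insertion failure. */
struct sim_mapping {
	struct sim_page *slot[CACHE_SLOTS];
};

static struct sim_page *sim_find(struct sim_mapping *m, unsigned long index)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++)
		if (m->slot[i] && m->slot[i]->index == index)
			return m->slot[i];
	return NULL;
}

/* Step 2: can fail independently of the page allocation. */
static int sim_insert(struct sim_mapping *m, struct sim_page *p)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		if (!m->slot[i]) {
			m->slot[i] = p;
			return 0;
		}
	}
	return -1;
}

static struct sim_page *sim_find_or_create(struct sim_mapping *m,
					   unsigned long index)
{
	struct sim_page *p = sim_find(m, index);

	if (p)
		return p;

	/* Step 1: treated as a no-fail allocation in this model. */
	p = calloc(1, sizeof(*p));
	assert(p != NULL);
	p->index = index;

	/* Step 2 failed: the helper returns NULL despite step 1. */
	if (sim_insert(m, p) != 0) {
		free(p);
		return NULL;
	}
	return p;
}
```

Callers of such a helper have to keep their NULL check no matter which
flags they pass for the allocation step.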

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC -v2 7/8] btrfs: Prevent from early transaction abort
  2015-08-18 10:40     ` Michal Hocko
  (?)
@ 2015-08-18 11:01       ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-18 11:01 UTC (permalink / raw)
  To: LKML
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

On Tue 18-08-15 12:40:31, Michal Hocko wrote:
[...]
> @@ -4867,9 +4865,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>  		return NULL;
>  
>  	for (i = 0; i < num_pages; i++, index++) {
> -		p = find_or_create_page(mapping, index, GFP_NOFS);
> -		if (!p)
> -			goto free_eb;
> +		p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
>  
>  		spin_lock(&mapping->private_lock);
>  		if (PagePrivate(p)) {

Same here. find_or_create_page might return NULL.
---
From f430e5f54367b8815e1099f26fedd2873b597a07 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 15 Jul 2015 19:27:06 +0200
Subject: [PATCH] btrfs: Prevent from early transaction abort

Btrfs relies on GFP_NOFS allocation when committing the transaction but
since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
those allocations are allowed to fail, which can lead to a premature
transaction abort:

[   55.328093] Call Trace:
[   55.328890]  [<ffffffff8154e6f0>] dump_stack+0x4f/0x7b
[   55.330518]  [<ffffffff8108fa28>] ? console_unlock+0x334/0x363
[   55.332738]  [<ffffffff8110873e>] __alloc_pages_nodemask+0x81d/0x8d4
[   55.334910]  [<ffffffff81100752>] pagecache_get_page+0x10e/0x20c
[   55.336844]  [<ffffffffa007d916>] alloc_extent_buffer+0xd0/0x350 [btrfs]
[   55.338973]  [<ffffffffa0059d8c>] btrfs_find_create_tree_block+0x15/0x17 [btrfs]
[   55.341329]  [<ffffffffa004f728>] btrfs_alloc_tree_block+0x18c/0x405 [btrfs]
[   55.343566]  [<ffffffffa003fa34>] split_leaf+0x1e4/0x6a6 [btrfs]
[   55.345577]  [<ffffffffa0040567>] btrfs_search_slot+0x671/0x831 [btrfs]
[   55.347679]  [<ffffffff810682d7>] ? get_parent_ip+0xe/0x3e
[   55.349434]  [<ffffffffa0041cb2>] btrfs_insert_empty_items+0x5d/0xa8 [btrfs]
[   55.351681]  [<ffffffffa004ecfb>] __btrfs_run_delayed_refs+0x7a6/0xf35 [btrfs]
[   55.353979]  [<ffffffffa00512ea>] btrfs_run_delayed_refs+0x6e/0x226 [btrfs]
[   55.356212]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.358378]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.360626]  [<ffffffffa0060221>] btrfs_commit_transaction+0x4c/0xaba [btrfs]
[   55.362894]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.365221]  [<ffffffffa0073428>] btrfs_sync_file+0x29c/0x310 [btrfs]
[   55.367273]  [<ffffffff81186808>] vfs_fsync_range+0x8f/0x9e
[   55.369047]  [<ffffffff81186833>] vfs_fsync+0x1c/0x1e
[   55.370654]  [<ffffffff81186869>] do_fsync+0x34/0x4e
[   55.372246]  [<ffffffff81186ab3>] SyS_fsync+0x10/0x14
[   55.373851]  [<ffffffff81554f97>] system_call_fastpath+0x12/0x6f
[   55.381070] BTRFS: error (device hdb1) in btrfs_run_delayed_refs:2821: errno=-12 Out of memory
[   55.382431] BTRFS warning (device hdb1): Skipping commit of aborted transaction.
[   55.382433] BTRFS warning (device hdb1): cleanup_transaction:1692: Aborting unused transaction(IO failure).
[   55.384280] ------------[ cut here ]------------
[   55.384312] WARNING: CPU: 0 PID: 3010 at fs/btrfs/delayed-ref.c:438 btrfs_select_ref_head+0xd9/0xfe [btrfs]()
[...]
[   55.384337] Call Trace:
[   55.384353]  [<ffffffff8154e6f0>] dump_stack+0x4f/0x7b
[   55.384357]  [<ffffffff8107f717>] ? down_trylock+0x2d/0x37
[   55.384359]  [<ffffffff81046977>] warn_slowpath_common+0xa1/0xbb
[   55.384398]  [<ffffffffa00a1d6b>] ? btrfs_select_ref_head+0xd9/0xfe [btrfs]
[   55.384400]  [<ffffffff81046a34>] warn_slowpath_null+0x1a/0x1c
[   55.384423]  [<ffffffffa00a1d6b>] btrfs_select_ref_head+0xd9/0xfe [btrfs]
[   55.384446]  [<ffffffffa004e5f7>] ? __btrfs_run_delayed_refs+0xa2/0xf35 [btrfs]
[   55.384455]  [<ffffffffa004e600>] __btrfs_run_delayed_refs+0xab/0xf35 [btrfs]
[   55.384476]  [<ffffffffa00512ea>] btrfs_run_delayed_refs+0x6e/0x226 [btrfs]
[   55.384499]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.384521]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.384543]  [<ffffffffa0060221>] btrfs_commit_transaction+0x4c/0xaba [btrfs]
[   55.384565]  [<ffffffffa0060e21>] ? start_transaction+0x192/0x534 [btrfs]
[   55.384588]  [<ffffffffa0073428>] btrfs_sync_file+0x29c/0x310 [btrfs]
[   55.384591]  [<ffffffff81186808>] vfs_fsync_range+0x8f/0x9e
[   55.384592]  [<ffffffff81186833>] vfs_fsync+0x1c/0x1e
[   55.384593]  [<ffffffff81186869>] do_fsync+0x34/0x4e
[   55.384594]  [<ffffffff81186ab3>] SyS_fsync+0x10/0x14
[   55.384595]  [<ffffffff81554f97>] system_call_fastpath+0x12/0x6f
[...]
[   55.384608] ---[ end trace c29799da1d4dd621 ]---
[   55.437323] BTRFS info (device hdb1): forced readonly
[   55.438815] BTRFS info (device hdb1): delayed_refs has NO entry

Fix this by reintroducing the no-fail behavior of this allocation path
with the explicit __GFP_NOFAIL.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/btrfs/extent_io.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c374e1e71e5f..f4d6eea975d7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4607,9 +4607,7 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 {
 	struct extent_buffer *eb = NULL;
 
-	eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS);
-	if (eb == NULL)
-		return NULL;
+	eb = kmem_cache_zalloc(extent_buffer_cache, GFP_NOFS|__GFP_NOFAIL);
 	eb->start = start;
 	eb->len = len;
 	eb->fs_info = fs_info;
@@ -4867,7 +4865,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 		return NULL;
 
 	for (i = 0; i < num_pages; i++, index++) {
-		p = find_or_create_page(mapping, index, GFP_NOFS);
+		p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
 		if (!p)
 			goto free_eb;
 
-- 
2.5.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [RFC -v2 7/8] btrfs: Prevent from early transaction abort
  2015-08-18 10:40     ` Michal Hocko
  (?)
@ 2015-08-18 17:11       ` Chris Mason
  -1 siblings, 0 replies; 82+ messages in thread
From: Chris Mason @ 2015-08-18 17:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

On Tue, Aug 18, 2015 at 12:40:32PM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Btrfs relies on GFP_NOFS allocation when committing the transaction but
> since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
> those allocations are allowed to fail, which can lead to a premature
> transaction abort:

I can either put the btrfs nofail ones on my pull for Linus, or you can
add my sob and send as one unit.  Just let me know how you'd rather do
it.

-chris

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC -v2 7/8] btrfs: Prevent from early transaction abort
  2015-08-18 17:11       ` Chris Mason
@ 2015-08-18 17:29         ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-18 17:29 UTC (permalink / raw)
  To: Chris Mason
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

On Tue 18-08-15 13:11:44, Chris Mason wrote:
> On Tue, Aug 18, 2015 at 12:40:32PM +0200, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Btrfs relies on GFP_NOFS allocation when committing the transaction but
> > since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
> > those allocations are allowed to fail, which can lead to a premature
> > transaction abort:
> 
> I can either put the btrfs nofail ones on my pull for Linus, or you can
> add my sob and send as one unit.  Just let me know how you'd rather do
> it.

OK, I will rephrase the changelogs (tomorrow) to not refer to an
unmerged patch and would appreciate it if you could take them and route
them through your tree. I will then drop them from my pile.

Thanks.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC -v2 7/8] btrfs: Prevent from early transaction abort
  2015-08-18 17:29         ` Michal Hocko
@ 2015-08-19 12:26           ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-19 12:26 UTC (permalink / raw)
  To: Chris Mason
  Cc: LKML, linux-mm, linux-fsdevel, Andrew Morton, Johannes Weiner,
	Tetsuo Handa, Dave Chinner, Theodore Ts'o, linux-btrfs,
	linux-ext4, Jan Kara

On Tue 18-08-15 19:29:14, Michal Hocko wrote:
> On Tue 18-08-15 13:11:44, Chris Mason wrote:
> > On Tue, Aug 18, 2015 at 12:40:32PM +0200, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > Btrfs relies on GFP_NOFS allocation when committing the transaction but
> > > since "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM"
> > > those allocations are allowed to fail, which can lead to a premature
> > > transaction abort:
> > 
> > I can either put the btrfs nofail ones on my pull for Linus, or you can
> > add my sob and send as one unit.  Just let me know how you'd rather do
> > it.
> 
> OK, I will rephrase the changelogs (tomorrow) to not refer to an
> unmerged patch and would appreciate if you can take them and route them
> through your tree. I will then drop them from my pile.

Posted in a separate thread
http://lkml.kernel.org/r/1439986661-15896-1-git-send-email-mhocko@kernel.org
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure
  2015-08-15 13:54         ` Theodore Ts'o
@ 2015-08-24 12:06           ` Michal Hocko
  -1 siblings, 0 replies; 82+ messages in thread
From: Michal Hocko @ 2015-08-24 12:06 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Greg Thelen, LKML, linux-mm, linux-fsdevel, Andrew Morton,
	Johannes Weiner, Tetsuo Handa, Dave Chinner, linux-btrfs,
	linux-ext4, Jan Kara

Hi Ted,

On Sat 15-08-15 09:54:22, Theodore Ts'o wrote:
> On Wed, Aug 12, 2015 at 11:14:11AM +0200, Michal Hocko wrote:
> > > Is this "if (!committed_data) {" check now dead code?
> > > 
> > > I also see other similar suspected dead sites in the rest of the series.
> > 
> > You are absolutely right. I have updated the patches.
> 
> Have you sent out an updated version of these patches?  Maybe I missed
> it, but I don't think I saw them.

Would you be interested in these two patches resent with a rephrased
changelog that does not depend on the patch which allows GFP_NOFS to
fail? That is the way this has been handled for btrfs.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 0/8] Allow GFP_NOFS allocation to fail
  2015-08-05  9:51 ` mhocko
@ 2015-09-07 16:51   ` Tetsuo Handa
  -1 siblings, 0 replies; 82+ messages in thread
From: Tetsuo Handa @ 2015-09-07 16:51 UTC (permalink / raw)
  To: mhocko, linux-kernel
  Cc: linux-mm, linux-fsdevel, akpm, hannes, david, tytso, jack

Michal Hocko wrote:
> As the VM cannot do much about these requests we should face the reality
> and allow those allocations to fail. Johannes has already posted the
> patch which does that (http://marc.info/?l=linux-mm&m=142726428514236&w=2)
> but the discussion died pretty quickly.

Addition of __GFP_NOFAIL to some locations is accepted, but otherwise
this patchset seems to be stalled.

> With all the patches applied none of the 4 filesystems gets aborted
> transactions and RO remount (well xfs didn't need any special
> treatment). This is obviously not sufficient to claim that failing
> GFP_NOFS is OK now but I think it is a good start for the further
> discussion. I would be grateful if FS people could have a look at those
> patches.  I have simply used __GFP_NOFAIL in the critical paths. This
> might be not the best strategy but it sounds like a good first step.

I posted my comment at
https://osdn.jp/projects/tomoyo/lists/archive/users-en/2015-September/000630.html .

> The third patch allows GFP_NOFS to fail and I believe it should see much
> more testing coverage. It would be really great if it could sit in the
> mmotm tree for few release cycles so that we can catch more fallouts.

Guessing from the responses to this patchset, merely sitting in the mmotm
tree is unlikely to gain much testing coverage. Also, FS is not the only
location that needs to be tested. If you really want to push the "GFP_NOFS
can fail" patch, I think you need to make a lot of effort to encourage
kernel developers to test using mandatory fault injection.

> Thoughts? Opinions?

To me, fixing callers (adding __GFP_NORETRY to callers) in a step-by-step
fashion after adding a proactive countermeasure sounds better than changing
the default behavior (implicitly applying __GFP_NORETRY inside).
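
The per-caller approach described above can be sketched in userspace as
follows. This is an illustration under stated assumptions, not kernel code:
SIM_NORETRY, sim_alloc(), get_buffer() and the static reserve buffer are
hypothetical names. The default path keeps looping until it succeeds, and
only a caller that has been converted to pass the opt-in flag ever sees a
failure, for which it brings its own fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical opt-in flag: the caller accepts that the request may fail. */
#define SIM_NORETRY 0x1u

/* Number of upcoming attempts that will fail (simulated memory pressure). */
static unsigned int sim_pressure;

static void *sim_alloc(size_t size, unsigned int flags)
{
	for (;;) {
		if (sim_pressure == 0)
			return calloc(1, size);
		sim_pressure--;
		if (flags & SIM_NORETRY)
			return NULL;	/* opted-in caller sees the failure */
		/* Default behavior: keep looping until an attempt succeeds. */
	}
}

/* Preallocated buffer used as the fallback of the converted caller. */
static char sim_reserve[128];

/*
 * A converted caller: it tries once and, on failure, falls back to the
 * reserve instead of waiting inside the allocator.
 */
static void *get_buffer(size_t size, int *used_reserve)
{
	void *p;

	assert(size <= sizeof(sim_reserve));
	p = sim_alloc(size, SIM_NORETRY);
	*used_reserve = (p == NULL);
	return p ? p : (void *)sim_reserve;
}
```

Callers that have not been converted keep passing 0 and keep the old
looping behavior, so each conversion can be reviewed and tested on its own.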

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC 0/8] Allow GFP_NOFS allocation to fail
  2015-09-07 16:51   ` Tetsuo Handa
@ 2015-09-15 13:16     ` Tetsuo Handa
  -1 siblings, 0 replies; 82+ messages in thread
From: Tetsuo Handa @ 2015-09-15 13:16 UTC (permalink / raw)
  To: mhocko, linux-kernel
  Cc: linux-mm, linux-fsdevel, akpm, hannes, david, tytso, jack

Tetsuo Handa wrote:
> > Thoughts? Opinions?
> 
> To me, fixing callers (adding __GFP_NORETRY to callers) in a step-by-step
> fashion after adding a proactive countermeasure sounds better than changing
> the default behavior (implicitly applying __GFP_NORETRY inside).
> 

Ping?

I showed you at http://marc.info/?l=linux-mm&m=144198479931388 that
changing the default behavior cannot end the game of Whack-A-Mole.
As long as there are unkillable threads, we can't kill context-sensitive
moles.

I believe that what we need to do now is to add a proactive countermeasure
(e.g. kill more processes) rather than try to reduce the possibility of
hitting this issue (e.g. allow !__GFP_FS to fail).

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RFC -v2 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure
  2015-08-18 10:38     ` Michal Hocko
  (?)
@ 2016-03-13 21:37     ` Theodore Ts'o
  -1 siblings, 0 replies; 82+ messages in thread
From: Theodore Ts'o @ 2016-03-13 21:37 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-ext4

On Tue, Aug 18, 2015 at 12:38:23PM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> A journal transaction might fail prematurely because the frozen_buffer
> is allocated by a GFP_NOFS request:
> [   72.440013] do_get_write_access: OOM for frozen_buffer
> [   72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> [   72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
> (...snipped....)
> [   72.495559] do_get_write_access: OOM for frozen_buffer
> [   72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> [   72.496839] do_get_write_access: OOM for frozen_buffer
> [   72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
> [   72.505766] Aborting journal on device sda1-8.
> [   72.505851] EXT4-fs (sda1): Remounting filesystem read-only
> 
> This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
> allocations upon OOM" because small GFP_NOFS allocations never failed.
> This allocation seems essential for the journal and GFP_NOFS is too
> restrictive for the memory allocator, so let's use __GFP_NOFAIL here to
> emulate the previous behavior.
> 
> The jbd code has the very same issue, so let's do the same there as well.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Applied, thanks.

					- Ted
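
For readers following the thread, the effect described in the quoted
changelog can be modeled in userspace as follows. This is only a sketch:
mock_journal, mock_try_alloc and mock_get_write_access are hypothetical
names, not the jbd2 data structures or functions, and the nofail argument
merely imitates what __GFP_NOFAIL asks of the allocator.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mock of a journal; not the jbd2 data structures. */
struct mock_journal {
    int aborted;           /* set once a transaction fails fatally */
    unsigned int oom_left; /* simulated allocation failures still pending */
};

/* One allocation attempt under simulated memory pressure. */
static void *mock_try_alloc(struct mock_journal *j, size_t size)
{
    if (j->oom_left > 0) {
        j->oom_left--;
        return NULL;
    }
    return malloc(size);
}

/* Mock of taking write access, which needs a frozen copy of the buffer.
 * nofail == 0: a failed allocation aborts the journal (the reported symptom).
 * nofail != 0: keep retrying until the allocation succeeds. */
static int mock_get_write_access(struct mock_journal *j, size_t size,
                                 int nofail)
{
    void *frozen;

    do {
        frozen = mock_try_alloc(j, size);
    } while (frozen == NULL && nofail);

    if (frozen == NULL) {
        j->aborted = 1;
        return -ENOMEM;
    }
    free(frozen); /* a real journal would keep the copy until commit */
    return 0;
}
```

With nofail set to 0 and allocation failures pending, the mock journal ends
up aborted, mirroring the "Aborting journal" lines in the log above. With
nofail set, the same pressure is absorbed by retrying and the journal stays
usable.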

^ permalink raw reply	[flat|nested] 82+ messages in thread

end of thread, other threads:[~2016-03-13 21:37 UTC | newest]

Thread overview: 82+ messages
2015-08-05  9:51 [RFC 0/8] Allow GFP_NOFS allocation to fail mhocko
2015-08-05  9:51 ` [RFC 1/8] mm, oom: Give __GFP_NOFAIL allocations access to memory reserves mhocko
2015-08-05  9:51 ` [RFC 2/8] mm: Allow GFP_IOFS for page_cache_read page cache allocation mhocko
2015-08-05  9:51 ` [RFC 3/8] mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM mhocko
2015-08-05 12:28   ` Tetsuo Handa
2015-08-05 14:02     ` Michal Hocko
2015-08-06 11:50       ` Tetsuo Handa
2015-08-12  9:11         ` Michal Hocko
2015-08-16 14:04           ` Tetsuo Handa
2015-08-05  9:51 ` [RFC 4/8] jbd, jbd2: Do not fail journal because of frozen_buffer allocation failure mhocko
2015-08-05 11:42   ` Jan Kara
2015-08-05 16:49   ` Greg Thelen
2015-08-12  9:14     ` Michal Hocko
2015-08-15 13:54       ` Theodore Ts'o
2015-08-18 10:36         ` Michal Hocko
2015-08-24 12:06         ` Michal Hocko
2015-08-18 10:38   ` [RFC -v2 " Michal Hocko
2016-03-13 21:37     ` Theodore Ts'o
2015-08-05  9:51 ` [RFC 5/8] ext4: Do not fail journal due to block allocator mhocko
2015-08-05 11:43   ` Jan Kara
2015-08-18 10:39   ` [RFC -v2 " Michal Hocko
2015-08-18 10:55     ` Michal Hocko
2015-08-05  9:51 ` [RFC 6/8] ext3: Do not abort journal prematurely mhocko
2015-08-18 10:39   ` [RFC -v2 " Michal Hocko
2015-08-05  9:51 ` [RFC 7/8] btrfs: Prevent from early transaction abort mhocko
2015-08-05 16:31   ` David Sterba
2015-08-18 10:40   ` [RFC -v2 " Michal Hocko
2015-08-18 11:01     ` Michal Hocko
2015-08-18 17:11     ` Chris Mason
2015-08-18 17:29       ` Michal Hocko
2015-08-19 12:26         ` Michal Hocko
2015-08-05  9:51 ` [RFC 8/8] btrfs: use __GFP_NOFAIL in alloc_btrfs_bio mhocko
2015-08-05 16:32   ` David Sterba
2015-08-18 10:41   ` [RFC -v2 " Michal Hocko
2015-08-05 19:58 ` [RFC 0/8] Allow GFP_NOFS allocation to fail Andreas Dilger
2015-08-06 14:34   ` Michal Hocko
2015-09-07 16:51 ` Tetsuo Handa
2015-09-15 13:16   ` Tetsuo Handa
