All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>
Cc: Nils Holland <nholland@tisys.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.cz>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Date: Sat, 17 Dec 2016 20:17:07 +0900	[thread overview]
Message-ID: <18652e94-8f5c-dcf1-16e6-0deab6c642ec@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20161216221202.GE7645@dhcp22.suse.cz>

Michal Hocko wrote:
> On Fri 16-12-16 12:31:51, Johannes Weiner wrote:
>>> @@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>>  		 */
>>>  		WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
>>>  
>>> +		/*
>>> +		 * Help non-failing allocations by giving them access to memory
>>> +		 * reserves but do not use ALLOC_NO_WATERMARKS because this
>>> +		 * could deplete whole memory reserves which would just make
>>> +		 * the situation worse
>>> +		 */
>>> +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
>>> +		if (page)
>>> +			goto got_pg;
>>> +
>>
>> But this should be a separate patch, IMO.
>>
>> Do we observe GFP_NOFS lockups when we don't do this? 
> 
> this is hard to tell but considering users like grow_dev_page we can get
> stuck with a very slow progress I believe. Those allocations could see
> some help.
> 
>> Don't we risk
>> premature exhaustion of the memory reserves, and it's better to wait
>> for other reclaimers to make some progress instead?
> 
> waiting for other reclaimers would be preferable but we should at least
> give these some priority, which is what ALLOC_HARDER should help with.
> 
>> Should we give
>> reserve access to all GFP_NOFS allocations, or just the ones from a
>> reclaim/cleaning context?
> 
> I would focus only for those which are important enough. Which are those
> is a harder question. But certainly those with GFP_NOFAIL are important
> enough.
> 
>> All that should go into the changelog of a separate allocation booster
>> patch, I think.
> 
> The reason I did both in the same patch is to address the concern about
> potential lockups when NOFS|NOFAIL cannot make any progress. I've chosen
> ALLOC_HARDER to give the minimum portion of the reserves so that we do
> not risk other high priority users to be blocked out but still help a
> bit at least and prevent from starvation when other reclaimers are
> faster to consume the reclaimed memory.
> 
> I can extend the changelog of course but I believe that having both
> changes together makes some sense. NOFS|NOFAIL allocations are not all
> that rare and sometimes we really depend on them making a further
> progress.
> 

I feel that allowing access to memory reserves based on __GFP_NOFAIL might not
make sense. My understanding is that actual I/O operation triggered by I/O
requests by filesystem code are processed by other threads. Even if we grant
access to memory reserves to GFP_NOFS | __GFP_NOFAIL allocations by fs code,
I think that it is possible that memory allocations by underlying bio code
fails to make a further progress unless memory reserves are granted as well.

Below is a typical trace which I observe under OOM lockuped situation (though
this trace is from an OOM stress test using XFS).

----------------------------------------
[ 1845.187246] MemAlloc: kworker/2:1(14498) flags=0x4208060 switches=323636 seq=48 gfp=0x2400000(GFP_NOIO) order=0 delay=430400 uninterruptible
[ 1845.187248] kworker/2:1     D12712 14498      2 0x00000080
[ 1845.187251] Workqueue: events_freezable_power_ disk_events_workfn
[ 1845.187252] Call Trace:
[ 1845.187253]  ? __schedule+0x23f/0xba0
[ 1845.187254]  schedule+0x38/0x90
[ 1845.187255]  schedule_timeout+0x205/0x4a0
[ 1845.187256]  ? del_timer_sync+0xd0/0xd0
[ 1845.187257]  schedule_timeout_uninterruptible+0x25/0x30
[ 1845.187258]  __alloc_pages_nodemask+0x1035/0x10e0
[ 1845.187259]  ? alloc_request_struct+0x14/0x20
[ 1845.187261]  alloc_pages_current+0x96/0x1b0
[ 1845.187262]  ? bio_alloc_bioset+0x20f/0x2e0
[ 1845.187264]  bio_copy_kern+0xc4/0x180
[ 1845.187265]  blk_rq_map_kern+0x6f/0x120
[ 1845.187268]  __scsi_execute.isra.23+0x12f/0x160
[ 1845.187270]  scsi_execute_req_flags+0x8f/0x100
[ 1845.187271]  sr_check_events+0xba/0x2b0 [sr_mod]
[ 1845.187274]  cdrom_check_events+0x13/0x30 [cdrom]
[ 1845.187275]  sr_block_check_events+0x25/0x30 [sr_mod]
[ 1845.187276]  disk_check_events+0x5b/0x150
[ 1845.187277]  disk_events_workfn+0x17/0x20
[ 1845.187278]  process_one_work+0x1fc/0x750
[ 1845.187279]  ? process_one_work+0x167/0x750
[ 1845.187279]  worker_thread+0x126/0x4a0
[ 1845.187280]  kthread+0x10a/0x140
[ 1845.187281]  ? process_one_work+0x750/0x750
[ 1845.187282]  ? kthread_create_on_node+0x60/0x60
[ 1845.187283]  ret_from_fork+0x2a/0x40
----------------------------------------

I think that this GFP_NOIO allocation request needs to consume more memory reserves
than GFP_NOFS allocation request to make progress. 
Do we want to add __GFP_NOFAIL to this GFP_NOIO allocation request in order to allow
access to memory reserves as well as GFP_NOFS | __GFP_NOFAIL allocation request?


WARNING: multiple messages have this Message-ID (diff)
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>
Cc: Nils Holland <nholland@tisys.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.cz>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically
Date: Sat, 17 Dec 2016 20:17:07 +0900	[thread overview]
Message-ID: <18652e94-8f5c-dcf1-16e6-0deab6c642ec@I-love.SAKURA.ne.jp> (raw)
In-Reply-To: <20161216221202.GE7645@dhcp22.suse.cz>

Michal Hocko wrote:
> On Fri 16-12-16 12:31:51, Johannes Weiner wrote:
>>> @@ -3737,6 +3752,16 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>>  		 */
>>>  		WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER);
>>>  
>>> +		/*
>>> +		 * Help non-failing allocations by giving them access to memory
>>> +		 * reserves but do not use ALLOC_NO_WATERMARKS because this
>>> +		 * could deplete whole memory reserves which would just make
>>> +		 * the situation worse
>>> +		 */
>>> +		page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
>>> +		if (page)
>>> +			goto got_pg;
>>> +
>>
>> But this should be a separate patch, IMO.
>>
>> Do we observe GFP_NOFS lockups when we don't do this? 
> 
> this is hard to tell but considering users like grow_dev_page we can get
> stuck with a very slow progress I believe. Those allocations could see
> some help.
> 
>> Don't we risk
>> premature exhaustion of the memory reserves, and it's better to wait
>> for other reclaimers to make some progress instead?
> 
> waiting for other reclaimers would be preferable but we should at least
> give these some priority, which is what ALLOC_HARDER should help with.
> 
>> Should we give
>> reserve access to all GFP_NOFS allocations, or just the ones from a
>> reclaim/cleaning context?
> 
> I would focus only for those which are important enough. Which are those
> is a harder question. But certainly those with GFP_NOFAIL are important
> enough.
> 
>> All that should go into the changelog of a separate allocation booster
>> patch, I think.
> 
> The reason I did both in the same patch is to address the concern about
> potential lockups when NOFS|NOFAIL cannot make any progress. I've chosen
> ALLOC_HARDER to give the minimum portion of the reserves so that we do
> not risk other high priority users to be blocked out but still help a
> bit at least and prevent from starvation when other reclaimers are
> faster to consume the reclaimed memory.
> 
> I can extend the changelog of course but I believe that having both
> changes together makes some sense. NOFS|NOFAIL allocations are not all
> that rare and sometimes we really depend on them making a further
> progress.
> 

I feel that allowing access to memory reserves based on __GFP_NOFAIL might not
make sense. My understanding is that actual I/O operation triggered by I/O
requests by filesystem code are processed by other threads. Even if we grant
access to memory reserves to GFP_NOFS | __GFP_NOFAIL allocations by fs code,
I think that it is possible that memory allocations by underlying bio code
fails to make a further progress unless memory reserves are granted as well.

Below is a typical trace which I observe under OOM lockuped situation (though
this trace is from an OOM stress test using XFS).

----------------------------------------
[ 1845.187246] MemAlloc: kworker/2:1(14498) flags=0x4208060 switches=323636 seq=48 gfp=0x2400000(GFP_NOIO) order=0 delay=430400 uninterruptible
[ 1845.187248] kworker/2:1     D12712 14498      2 0x00000080
[ 1845.187251] Workqueue: events_freezable_power_ disk_events_workfn
[ 1845.187252] Call Trace:
[ 1845.187253]  ? __schedule+0x23f/0xba0
[ 1845.187254]  schedule+0x38/0x90
[ 1845.187255]  schedule_timeout+0x205/0x4a0
[ 1845.187256]  ? del_timer_sync+0xd0/0xd0
[ 1845.187257]  schedule_timeout_uninterruptible+0x25/0x30
[ 1845.187258]  __alloc_pages_nodemask+0x1035/0x10e0
[ 1845.187259]  ? alloc_request_struct+0x14/0x20
[ 1845.187261]  alloc_pages_current+0x96/0x1b0
[ 1845.187262]  ? bio_alloc_bioset+0x20f/0x2e0
[ 1845.187264]  bio_copy_kern+0xc4/0x180
[ 1845.187265]  blk_rq_map_kern+0x6f/0x120
[ 1845.187268]  __scsi_execute.isra.23+0x12f/0x160
[ 1845.187270]  scsi_execute_req_flags+0x8f/0x100
[ 1845.187271]  sr_check_events+0xba/0x2b0 [sr_mod]
[ 1845.187274]  cdrom_check_events+0x13/0x30 [cdrom]
[ 1845.187275]  sr_block_check_events+0x25/0x30 [sr_mod]
[ 1845.187276]  disk_check_events+0x5b/0x150
[ 1845.187277]  disk_events_workfn+0x17/0x20
[ 1845.187278]  process_one_work+0x1fc/0x750
[ 1845.187279]  ? process_one_work+0x167/0x750
[ 1845.187279]  worker_thread+0x126/0x4a0
[ 1845.187280]  kthread+0x10a/0x140
[ 1845.187281]  ? process_one_work+0x750/0x750
[ 1845.187282]  ? kthread_create_on_node+0x60/0x60
[ 1845.187283]  ret_from_fork+0x2a/0x40
----------------------------------------

I think that this GFP_NOIO allocation request needs to consume more memory reserves
than GFP_NOFS allocation request to make progress. 
Do we want to add __GFP_NOFAIL to this GFP_NOIO allocation request in order to allow
access to memory reserves as well as GFP_NOFS | __GFP_NOFAIL allocation request?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-12-17 11:17 UTC|newest]

Thread overview: 155+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-15 22:57 OOM: Better, but still there on 4.9 Nils Holland
2016-12-16  7:39 ` Michal Hocko
2016-12-16  7:39   ` Michal Hocko
2016-12-16 15:58   ` OOM: Better, but still there on Michal Hocko
2016-12-16 15:58     ` Michal Hocko
2016-12-16 15:58     ` [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko
2016-12-16 15:58       ` Michal Hocko
2016-12-16 15:58     ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
2016-12-16 15:58       ` Michal Hocko
2016-12-16 17:31       ` Johannes Weiner
2016-12-16 17:31         ` Johannes Weiner
2016-12-16 22:12         ` Michal Hocko
2016-12-16 22:12           ` Michal Hocko
2016-12-17 11:17           ` Tetsuo Handa [this message]
2016-12-17 11:17             ` Tetsuo Handa
2016-12-18 16:37             ` Michal Hocko
2016-12-18 16:37               ` Michal Hocko
2016-12-16 18:47     ` OOM: Better, but still there on Nils Holland
2016-12-16 18:47       ` Nils Holland
2016-12-17  0:02       ` Michal Hocko
2016-12-17  0:02         ` Michal Hocko
2016-12-17 12:59         ` Nils Holland
2016-12-17 12:59           ` Nils Holland
2016-12-17 14:44           ` Tetsuo Handa
2016-12-17 14:44             ` Tetsuo Handa
2016-12-17 17:11             ` Nils Holland
2016-12-17 17:11               ` Nils Holland
2016-12-17 21:06             ` Nils Holland
2016-12-17 21:06               ` Nils Holland
2016-12-18  5:14               ` Tetsuo Handa
2016-12-18  5:14                 ` Tetsuo Handa
2016-12-19 13:45               ` Michal Hocko
2016-12-19 13:45                 ` Michal Hocko
2016-12-20  2:08                 ` Nils Holland
2016-12-20  2:08                   ` Nils Holland
2016-12-21  7:36                   ` Michal Hocko
2016-12-21  7:36                     ` Michal Hocko
2016-12-21 11:00                     ` Tetsuo Handa
2016-12-21 11:00                       ` Tetsuo Handa
2016-12-21 11:16                       ` Michal Hocko
2016-12-21 11:16                         ` Michal Hocko
2016-12-21 14:04                         ` Chris Mason
2016-12-21 14:04                           ` Chris Mason
2016-12-22 10:10                     ` Nils Holland
2016-12-22 10:10                       ` Nils Holland
2016-12-22 10:27                       ` Michal Hocko
2016-12-22 10:27                         ` Michal Hocko
2016-12-22 10:35                         ` Nils Holland
2016-12-22 10:35                           ` Nils Holland
2016-12-22 10:46                           ` Tetsuo Handa
2016-12-22 10:46                             ` Tetsuo Handa
2016-12-22 19:17                       ` Michal Hocko
2016-12-22 19:17                         ` Michal Hocko
2016-12-22 21:46                         ` Nils Holland
2016-12-22 21:46                           ` Nils Holland
2016-12-23 10:51                           ` Michal Hocko
2016-12-23 10:51                             ` Michal Hocko
2016-12-23 12:18                             ` Nils Holland
2016-12-23 12:18                               ` Nils Holland
2016-12-23 12:57                               ` Michal Hocko
2016-12-23 12:57                                 ` Michal Hocko
2016-12-23 14:47                                 ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Michal Hocko
2016-12-23 14:47                                   ` Michal Hocko
2016-12-23 22:26                                   ` Nils Holland
2016-12-23 22:26                                     ` Nils Holland
2016-12-26 12:48                                     ` Michal Hocko
2016-12-26 12:48                                       ` Michal Hocko
2016-12-26 18:57                                       ` Nils Holland
2016-12-26 18:57                                         ` Nils Holland
2016-12-27  8:08                                         ` Michal Hocko
2016-12-27  8:08                                           ` Michal Hocko
2016-12-27 11:23                                           ` Nils Holland
2016-12-27 11:23                                             ` Nils Holland
2016-12-27 11:27                                             ` Michal Hocko
2016-12-27 11:27                                               ` Michal Hocko
2016-12-27 15:55                                       ` Michal Hocko
2016-12-27 15:55                                         ` Michal Hocko
2016-12-27 16:28                                         ` [PATCH] mm, vmscan: consider eligible zones in get_scan_count kbuild test robot
2016-12-28  8:51                                           ` Michal Hocko
2016-12-28  8:51                                             ` Michal Hocko
2016-12-27 19:33                                         ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Nils Holland
2016-12-27 19:33                                           ` Nils Holland
2016-12-28  8:57                                           ` Michal Hocko
2016-12-28  8:57                                             ` Michal Hocko
2016-12-29  1:20                                         ` Minchan Kim
2016-12-29  1:20                                           ` Minchan Kim
2016-12-29  9:04                                           ` Michal Hocko
2016-12-29  9:04                                             ` Michal Hocko
2016-12-30  2:05                                             ` Minchan Kim
2016-12-30  2:05                                               ` Minchan Kim
2016-12-30 10:40                                               ` Michal Hocko
2016-12-30 10:40                                                 ` Michal Hocko
2016-12-29  0:31                                       ` Minchan Kim
2016-12-29  0:31                                         ` Minchan Kim
2016-12-29  0:48                                         ` Minchan Kim
2016-12-29  0:48                                           ` Minchan Kim
2016-12-29  8:52                                           ` Michal Hocko
2016-12-29  8:52                                             ` Michal Hocko
2016-12-30 10:19                                       ` Mel Gorman
2016-12-30 10:19                                         ` Mel Gorman
2016-12-30 11:05                                         ` Michal Hocko
2016-12-30 11:05                                           ` Michal Hocko
2016-12-30 12:43                                           ` Mel Gorman
2016-12-30 12:43                                             ` Mel Gorman
2016-12-25 22:25                                   ` [lkp-developer] [mm, memcg] d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size kernel test robot
2016-12-25 22:25                                     ` kernel test robot
2016-12-26 12:26                                     ` Michal Hocko
2016-12-26 12:26                                       ` Michal Hocko
2016-12-26 12:26                                       ` Michal Hocko
2016-12-26 12:50                                       ` Michal Hocko
2016-12-26 12:50                                         ` Michal Hocko
2016-12-26 12:50                                         ` Michal Hocko
2016-12-18  0:28             ` OOM: Better, but still there on Xin Zhou
2016-12-16 18:15   ` OOM: Better, but still there on 4.9 Chris Mason
2016-12-16 18:15     ` Chris Mason
2016-12-16 22:14     ` Michal Hocko
2016-12-16 22:14       ` Michal Hocko
2016-12-16 22:47       ` Chris Mason
2016-12-16 22:47         ` Chris Mason
2016-12-16 23:31         ` Michal Hocko
2016-12-16 23:31           ` Michal Hocko
2016-12-16 19:50   ` Chris Mason
2016-12-16 19:50     ` Chris Mason
  -- strict thread matches above, loose matches on Subject: below --
2016-12-01 15:25 [PATCH 0/2] GFP_NOFAIL cleanups Michal Hocko
2016-12-01 15:25 ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
2016-12-01 15:25   ` Michal Hocko
2016-12-02  7:23   ` Vlastimil Babka
2016-12-02  7:23     ` Vlastimil Babka
2016-12-05 13:45   ` Tetsuo Handa
2016-12-05 13:45     ` Tetsuo Handa
2016-12-05 14:10     ` Michal Hocko
2016-12-05 14:10       ` Michal Hocko
2016-12-06  8:27       ` Michal Hocko
2016-12-06  8:27         ` Michal Hocko
2016-12-06 10:38       ` Tetsuo Handa
2016-12-06 10:38         ` Tetsuo Handa
2016-12-06 11:03         ` Vlastimil Babka
2016-12-06 11:03           ` Vlastimil Babka
2016-12-06 19:25           ` Michal Hocko
2016-12-06 19:25             ` Michal Hocko
2016-12-06 19:22         ` Michal Hocko
2016-12-06 19:22           ` Michal Hocko
2016-12-08 12:53           ` Tetsuo Handa
2016-12-08 12:53             ` Tetsuo Handa
2016-12-08 13:47             ` Michal Hocko
2016-12-08 13:47               ` Michal Hocko
2016-12-11 11:23               ` Tetsuo Handa
2016-12-11 11:23                 ` Tetsuo Handa
2016-12-11 13:53                 ` Tetsuo Handa
2016-12-11 13:53                   ` Tetsuo Handa
2016-12-12  8:52                   ` Michal Hocko
2016-12-12  8:52                     ` Michal Hocko
2016-12-12  8:48                 ` Michal Hocko
2016-12-12  8:48                   ` Michal Hocko
2016-12-14 10:34                   ` Michal Hocko
2016-12-14 10:34                     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=18652e94-8f5c-dcf1-16e6-0deab6c642ec@I-love.SAKURA.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=clm@fb.com \
    --cc=dsterba@suse.cz \
    --cc=hannes@cmpxchg.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=nholland@tisys.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.