From: Michal Hocko <mhocko@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>, dchinner@redhat.com,
    oleg@redhat.com, xfs@oss.sgi.com, Johannes Weiner <hannes@cmpxchg.org>,
    linux-mm@kvack.org, mgorman@suse.de, rientjes@google.com,
    akpm@linux-foundation.org, torvalds@linux-foundation.org
Subject: Re: How to handle TIF_MEMDIE stalls?
Date: Mon, 2 Mar 2015 16:18:32 +0100
Message-ID: <20150302151832.GE26334@dhcp22.suse.cz>
In-Reply-To: <20150223004521.GK12722@dastard>

On Mon 23-02-15 11:45:21, Dave Chinner wrote:
[...]
> A reserve memory pool is no different - every time a memory reserve
> occurs, a watermark is lifted to accommodate it, and the transaction
> is not allowed to proceed until the amount of free memory exceeds
> that watermark. The memory allocation subsystem then only allows
> *allocations* marked correctly to allocate pages from that the
> reserve that watermark protects. e.g. only allocations using
> __GFP_RESERVE are allowed to dip into the reserve pool.

The idea is sound. But I am pretty sure we will find many corner cases.
E.g. what if the mere reservation attempt causes the system to go OOM
and trigger the OOM killer? Sure, that wouldn't be too different from
an OOM triggered during the allocation, but there is one major
difference. Reservations need to be estimated, and I expect the
estimates to be on the conservative side, so the OOM might not have
happened without them.

> By using watermarks, freeing of memory will automatically top
> up the reserve pool which means that we guarantee that reclaimable
> memory allocated for demand paging during transactions doesn't
> deplete the reserve pool permanently. As a result, when there is
> plenty of free and/or reclaimable memory, the reserve pool
> watermarks will have almost zero impact on performance and
> behaviour.

A typical busy system won't be very far from the high watermark, so
reclaim would be performed whenever the watermarks are raised (aka
during a reservation), and that might lead to visible performance
degradation. This might be acceptable, but it also adds a certain level
of unpredictability because performance characteristics might change
suddenly.

> Further, because it's just accounting and behavioural thresholds,
> this allows the mm subsystem to control how the reserve pool is
> accounted internally. e.g. clean, reclaimable pages in the page
> cache could serve as reserve pool pages as they can be immediately
> reclaimed for allocation.

But they can also become hard or impossible to reclaim. Clean pages
might get dirty, and e.g. swap backed pages might run out of their
backing storage. So I guess we cannot count on those pages without
reclaiming them first and hiding them in the reserve. Which is probably
what you suggest below, but I wasn't really sure...

> This could be achieved by setting reclaim targets first to the reserve
> pool watermark, then the second target is enough pages to satisfy the
> current allocation.
>
> And, FWIW, there's nothing stopping this mechanism from have order
> based reserve thresholds. e.g. IB could really do with a 64k reserve
> pool threshold and hence help solve the long standing problems they
> have with filling the receive ring in GFP_ATOMIC context...
>
> Sure, that's looking further down the track, but my point still
> remains: we need a viable long term solution to this problem. Maybe
> reservations are not the solution, but I don't see anyone else who
> is thinking of how to address this architectural problem at a system
> level right now.

I think the idea is good! It will just be quite tricky to get there
without causing more problems than those being solved. The biggest
question mark so far seems to be the reservation size estimation. If it
is hard for any caller to know the size beforehand (an estimate that is
really close to the size actually used), then the whole complexity in
the code sounds like overkill, and asking the administrator to tune
min_free_kbytes seems a better fit (we would still have to teach the
allocator to access reserves when really necessary) because the system
would behave more predictably (although some memory would be wasted).

> We need to design and document the model first, then review it, then
> we can start working at the code level to implement the solution we've
> designed.

I have already asked James to add this to the LSF agenda, but nothing
has materialized on the schedule yet. I will poke him again.
-- 
Michal Hocko
SUSE Labs
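
For illustration only, the watermark-based reservation scheme debated above could be modelled in user space roughly as follows. This is a toy sketch, not kernel code: `struct toy_zone`, `toy_reserve()`, `toy_alloc()` and `TOY_GFP_RESERVE` are hypothetical names invented here (the latter standing in for the proposed `__GFP_RESERVE`), and the assumption that a reserved allocation consumes part of the reservation is mine, not something the thread specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed __GFP_RESERVE flag. */
#define TOY_GFP_RESERVE 0x1u

/* Toy accounting state; none of these fields mirror real kernel structures. */
struct toy_zone {
    size_t free_pages; /* pages currently free */
    size_t min_wmark;  /* base watermark (think: min_free_kbytes) */
    size_t reserved;   /* pages promised to outstanding reservations */
};

/*
 * Lift the effective watermark by 'pages'. Returns false when free memory
 * cannot cover the raised watermark; in a real system this is the point
 * where the caller would have to wait for reclaim (or risk an OOM).
 */
static bool toy_reserve(struct toy_zone *z, size_t pages)
{
    if (z->free_pages < z->min_wmark + z->reserved + pages)
        return false;
    z->reserved += pages;
    return true;
}

/*
 * Allocate 'pages'. Ordinary allocations must stay above the base
 * watermark plus all reservations; TOY_GFP_RESERVE allocations may dip
 * into the reserved part and (by assumption) consume the reservation.
 */
static bool toy_alloc(struct toy_zone *z, size_t pages, unsigned int flags)
{
    size_t floor = z->min_wmark;

    if (!(flags & TOY_GFP_RESERVE))
        floor += z->reserved;
    if (z->free_pages < floor + pages)
        return false;

    z->free_pages -= pages;
    if (flags & TOY_GFP_RESERVE)
        z->reserved -= pages < z->reserved ? pages : z->reserved;
    return true;
}

/* Freeing pages tops the pool back up without any extra bookkeeping. */
static void toy_free(struct toy_zone *z, size_t pages)
{
    z->free_pages += pages;
}

/* Small walk-through of the model; asserts document the expected outcome. */
static void toy_example(void)
{
    struct toy_zone z = { .free_pages = 100, .min_wmark = 10, .reserved = 0 };

    assert(toy_reserve(&z, 20));               /* watermark lifted to 30 */
    assert(!toy_alloc(&z, 75, 0));             /* would cut into the reserve */
    assert(toy_alloc(&z, 15, TOY_GFP_RESERVE)); /* allowed to dip in */
    toy_free(&z, 15);
}
```

In this model a conservative (too large) argument to `toy_reserve()` fails earlier than the allocations themselves would have, which is the estimation concern raised in the reply.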