From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> To: mhocko@suse.cz, akpm@linux-foundation.org Cc: hannes@cmpxchg.org, david@fromorbit.com, mgorman@suse.de, riel@redhat.com, fengguang.wu@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/2] Move away from non-failing small allocations Date: Tue, 17 Mar 2015 23:06:34 +0900 [thread overview] Message-ID: <201503172305.DIH52162.FOFMFOVJHLOtQS@I-love.SAKURA.ne.jp> (raw) In-Reply-To: <20150317090738.GB28112@dhcp22.suse.cz> Michal Hocko wrote: > On Mon 16-03-15 15:38:43, Andrew Morton wrote: > > Realistically, I don't think this overall effort will be successful - > > we'll add the knob, it won't get enough testing and any attempt to > > alter the default will be us deliberately destabilizing the kernel > > without knowing how badly :( > > Without the knob we do not allow users to test this at all though and > the transition will _never_ happen. Which is IMHO bad. > Even with the knob, quite little users will test this. The consequence is likely that end users rush into customer support center about obscure bugs. I'm working at a support center, and such bugs are really annoying. > > I wonder if we can alter the behaviour only for filesystem code, so we > > constrain the new behaviour just to that code where we're having > > problems. Most/all fs code goes via vfs methods so there's a reasonably > > small set of places where we can call > > We are seeing issues with the fs code now because the test cases which > led to the current discussion exercise FS code. The code which does > lock(); kmalloc(GFP_KERNEL) is not reduced there though. I am pretty sure > we can find other subsystems if we try hard enough. I'm expecting for patches which avoids deadlock by lock(); kmalloc(GFP_KERNEL). > > static inline void enter_fs_code(struct super_block *sb) > > { > > if (sb->my_small_allocations_can_fail) > > current->small_allocations_can_fail++; > > } > > > > that way (or something similar) we can select the behaviour on a per-fs > > basis and the rest of the kernel remains unaffected. Other subsystems > > can opt in as well. > > This is basically leading to GFP_MAYFAIL which is completely backwards > (the hard requirement should be an exception not a default rule). > I really do not want to end up with stuffing random may_fail annotations > all over the kernel. > I wish that GFP_NOFS / GFP_NOIO regions are annotated with static inline void enter_fs_code(void) { #ifdef CONFIG_DEBUG_GFP_FLAGS current->in_fs_code++; #endif } static inline void leave_fs_code(void) { #ifdef CONFIG_DEBUG_GFP_FLAGS current->in_fs_code--; #endif } static inline void enter_io_code(void) { #ifdef CONFIG_DEBUG_GFP_FLAGS current->in_io_code++; #endif } static inline void leave_io_code(void) { #ifdef CONFIG_DEBUG_GFP_FLAGS current->in_io_code--; #endif } so that inappropriate GFP_KERNEL usage inside GFP_NOFS region are catchable by doing struct page * __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, nodemask_t *nodemask) { struct zoneref *preferred_zoneref; struct page *page = NULL; unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ struct alloc_context ac = { .high_zoneidx = gfp_zone(gfp_mask), .nodemask = nodemask, .migratetype = gfpflags_to_migratetype(gfp_mask), }; gfp_mask &= gfp_allowed_mask; +#ifdef CONFIG_DEBUG_GFP_FLAGS + WARN_ON(current->in_fs_code & (gfp_mask & __GFP_FS)); + WARN_ON(current->in_io_code & (gfp_mask & __GFP_IO)); +#endif lockdep_trace_alloc(gfp_mask); . It is difficult for non-fs developers to determine whether they need to use GFP_NOFS than GFP_KERNEL in their code. An example is seen at http://marc.info/?l=linux-security-module&m=138556479607024&w=2 . Moreover, I don't know how GFP flags are managed when stacked like "a swap file on ext4 on top of LVM (with snapshots) on a RAID array connected over iSCSI" (quoted from comments on Jon's writeup), but I wish that the distinction between GFP_KERNEL / GFP_NOFS / GFP_NOIO are removed from memory allocating function callers by doing static inline void enter_fs_code(void) { current->in_fs_code++; } static inline void leave_fs_code(void) { current->in_fs_code--; } static inline void enter_io_code(void) { current->in_io_code++; } static inline void leave_io_code(void) { current->in_io_code--; } struct page * __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, nodemask_t *nodemask) { struct zoneref *preferred_zoneref; struct page *page = NULL; unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ struct alloc_context ac = { .high_zoneidx = gfp_zone(gfp_mask), .nodemask = nodemask, .migratetype = gfpflags_to_migratetype(gfp_mask), }; gfp_mask &= gfp_allowed_mask; + if (current->in_fs_code) + gfp_mask &= ~__GFP_FS; + if (current->in_io_code) + gfp_mask &= ~__GFP_IO; lockdep_trace_alloc(gfp_mask); so that GFP flags passed to memory allocations involved by stacking will be appropriately masked.
WARNING: multiple messages have this Message-ID (diff)
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> To: mhocko@suse.cz, akpm@linux-foundation.org Cc: hannes@cmpxchg.org, david@fromorbit.com, mgorman@suse.de, riel@redhat.com, fengguang.wu@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 0/2] Move away from non-failing small allocations Date: Tue, 17 Mar 2015 23:06:34 +0900 [thread overview] Message-ID: <201503172305.DIH52162.FOFMFOVJHLOtQS@I-love.SAKURA.ne.jp> (raw) In-Reply-To: <20150317090738.GB28112@dhcp22.suse.cz> Michal Hocko wrote: > On Mon 16-03-15 15:38:43, Andrew Morton wrote: > > Realistically, I don't think this overall effort will be successful - > > we'll add the knob, it won't get enough testing and any attempt to > > alter the default will be us deliberately destabilizing the kernel > > without knowing how badly :( > > Without the knob we do not allow users to test this at all though and > the transition will _never_ happen. Which is IMHO bad. > Even with the knob, quite little users will test this. The consequence is likely that end users rush into customer support center about obscure bugs. I'm working at a support center, and such bugs are really annoying. > > I wonder if we can alter the behaviour only for filesystem code, so we > > constrain the new behaviour just to that code where we're having > > problems. Most/all fs code goes via vfs methods so there's a reasonably > > small set of places where we can call > > We are seeing issues with the fs code now because the test cases which > led to the current discussion exercise FS code. The code which does > lock(); kmalloc(GFP_KERNEL) is not reduced there though. I am pretty sure > we can find other subsystems if we try hard enough. I'm expecting for patches which avoids deadlock by lock(); kmalloc(GFP_KERNEL). > > static inline void enter_fs_code(struct super_block *sb) > > { > > if (sb->my_small_allocations_can_fail) > > current->small_allocations_can_fail++; > > } > > > > that way (or something similar) we can select the behaviour on a per-fs > > basis and the rest of the kernel remains unaffected. Other subsystems > > can opt in as well. > > This is basically leading to GFP_MAYFAIL which is completely backwards > (the hard requirement should be an exception not a default rule). > I really do not want to end up with stuffing random may_fail annotations > all over the kernel. > I wish that GFP_NOFS / GFP_NOIO regions are annotated with static inline void enter_fs_code(void) { #ifdef CONFIG_DEBUG_GFP_FLAGS current->in_fs_code++; #endif } static inline void leave_fs_code(void) { #ifdef CONFIG_DEBUG_GFP_FLAGS current->in_fs_code--; #endif } static inline void enter_io_code(void) { #ifdef CONFIG_DEBUG_GFP_FLAGS current->in_io_code++; #endif } static inline void leave_io_code(void) { #ifdef CONFIG_DEBUG_GFP_FLAGS current->in_io_code--; #endif } so that inappropriate GFP_KERNEL usage inside GFP_NOFS region are catchable by doing struct page * __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, nodemask_t *nodemask) { struct zoneref *preferred_zoneref; struct page *page = NULL; unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ struct alloc_context ac = { .high_zoneidx = gfp_zone(gfp_mask), .nodemask = nodemask, .migratetype = gfpflags_to_migratetype(gfp_mask), }; gfp_mask &= gfp_allowed_mask; +#ifdef CONFIG_DEBUG_GFP_FLAGS + WARN_ON(current->in_fs_code & (gfp_mask & __GFP_FS)); + WARN_ON(current->in_io_code & (gfp_mask & __GFP_IO)); +#endif lockdep_trace_alloc(gfp_mask); . It is difficult for non-fs developers to determine whether they need to use GFP_NOFS than GFP_KERNEL in their code. An example is seen at http://marc.info/?l=linux-security-module&m=138556479607024&w=2 . Moreover, I don't know how GFP flags are managed when stacked like "a swap file on ext4 on top of LVM (with snapshots) on a RAID array connected over iSCSI" (quoted from comments on Jon's writeup), but I wish that the distinction between GFP_KERNEL / GFP_NOFS / GFP_NOIO are removed from memory allocating function callers by doing static inline void enter_fs_code(void) { current->in_fs_code++; } static inline void leave_fs_code(void) { current->in_fs_code--; } static inline void enter_io_code(void) { current->in_io_code++; } static inline void leave_io_code(void) { current->in_io_code--; } struct page * __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, nodemask_t *nodemask) { struct zoneref *preferred_zoneref; struct page *page = NULL; unsigned int cpuset_mems_cookie; int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR; gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */ struct alloc_context ac = { .high_zoneidx = gfp_zone(gfp_mask), .nodemask = nodemask, .migratetype = gfpflags_to_migratetype(gfp_mask), }; gfp_mask &= gfp_allowed_mask; + if (current->in_fs_code) + gfp_mask &= ~__GFP_FS; + if (current->in_io_code) + gfp_mask &= ~__GFP_IO; lockdep_trace_alloc(gfp_mask); so that GFP flags passed to memory allocations involved by stacking will be appropriately masked. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-03-17 14:06 UTC|newest] Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top 2015-03-11 20:54 [PATCH 0/2] Move away from non-failing small allocations Michal Hocko 2015-03-11 20:54 ` Michal Hocko 2015-03-11 20:54 ` Michal Hocko 2015-03-11 20:54 ` [PATCH 1/2] mm: Allow small allocations to fail Michal Hocko 2015-03-11 20:54 ` Michal Hocko 2015-03-11 20:54 ` Michal Hocko 2015-03-12 12:54 ` Tetsuo Handa 2015-03-12 12:54 ` Tetsuo Handa 2015-03-12 13:12 ` Michal Hocko 2015-03-12 13:12 ` Michal Hocko 2015-03-15 5:43 ` Tetsuo Handa 2015-03-15 5:43 ` Tetsuo Handa 2015-03-15 12:13 ` Michal Hocko 2015-03-15 12:13 ` Michal Hocko 2015-03-15 13:06 ` Tetsuo Handa 2015-03-15 13:06 ` Tetsuo Handa 2015-03-16 7:46 ` [PATCH 1/2 v2] " Michal Hocko 2015-03-16 7:46 ` Michal Hocko 2015-03-16 21:11 ` Johannes Weiner 2015-03-16 21:11 ` Johannes Weiner 2015-03-17 10:25 ` Michal Hocko 2015-03-17 10:25 ` Michal Hocko 2015-03-17 13:29 ` Johannes Weiner 2015-03-17 13:29 ` Johannes Weiner 2015-03-17 14:17 ` Michal Hocko 2015-03-17 14:17 ` Michal Hocko 2015-03-17 17:26 ` Johannes Weiner 2015-03-17 17:26 ` Johannes Weiner 2015-03-17 19:41 ` Michal Hocko 2015-03-17 19:41 ` Michal Hocko 2015-03-18 9:10 ` Vlastimil Babka 2015-03-18 9:10 ` Vlastimil Babka 2015-03-18 12:04 ` Michal Hocko 2015-03-18 12:04 ` Michal Hocko 2015-03-18 12:36 ` Tetsuo Handa 2015-03-18 12:36 ` Tetsuo Handa 2015-03-18 11:35 ` Tetsuo Handa 2015-03-18 11:35 ` Tetsuo Handa 2015-03-17 11:13 ` Tetsuo Handa 2015-03-17 11:13 ` Tetsuo Handa 2015-03-17 13:15 ` Michal Hocko 2015-03-17 13:15 ` Michal Hocko 2015-03-18 11:33 ` Tetsuo Handa 2015-03-18 11:33 ` Tetsuo Handa 2015-03-18 12:23 ` Michal Hocko 2015-03-18 12:23 ` Michal Hocko 2015-03-19 11:03 ` Tetsuo Handa 2015-03-19 11:03 ` Tetsuo Handa 2015-03-11 20:54 ` [PATCH 2/2] mmotm: Enable small allocation " Michal Hocko 2015-03-11 20:54 ` Michal Hocko 2015-03-11 20:54 ` Michal Hocko 2015-03-11 22:36 ` [PATCH 0/2] Move away from non-failing small allocations Sasha Levin 2015-03-11 22:36 ` Sasha Levin 2015-03-11 22:36 ` Sasha Levin 2015-03-16 22:38 ` Andrew Morton 2015-03-16 22:38 ` Andrew Morton 2015-03-16 22:38 ` Andrew Morton 2015-03-17 9:07 ` Michal Hocko 2015-03-17 9:07 ` Michal Hocko 2015-03-17 14:06 ` Tetsuo Handa [this message] 2015-03-17 14:06 ` Tetsuo Handa 2015-04-02 11:53 ` Tetsuo Handa 2015-04-02 11:53 ` Tetsuo Handa
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=201503172305.DIH52162.FOFMFOVJHLOtQS@I-love.SAKURA.ne.jp \ --to=penguin-kernel@i-love.sakura.ne.jp \ --cc=akpm@linux-foundation.org \ --cc=david@fromorbit.com \ --cc=fengguang.wu@intel.com \ --cc=hannes@cmpxchg.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mgorman@suse.de \ --cc=mhocko@suse.cz \ --cc=riel@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.