From: Ingo Molnar <mingo@kernel.org> To: Michal Hocko <mhocko@kernel.org> Cc: LKML <linux-kernel@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, "David S. Miller" <davem@davemloft.net>, Tony Luck <tony.luck@intel.com>, Andrew Morton <akpm@linux-foundation.org>, Chris Zankel <chris@zankel.net>, Max Filippov <jcmvbkbc@gmail.com>, x86@kernel.org, linux-alpha@vger.kernel.org, linux-ia64@vger.kernel.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, Linus Torvalds <torvalds@linux-foundation.org>, Peter Zijlstra <a.p.zijlstra@chello.nl> Subject: Re: [RFC 0/12] introduce down_write_killable for rw_semaphore Date: Wed, 09 Mar 2016 13:43:39 +0000 [thread overview] Message-ID: <20160309134339.GA20911@gmail.com> (raw) In-Reply-To: <20160309132855.GI27018@dhcp22.suse.cz> * Michal Hocko <mhocko@kernel.org> wrote: > On Wed 09-03-16 14:17:10, Ingo Molnar wrote: > > > > * Michal Hocko <mhocko@kernel.org> wrote: > > > > > > [...] this is a follow up work for oom_reaper [1]. As the async OOM killing > > > > depends on oom_sem for read we would really appreciate if a holder for write > > > > stood in the way. This patchset is changing many of down_write calls to be > > > > killable to help those cases when the writer is blocked and waiting for > > > > readers to release the lock and so help __oom_reap_task to process the oom > > > > victim. > > > > > > > > there seems to be a misunderstanding: if a writer is blocked waiting for > > > > readers then no new readers are allowed - the writer will get its turn the > > > > moment all existing readers drop the lock. > > > > > > Readers might be blocked e.g. on the memory allocation which cannot proceed due > > > to OOM. Such a reader might be operating on a remote mm. > > > > Doing complex allocations with the mm locked looks fragile no matter what: we > > should add debugging code that warns if allocations are done with a remote mm > > locked. (it should be trivial) > > No matter how fragile is that it is not something non-existent. Just > have a look at use_mm for example. We definitely do not want to warn > about those, right? Sure we care about eliminating fragility, and usage does not seem to be widespread at all: triton:~/tip> git grep -w use_mm drivers/staging/rdma/hfi1/user_sdma.c: use_mm(req->pq->user_mm); drivers/usb/gadget/function/f_fs.c: use_mm(io_data->mm); drivers/usb/gadget/legacy/inode.c: use_mm(mm); drivers/vhost/vhost.c: use_mm(dev->mm); I think we also want to keep our general flexibility wrt. eventually turning the mmap_sem into a spinlock ... > > > I am not against interruptible variant as well but I suspect that some paths > > > are not expected to return EINTR. I haven't checked them for this but > > > killable is sufficient for the problem I am trying to solve. That problem is > > > real while latencies do not seem to be that eminent. > > > > If they don't expect EINTR then they sure don't expect SIGKILL either! > > Why? Each syscall already is killable as the task might be killed by the OOM > killer. Not all syscalls are interruptible - for example sys_sync() isn't: SYSCALL_DEFINE0(sync) { int nowait = 0, wait = 1; wakeup_flusher_threads(0, WB_REASON_SYNC); iterate_supers(sync_inodes_one_sb, NULL); iterate_supers(sync_fs_one_sb, &nowait); iterate_supers(sync_fs_one_sb, &wait); iterate_bdevs(fdatawrite_one_bdev, NULL); iterate_bdevs(fdatawait_one_bdev, NULL); if (unlikely(laptop_mode)) laptop_sync_completion(); return 0; } > > There's a (very) low number of system calls that are not interruptible, but > > the vast majority is. > > That might be true. I just fail to see how this is related to the > particular problem I am trying to solve. As I've said those callsites > which cause problems with latencies can be later converted to > interruptible waiting trivially. So my problem as I see it is the following: you are adding a rare API to an already complex locking interface, further complicating already complicated MM code paths in various ways. Only to help a case that is a third type of rare: OOM-kill. That's a surefire whack-a-mole nest of bugs, if I've ever seen one. What I am suggesting instead is a slight modification of the concept: to re-phrase the problem set and think in broader terms of interruptability: make certain MM operations, especially ones which tend to hinder OOM-kill latencies, more interruptible - which implicitly also makes them more OOM-killable. That's a win-win as I see it: as both your usecase and a lot of other usecases will be improved - and it will also be tested a lot more than any OOM-kill path will be tested. I might be wrong in the end, but your counterarguments were not convincing so far (to me). Thanks, Ingo
WARNING: multiple messages have this Message-ID (diff)
From: Ingo Molnar <mingo@kernel.org> To: Michal Hocko <mhocko@kernel.org> Cc: LKML <linux-kernel@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Thomas Gleixner <tglx@linutronix.de>, "H. Peter Anvin" <hpa@zytor.com>, "David S. Miller" <davem@davemloft.net>, Tony Luck <tony.luck@intel.com>, Andrew Morton <akpm@linux-foundation.org>, Chris Zankel <chris@zankel.net>, Max Filippov <jcmvbkbc@gmail.com>, x86@kernel.org, linux-alpha@vger.kernel.org, linux-ia64@vger.kernel.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-arch@vger.kernel.org, Linus Torvalds <torvalds@linux-foundation.org>, Peter Zijlstra <a.p.zijlstra@chello.nl> Subject: Re: [RFC 0/12] introduce down_write_killable for rw_semaphore Date: Wed, 9 Mar 2016 14:43:39 +0100 [thread overview] Message-ID: <20160309134339.GA20911@gmail.com> (raw) In-Reply-To: <20160309132855.GI27018@dhcp22.suse.cz> * Michal Hocko <mhocko@kernel.org> wrote: > On Wed 09-03-16 14:17:10, Ingo Molnar wrote: > > > > * Michal Hocko <mhocko@kernel.org> wrote: > > > > > > [...] this is a follow up work for oom_reaper [1]. As the async OOM killing > > > > depends on oom_sem for read we would really appreciate if a holder for write > > > > stood in the way. This patchset is changing many of down_write calls to be > > > > killable to help those cases when the writer is blocked and waiting for > > > > readers to release the lock and so help __oom_reap_task to process the oom > > > > victim. > > > > > > > > there seems to be a misunderstanding: if a writer is blocked waiting for > > > > readers then no new readers are allowed - the writer will get its turn the > > > > moment all existing readers drop the lock. > > > > > > Readers might be blocked e.g. on the memory allocation which cannot proceed due > > > to OOM. Such a reader might be operating on a remote mm. > > > > Doing complex allocations with the mm locked looks fragile no matter what: we > > should add debugging code that warns if allocations are done with a remote mm > > locked. (it should be trivial) > > No matter how fragile is that it is not something non-existent. Just > have a look at use_mm for example. We definitely do not want to warn > about those, right? Sure we care about eliminating fragility, and usage does not seem to be widespread at all: triton:~/tip> git grep -w use_mm drivers/staging/rdma/hfi1/user_sdma.c: use_mm(req->pq->user_mm); drivers/usb/gadget/function/f_fs.c: use_mm(io_data->mm); drivers/usb/gadget/legacy/inode.c: use_mm(mm); drivers/vhost/vhost.c: use_mm(dev->mm); I think we also want to keep our general flexibility wrt. eventually turning the mmap_sem into a spinlock ... > > > I am not against interruptible variant as well but I suspect that some paths > > > are not expected to return EINTR. I haven't checked them for this but > > > killable is sufficient for the problem I am trying to solve. That problem is > > > real while latencies do not seem to be that eminent. > > > > If they don't expect EINTR then they sure don't expect SIGKILL either! > > Why? Each syscall already is killable as the task might be killed by the OOM > killer. Not all syscalls are interruptible - for example sys_sync() isn't: SYSCALL_DEFINE0(sync) { int nowait = 0, wait = 1; wakeup_flusher_threads(0, WB_REASON_SYNC); iterate_supers(sync_inodes_one_sb, NULL); iterate_supers(sync_fs_one_sb, &nowait); iterate_supers(sync_fs_one_sb, &wait); iterate_bdevs(fdatawrite_one_bdev, NULL); iterate_bdevs(fdatawait_one_bdev, NULL); if (unlikely(laptop_mode)) laptop_sync_completion(); return 0; } > > There's a (very) low number of system calls that are not interruptible, but > > the vast majority is. > > That might be true. I just fail to see how this is related to the > particular problem I am trying to solve. As I've said those callsites > which cause problems with latencies can be later converted to > interruptible waiting trivially. So my problem as I see it is the following: you are adding a rare API to an already complex locking interface, further complicating already complicated MM code paths in various ways. Only to help a case that is a third type of rare: OOM-kill. That's a surefire whack-a-mole nest of bugs, if I've ever seen one. What I am suggesting instead is a slight modification of the concept: to re-phrase the problem set and think in broader terms of interruptability: make certain MM operations, especially ones which tend to hinder OOM-kill latencies, more interruptible - which implicitly also makes them more OOM-killable. That's a win-win as I see it: as both your usecase and a lot of other usecases will be improved - and it will also be tested a lot more than any OOM-kill path will be tested. I might be wrong in the end, but your counterarguments were not convincing so far (to me). Thanks, Ingo
next prev parent reply other threads:[~2016-03-09 13:43 UTC|newest] Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top 2016-02-02 20:19 [RFC 0/12] introduce down_write_killable for rw_semaphore Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-02 20:19 ` [RFC 01/12] locking, rwsem: get rid of __down_write_nested Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-02 20:19 ` [RFC 02/12] locking, rwsem: drop explicit memory barriers Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-02 20:19 ` [RFC 03/12] locking, rwsem: introduce basis for down_write_killable Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-02 20:19 ` [RFC 04/12] alpha, rwsem: provide __down_write_killable Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-02 20:19 ` [RFC 05/12] ia64, " Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-02 20:19 ` [RFC 06/12] s390, " Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-02 20:19 ` [RFC 07/12] sh, " Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-03 11:19 ` Sergei Shtylyov 2016-02-03 11:19 ` Sergei Shtylyov 2016-02-03 12:11 ` Michal Hocko 2016-02-03 12:11 ` Michal Hocko 2016-02-02 20:19 ` [RFC 08/12] sparc, " Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-02 20:19 ` [RFC 09/12] xtensa, " Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-02 20:19 ` [RFC 10/12] x86, rwsem: simplify __down_write Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-03 8:10 ` Ingo Molnar 2016-02-03 8:10 ` Ingo Molnar 2016-02-03 12:10 ` Michal Hocko 2016-02-03 12:10 ` Michal Hocko 2016-06-03 16:13 ` Peter Zijlstra 2016-06-03 16:13 ` Peter Zijlstra 2016-06-03 22:34 ` Peter Zijlstra 2016-06-03 22:34 ` Peter Zijlstra 2016-06-09 14:40 ` David Howells 2016-06-09 14:40 ` David Howells 2016-06-09 17:36 ` Peter Zijlstra 2016-06-09 17:36 ` Peter Zijlstra 2016-06-10 16:39 ` Paul E. McKenney 2016-06-10 16:39 ` Paul E. McKenney 2016-02-02 20:19 ` [RFC 11/12] x86, rwsem: provide __down_write_killable Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-17 16:41 ` [RFC 11/12 v1] " Michal Hocko 2016-02-17 16:41 ` Michal Hocko 2016-02-17 16:41 ` Michal Hocko 2016-02-17 16:41 ` Michal Hocko 2016-02-02 20:19 ` [RFC 12/12] locking, rwsem: provide down_write_killable Michal Hocko 2016-02-02 20:19 ` Michal Hocko 2016-02-19 12:15 ` [RFC 0/12] introduce down_write_killable for rw_semaphore Michal Hocko 2016-02-19 12:15 ` Michal Hocko 2016-03-09 12:18 ` Ingo Molnar 2016-03-09 12:18 ` Ingo Molnar 2016-03-09 12:56 ` Michal Hocko 2016-03-09 12:56 ` Michal Hocko 2016-03-09 13:17 ` Ingo Molnar 2016-03-09 13:17 ` Ingo Molnar 2016-03-09 13:28 ` Michal Hocko 2016-03-09 13:28 ` Michal Hocko 2016-03-09 13:43 ` Ingo Molnar [this message] 2016-03-09 13:43 ` Ingo Molnar 2016-03-09 14:41 ` Michal Hocko 2016-03-09 14:41 ` Michal Hocko 2016-03-10 10:24 ` Ingo Molnar 2016-03-10 10:24 ` Ingo Molnar
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20160309134339.GA20911@gmail.com \ --to=mingo@kernel.org \ --cc=a.p.zijlstra@chello.nl \ --cc=akpm@linux-foundation.org \ --cc=chris@zankel.net \ --cc=davem@davemloft.net \ --cc=hpa@zytor.com \ --cc=jcmvbkbc@gmail.com \ --cc=linux-alpha@vger.kernel.org \ --cc=linux-arch@vger.kernel.org \ --cc=linux-ia64@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-s390@vger.kernel.org \ --cc=linux-sh@vger.kernel.org \ --cc=linux-xtensa@linux-xtensa.org \ --cc=mhocko@kernel.org \ --cc=mingo@redhat.com \ --cc=peterz@infradead.org \ --cc=sparclinux@vger.kernel.org \ --cc=tglx@linutronix.de \ --cc=tony.luck@intel.com \ --cc=torvalds@linux-foundation.org \ --cc=x86@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.