From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752073AbcEJLx0 (ORCPT ); Tue, 10 May 2016 07:53:26 -0400 Received: from mail-wm0-f66.google.com ([74.125.82.66]:33417 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751492AbcEJLxX (ORCPT ); Tue, 10 May 2016 07:53:23 -0400 Date: Tue, 10 May 2016 13:53:20 +0200 From: Michal Hocko To: Tetsuo Handa Cc: LKML , Peter Zijlstra , Ingo Molnar , Thomas Gleixner , "H. Peter Anvin" , "David S. Miller" , Tony Luck , Andrew Morton , Chris Zankel , Max Filippov Subject: Re: [PATCH 03/11] locking, rwsem: introduce basis for down_write_killable Message-ID: <20160510115320.GJ23576@dhcp22.suse.cz> References: <1459508695-14915-1-git-send-email-mhocko@kernel.org> <1459508695-14915-4-git-send-email-mhocko@kernel.org> <8bd03bdc-0373-a3bb-da12-045322efb797@I-love.SAKURA.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8bd03bdc-0373-a3bb-da12-045322efb797@I-love.SAKURA.ne.jp> User-Agent: Mutt/1.6.0 (2016-04-01) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 10-05-16 19:43:20, Tetsuo Handa wrote: > I hit "allowing the OOM killer to select the same thread again" problem > ( http://lkml.kernel.org/r/20160408113425.GF29820@dhcp22.suse.cz ), but > I think that there is a bug in down_write_killable() series (at least > "locking, rwsem: introduce basis for down_write_killable" patch). > > Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20160510-sem.txt.xz . [...] > 2 threads (PID: 1314 and 1443) are sleeping at rwsem_down_read_failed() > but no thread is sleeping at rwsem_down_write_failed_killable(). > If there is no thread waiting for write lock, threads waiting for read > lock must be able to run. This suggests that one of threads which was > waiting for write lock forgot to wake up reader threads. Or that the write lock holder is still keeping the lock held. I do not see such a process in your list though. Is it possible that the debug_show_all_locks would just miss it as it is not sleeping? > Looking at rwsem_down_read_failed(), reader threads waiting for the > writer thread to release the lock are waiting on sem->wait_list list. > Looking at __rwsem_down_write_failed_common(), when the writer thread > escaped the > > /* Block until there are no active lockers. */ > do { > if (signal_pending_state(state, current)) { > raw_spin_lock_irq(&sem->wait_lock); > ret = ERR_PTR(-EINTR); > goto out; > } > schedule(); > set_current_state(state); > } while ((count = sem->count) & RWSEM_ACTIVE_MASK); > > loop due to SIGKILL, I think that the writer thread needs to check for > remaining threads on sem->wait_list list and wake up reader threads > before rwsem_down_write_failed_killable() returns -EINTR. I am not sure I understand. The rwsem counter is not write locked while the thread is sleeping and when we fail on the signal pending so readers should be able to proceed, no? Or are you suggesting that the failure path should call rwsem_wake? I do not see __mutex_lock_common for killable wait doing something like that and rwsem_wake is explicitly documented that it is called after the lock state has been updated already. Now I might be missing something subtle here but I guess the code is correct and it is more likely that the holder of the lock wasn't killed but it is rather holding the lock and doing something else. -- Michal Hocko SUSE Labs