From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753318AbbJaLeI (ORCPT ); Sat, 31 Oct 2015 07:34:08 -0400 Received: from mail-qg0-f41.google.com ([209.85.192.41]:35784 "EHLO mail-qg0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752934AbbJaLeF (ORCPT ); Sat, 31 Oct 2015 07:34:05 -0400 Date: Sat, 31 Oct 2015 07:34:00 -0400 From: Jeff Layton To: Tejun Heo Cc: linux-kernel@vger.kernel.org, bfields@fieldses.org, Michael Skralivetsky , Chris Worley , Trond Myklebust , Lai Jiangshan Subject: Re: timer code oops when calling mod_delayed_work Message-ID: <20151031073400.2cf05d77@tlielax.poochiereds.net> In-Reply-To: <20151031020012.GH3582@mtj.duckdns.org> References: <20151029103113.2f893924@tlielax.poochiereds.net> <20151029135836.02ad9000@synchrony.poochiereds.net> <20151031020012.GH3582@mtj.duckdns.org> X-Mailer: Claws Mail 3.12.0 (GTK+ 2.24.28; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, 31 Oct 2015 11:00:12 +0900 Tejun Heo wrote: > (cc'ing Lai) > > Hello, Jeff. > > On Thu, Oct 29, 2015 at 01:58:36PM -0400, Jeff Layton wrote: > > crash> p cache_cleaner > > cache_cleaner = $12 = { > > work = { > > data = { > > counter = 0xfffffffe1 > > If I'm not mistaken, PENDING, flush color 14, OFFQ and POOL_NONE. > > > }, > > entry = { > > next = 0xffffffffa03623c8 , > > prev = 0xffffffffa03623c8 > > Empty entry. > > > }, > > func = 0xffffffffa03333c0 > > }, > > timer = { > > entry = { > > next = 0x0, > > pprev = 0xffff88085fd0eaf8 > > }, > > expires = 0x100021e99, > > function = 0xffffffff810b66a0 , > > data = 0xffffffffa03623c0, > > flags = 0x200014, > > slack = 0xffffffff, > > start_pid = 0x0, > > start_site = 0x0, > > start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" > > }, > > wq = 0xffff88085f48fc00, > > cpu = 0x14 > > } > > > > So the PENDING bit is set (lowest bit in data.counter), and timer->entry.pprev > > pprev pointer is not NULL (so timer_pending is true). I also see that > > there are several nfsd threads running the shrinker at the same time. > > > > There is one potential problem that I see, but I'd appreciate someone > > sanity checking me on this. Here is mod_delayed_work_on: > ... > > ...and here is the beginning of try_to_grab_pending: > > > > ------------------[snip]------------------------ > > /* try to steal the timer if it exists */ > > if (is_dwork) { > > struct delayed_work *dwork = to_delayed_work(work); > > > > /* > > * dwork->timer is irqsafe. If del_timer() fails, it's > > * guaranteed that the timer is not queued anywhere and not > > * running on the local CPU. > > */ > > if (likely(del_timer(&dwork->timer))) > > return 1; > > } > > > > /* try to claim PENDING the normal way */ > > if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) > > return 0; > > ------------------[snip]------------------------ > > > > > > ...so if del_timer returns true, we'll return 1 from > > try_to_grab_pending without actually setting the > > WORK_STRUCT_PENDING_BIT, and then will end up calling > > __queue_delayed_work. > > > > That seems wrong to me -- shouldn't we be ensuring that that bit is set > > when returning 1 from try_to_grab_pending to guard against concurrent > > queue_delayed_work_on calls? > > But if try_to_grab_pending() succeeded at stealing dwork->timer, it's > known that the PENDING bit must already be set. IOW, the bit is > stolen together with the timer. > > Heh, this one is tricky. Yeah, try_to_grab_pending() missing PENDING > would explain the failure but I can't see how it'd leak at the moment. > Thanks Tejun. Yeah, I realized that after sending the response above. If you successfully delete the timer the timer then the PENDING bit should already be set. Might be worth throwing in something like this, just before the return 1: WARN_ON(!test_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) ...but I doubt it would fire. I think it's likely that the bug is elsewhere. The other thing is that we've had this code in place for a couple of years now, and this is the first time I've seen an oops like this. I suspect that this may be a recent regression, but I don't know that for sure. I have asked Chris and Michael to see if they can bisect it down, but it may be a bit before they can get that done. Any insight you might have in the meantime would helpful. -- Jeff Layton