From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756009AbdDGNGN (ORCPT ); Fri, 7 Apr 2017 09:06:13 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46426 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753282AbdDGNGD (ORCPT ); Fri, 7 Apr 2017 09:06:03 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com A1DEFC05AA4F Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=aarcange@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com A1DEFC05AA4F Date: Fri, 7 Apr 2017 15:06:00 +0200 From: Andrea Arcangeli To: Chris Wilson , Martin Kepplinger , Thorsten Leemhuis , daniel.vetter@intel.com, Dave Airlie , intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org Subject: Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker Message-ID: <20170407130600.GA5035@redhat.com> References: <87pogtplxr.fsf@intel.com> <20170406232347.988-1-aarcange@redhat.com> <20170406232347.988-3-aarcange@redhat.com> <20170407100211.GG10496@nuc-i3427.alporthouse.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170407100211.GG10496@nuc-i3427.alporthouse.com> User-Agent: Mutt/1.8.0 (2017-02-23) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Fri, 07 Apr 2017 13:06:03 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 07, 2017 at 11:02:11AM +0100, Chris Wilson wrote: > On Fri, Apr 07, 2017 at 01:23:44AM +0200, Andrea Arcangeli wrote: > > Waiting a RCU grace period only guarantees the work gets queued, but > > until after the queued workqueue returns, there's no guarantee the > > memory was actually freed. So flush the work to provide better > > guarantees to the reclaim code in addition of waiting a RCU grace > > period to pass. > > We are not allowed to call flush_work() from the shrinker, the workqueue > doesn't have and can't have the right reclaim flags. I figured the flush_work had to be conditional to "unlock" being true too in the i915 shrinker (not only synchronize_rcu_expedited()), and I already fixed that bit, but I didn't think it would be a problem to wait for the workqueue as long as reclaim didn't recurse on the struct_mutex (it is a problem if unlock is false of course as we would be back to square one). I didn't get further hangs and I assume I've been running a couple of synchronize_rcu_expedited() and flush_work (I should add dynamic tracing to be sure). Also note, I didn't get any lockdep warning when I reproduced the workqueue hang in 4.11-rc5 so at least as far as lockdep is concerned there's no problem to call synchronize_rcu_expedited and it couldn't notice we were holding the struct_mutex while waiting for the new workqueue to run. Also note recursing on the lock (unlock false case) is something nothing else does, I'm not sure if it's worth the risk and if you shouldn't just call mutex_trylock in the shrinker instead of mutex_trylock_recursive. One thing was to recurse on the lock internally in the same context, but recursing through the whole reclaim is more dubious as safe. You could start dropping objects and wiping vmas and stuff in the middle of some kmalloc/alloc_pages that doesn't expect it and then crash for other reasons. So this reclaim recursion model of the shinker is quite unique and quite challenging to proof as safe if you keep using mutex_trylock_recursive in i915_gem_shrinker_scan. Lock recursion in all other places could be dropped without runtime downsides, the only place mutex_trylock_recursive makes a design difference and makes sense to be used is in i915_gem_shrinker_scan, the rest are implementation issues not fundamental shrinker design and it'd be nice if those other mutex_trylock_recursive would all be removed and the only one that is left is in i915_gem_shrinker_scan and nowhere else (or to drop it also from i915_gem_shrinker_scan). mutex_trylock_recursive() should also be patched to use READ_ONCE(__mutex_owner(lock)) because currently it breaks C. In the whole kernel i915 and msm drm are the only two users of such function in fact. Another thing is what value return from i915_gem_shrinker_scan when unlock is false, and we can't possibly wait for the memory to be freed let alone for a rcu grace period. For various reasons I think it's safer to return the current "free" even if we could also return "0" in such case. There are different tradeoffs, returning "free" is less likely to trigger an early OOM as the VM thinks it's still making progress and in fact it will get more free memory shortly, while returning SHRINK_STOP would also be an option and it would insist more on the other slabs so it would be more reliable at freeing memory timely, but it would be more at risk of early OOM. I think returning "free" is the better tradeoff of the two, but I suggest to add a comment as it's not exactly obvious what is better. Thanks, Andrea From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrea Arcangeli Subject: Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add accuracy to the i915 shrinker Date: Fri, 7 Apr 2017 15:06:00 +0200 Message-ID: <20170407130600.GA5035@redhat.com> References: <87pogtplxr.fsf@intel.com> <20170406232347.988-1-aarcange@redhat.com> <20170406232347.988-3-aarcange@redhat.com> <20170407100211.GG10496@nuc-i3427.alporthouse.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: Content-Disposition: inline In-Reply-To: <20170407100211.GG10496@nuc-i3427.alporthouse.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Chris Wilson , Martin Kepplinger , Thorsten Leemhuis , daniel.vetter@intel.com, Dave Airlie , intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org T24gRnJpLCBBcHIgMDcsIDIwMTcgYXQgMTE6MDI6MTFBTSArMDEwMCwgQ2hyaXMgV2lsc29uIHdy b3RlOgo+IE9uIEZyaSwgQXByIDA3LCAyMDE3IGF0IDAxOjIzOjQ0QU0gKzAyMDAsIEFuZHJlYSBB cmNhbmdlbGkgd3JvdGU6Cj4gPiBXYWl0aW5nIGEgUkNVIGdyYWNlIHBlcmlvZCBvbmx5IGd1YXJh bnRlZXMgdGhlIHdvcmsgZ2V0cyBxdWV1ZWQsIGJ1dAo+ID4gdW50aWwgYWZ0ZXIgdGhlIHF1ZXVl ZCB3b3JrcXVldWUgcmV0dXJucywgdGhlcmUncyBubyBndWFyYW50ZWUgdGhlCj4gPiBtZW1vcnkg d2FzIGFjdHVhbGx5IGZyZWVkLiBTbyBmbHVzaCB0aGUgd29yayB0byBwcm92aWRlIGJldHRlcgo+ ID4gZ3VhcmFudGVlcyB0byB0aGUgcmVjbGFpbSBjb2RlIGluIGFkZGl0aW9uIG9mIHdhaXRpbmcg YSBSQ1UgZ3JhY2UKPiA+IHBlcmlvZCB0byBwYXNzLgo+IAo+IFdlIGFyZSBub3QgYWxsb3dlZCB0 byBjYWxsIGZsdXNoX3dvcmsoKSBmcm9tIHRoZSBzaHJpbmtlciwgdGhlIHdvcmtxdWV1ZQo+IGRv ZXNuJ3QgaGF2ZSBhbmQgY2FuJ3QgaGF2ZSB0aGUgcmlnaHQgcmVjbGFpbSBmbGFncy4KCkkgZmln dXJlZCB0aGUgZmx1c2hfd29yayBoYWQgdG8gYmUgY29uZGl0aW9uYWwgdG8gInVubG9jayIgYmVp bmcgdHJ1ZQp0b28gaW4gdGhlIGk5MTUgc2hyaW5rZXIgKG5vdCBvbmx5IHN5bmNocm9uaXplX3Jj dV9leHBlZGl0ZWQoKSksIGFuZCBJCmFscmVhZHkgZml4ZWQgdGhhdCBiaXQsIGJ1dCBJIGRpZG4n dCB0aGluayBpdCB3b3VsZCBiZSBhIHByb2JsZW0gdG8Kd2FpdCBmb3IgdGhlIHdvcmtxdWV1ZSBh cyBsb25nIGFzIHJlY2xhaW0gZGlkbid0IHJlY3Vyc2Ugb24gdGhlCnN0cnVjdF9tdXRleCAoaXQg aXMgYSBwcm9ibGVtIGlmIHVubG9jayBpcyBmYWxzZSBvZiBjb3Vyc2UgYXMgd2Ugd291bGQKYmUg YmFjayB0byBzcXVhcmUgb25lKS4gSSBkaWRuJ3QgZ2V0IGZ1cnRoZXIgaGFuZ3MgYW5kIEkgYXNz dW1lIEkndmUKYmVlbiBydW5uaW5nIGEgY291cGxlIG9mIHN5bmNocm9uaXplX3JjdV9leHBlZGl0 ZWQoKSBhbmQgZmx1c2hfd29yayAoSQpzaG91bGQgYWRkIGR5bmFtaWMgdHJhY2luZyB0byBiZSBz dXJlKS4KCkFsc28gbm90ZSwgSSBkaWRuJ3QgZ2V0IGFueSBsb2NrZGVwIHdhcm5pbmcgd2hlbiBJ IHJlcHJvZHVjZWQgdGhlCndvcmtxdWV1ZSBoYW5nIGluIDQuMTEtcmM1IHNvIGF0IGxlYXN0IGFz IGZhciBhcyBsb2NrZGVwIGlzIGNvbmNlcm5lZAp0aGVyZSdzIG5vIHByb2JsZW0gdG8gY2FsbCBz eW5jaHJvbml6ZV9yY3VfZXhwZWRpdGVkIGFuZCBpdCBjb3VsZG4ndApub3RpY2Ugd2Ugd2VyZSBo b2xkaW5nIHRoZSBzdHJ1Y3RfbXV0ZXggd2hpbGUgd2FpdGluZyBmb3IgdGhlIG5ldwp3b3JrcXVl dWUgdG8gcnVuLgoKQWxzbyBub3RlIHJlY3Vyc2luZyBvbiB0aGUgbG9jayAodW5sb2NrIGZhbHNl IGNhc2UpIGlzIHNvbWV0aGluZwpub3RoaW5nIGVsc2UgZG9lcywgSSdtIG5vdCBzdXJlIGlmIGl0 J3Mgd29ydGggdGhlIHJpc2sgYW5kIGlmIHlvdQpzaG91bGRuJ3QganVzdCBjYWxsIG11dGV4X3Ry eWxvY2sgaW4gdGhlIHNocmlua2VyIGluc3RlYWQgb2YKbXV0ZXhfdHJ5bG9ja19yZWN1cnNpdmUu IE9uZSB0aGluZyB3YXMgdG8gcmVjdXJzZSBvbiB0aGUgbG9jawppbnRlcm5hbGx5IGluIHRoZSBz YW1lIGNvbnRleHQsIGJ1dCByZWN1cnNpbmcgdGhyb3VnaCB0aGUgd2hvbGUKcmVjbGFpbSBpcyBt b3JlIGR1YmlvdXMgYXMgc2FmZS4KCllvdSBjb3VsZCBzdGFydCBkcm9wcGluZyBvYmplY3RzIGFu ZCB3aXBpbmcgdm1hcyBhbmQgc3R1ZmYgaW4gdGhlCm1pZGRsZSBvZiBzb21lIGttYWxsb2MvYWxs b2NfcGFnZXMgdGhhdCBkb2Vzbid0IGV4cGVjdCBpdCBhbmQgdGhlbgpjcmFzaCBmb3Igb3RoZXIg cmVhc29ucy4gU28gdGhpcyByZWNsYWltIHJlY3Vyc2lvbiBtb2RlbCBvZiB0aGUKc2hpbmtlciBp cyBxdWl0ZSB1bmlxdWUgYW5kIHF1aXRlIGNoYWxsZW5naW5nIHRvIHByb29mIGFzIHNhZmUgaWYg eW91CmtlZXAgdXNpbmcgbXV0ZXhfdHJ5bG9ja19yZWN1cnNpdmUgaW4gaTkxNV9nZW1fc2hyaW5r ZXJfc2Nhbi4KCkxvY2sgcmVjdXJzaW9uIGluIGFsbCBvdGhlciBwbGFjZXMgY291bGQgYmUgZHJv cHBlZCB3aXRob3V0IHJ1bnRpbWUKZG93bnNpZGVzLCB0aGUgb25seSBwbGFjZSBtdXRleF90cnls b2NrX3JlY3Vyc2l2ZSBtYWtlcyBhIGRlc2lnbgpkaWZmZXJlbmNlIGFuZCBtYWtlcyBzZW5zZSB0 byBiZSB1c2VkIGlzIGluIGk5MTVfZ2VtX3Nocmlua2VyX3NjYW4sCnRoZSByZXN0IGFyZSBpbXBs ZW1lbnRhdGlvbiBpc3N1ZXMgbm90IGZ1bmRhbWVudGFsIHNocmlua2VyIGRlc2lnbiBhbmQKaXQn ZCBiZSBuaWNlIGlmIHRob3NlIG90aGVyIG11dGV4X3RyeWxvY2tfcmVjdXJzaXZlIHdvdWxkIGFs bCBiZQpyZW1vdmVkIGFuZCB0aGUgb25seSBvbmUgdGhhdCBpcyBsZWZ0IGlzIGluIGk5MTVfZ2Vt X3Nocmlua2VyX3NjYW4gYW5kCm5vd2hlcmUgZWxzZSAob3IgdG8gZHJvcCBpdCBhbHNvIGZyb20g aTkxNV9nZW1fc2hyaW5rZXJfc2NhbikuCgptdXRleF90cnlsb2NrX3JlY3Vyc2l2ZSgpIHNob3Vs ZCBhbHNvIGJlIHBhdGNoZWQgdG8gdXNlClJFQURfT05DRShfX211dGV4X293bmVyKGxvY2spKSBi ZWNhdXNlIGN1cnJlbnRseSBpdCBicmVha3MgQy4KCkluIHRoZSB3aG9sZSBrZXJuZWwgaTkxNSBh bmQgbXNtIGRybSBhcmUgdGhlIG9ubHkgdHdvIHVzZXJzIG9mIHN1Y2gKZnVuY3Rpb24gaW4gZmFj dC4KCkFub3RoZXIgdGhpbmcgaXMgd2hhdCB2YWx1ZSByZXR1cm4gZnJvbSBpOTE1X2dlbV9zaHJp bmtlcl9zY2FuIHdoZW4KdW5sb2NrIGlzIGZhbHNlLCBhbmQgd2UgY2FuJ3QgcG9zc2libHkgd2Fp dCBmb3IgdGhlIG1lbW9yeSB0byBiZSBmcmVlZApsZXQgYWxvbmUgZm9yIGEgcmN1IGdyYWNlIHBl cmlvZC4gRm9yIHZhcmlvdXMgcmVhc29ucyBJIHRoaW5rIGl0J3MKc2FmZXIgdG8gcmV0dXJuIHRo ZSBjdXJyZW50ICJmcmVlIiBldmVuIGlmIHdlIGNvdWxkIGFsc28gcmV0dXJuICIwIiBpbgpzdWNo IGNhc2UuIFRoZXJlIGFyZSBkaWZmZXJlbnQgdHJhZGVvZmZzLCByZXR1cm5pbmcgImZyZWUiIGlz IGxlc3MKbGlrZWx5IHRvIHRyaWdnZXIgYW4gZWFybHkgT09NIGFzIHRoZSBWTSB0aGlua3MgaXQn cyBzdGlsbCBtYWtpbmcKcHJvZ3Jlc3MgYW5kIGluIGZhY3QgaXQgd2lsbCBnZXQgbW9yZSBmcmVl IG1lbW9yeSBzaG9ydGx5LCB3aGlsZQpyZXR1cm5pbmcgU0hSSU5LX1NUT1Agd291bGQgYWxzbyBi ZSBhbiBvcHRpb24gYW5kIGl0IHdvdWxkIGluc2lzdCBtb3JlCm9uIHRoZSBvdGhlciBzbGFicyBz byBpdCB3b3VsZCBiZSBtb3JlIHJlbGlhYmxlIGF0IGZyZWVpbmcgbWVtb3J5CnRpbWVseSwgYnV0 IGl0IHdvdWxkIGJlIG1vcmUgYXQgcmlzayBvZiBlYXJseSBPT00uIEkgdGhpbmsgcmV0dXJuaW5n CiJmcmVlIiBpcyB0aGUgYmV0dGVyIHRyYWRlb2ZmIG9mIHRoZSB0d28sIGJ1dCBJIHN1Z2dlc3Qg dG8gYWRkIGEKY29tbWVudCBhcyBpdCdzIG5vdCBleGFjdGx5IG9idmlvdXMgd2hhdCBpcyBiZXR0 ZXIuCgpUaGFua3MsCkFuZHJlYQpfX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fXwpJbnRlbC1nZnggbWFpbGluZyBsaXN0CkludGVsLWdmeEBsaXN0cy5mcmVlZGVz a3RvcC5vcmcKaHR0cHM6Ly9saXN0cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0aW5mby9p bnRlbC1nZngK