Subject: Re: i915 regression in kernel 4.10
To: Chris Wilson, Linux Kernel Mailing List, dri-devel@lists.freedesktop.org, intel-gfx, airlied@linux.ie, jani.nikula@linux.intel.com, daniel.vetter@intel.com, Konrad Rzeszutek Wilk, Boris Ostrovsky
References: <7abf8559-3aa7-af3a-8dc1-1dee42019fcd@suse.com> <20161219122934.GM29871@nuc-i3427.alporthouse.com>
From: Juergen Gross
Message-ID: <3de0be86-c0bc-6bfd-defa-745b589d7bd9@suse.com>
Date: Mon, 19 Dec 2016 15:16:44 +0100
In-Reply-To: <20161219122934.GM29871@nuc-i3427.alporthouse.com>

On 19/12/16 13:29, Chris Wilson wrote:
> On Mon, Dec 19, 2016 at 12:39:16PM +0100, Juergen Gross wrote:
>> With recent 4.10 kernel the graphics isn't coming up under Xen. First
>> failure message is:
>>
>> [   46.656649] i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)
>
> Do we get a silent failure? i915_gem_gtt_prepare_pages() is where we
> call dma_map_sg() and pass the sg to swiotlb (in this case) for
> remapping, and we do check for an error value of 0. After that error,
> SWIOTLB_MAP_ERROR is propagated back and converted to 0 for
> dma_map_sg(). That looks valid, and we should report ENOMEM back to the
> caller.
>
>> Later I see splats like:
>>
>> [   49.393583] general protection fault: 0000 [#1] SMP
>
> What was the faulting address?
> RAX is particularly non-pointer-like so I
> wonder if we walked onto an uninitialised portion of the sgtable. We may
> have tripped over a bug in our sg_page iterator.

During the bisect process there have been either GP or NULL pointer
dereferences or other page faults. Typical addresses were:

xen_swiotlb_unmap_sg_attrs+0x1f/0x50: access to 0000000000000018
xen_swiotlb_unmap_sg_attrs+0x1f/0x50: access to 0000000003020118

>
> The attached patch should prevent an early ENOMEM following the swiotlb
> allocation failure. But I suspect that we will still be tripping up the
> failure in the sg walker when binding to the GPU.
> -Chris
>

The patch is working not too bad. :-)

Still several "swiotlb buffer is full" messages (some with sz:, most
without), but no faults any more (neither GP nor NULL pointer
dereference). Graphical login is working now.

What I do see, however, is (no idea whether this is related):

[  735.826492] INFO: task systemd-udevd:484 blocked for more than 120 seconds.
[  735.826497]       Tainted: G W 4.9.0-pv+ #767
[  735.826499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  735.826501] systemd-udevd D 0 484 443 0x00000000
[  735.826507] Call Trace:
[  735.826522]  ? __schedule+0x192/0x640
[  735.826530]  ? kmem_cache_free+0x45/0x150
[  735.826535]  ? schedule+0x2d/0x80
[  735.826539]  ? schedule_timeout+0x1f3/0x380
[  735.826545]  ? error_exit+0x9/0x20
[  735.826555]  ? sg_pool_index.part.0+0x2/0x2
[  735.826561]  ? wait_for_completion+0xa4/0x110
[  735.826569]  ? wake_up_q+0x70/0x70
[  735.826577]  ? cpufreq_boost_online+0x10/0x10 [acpi_cpufreq]
[  735.826585]  ? cpuhp_issue_call+0x9c/0xe0
[  735.826590]  ? __cpuhp_setup_state+0xd5/0x1d0
[  735.826599]  ? acpi_cpufreq_init+0x1cd/0x1000 [acpi_cpufreq]
[  735.826601]  ? 0xffffffffa00b1000
[  735.826607]  ? do_one_initcall+0x38/0x180
[  735.826611]  ? kmem_cache_alloc_trace+0x98/0x1e0
[  735.826620]  ? do_init_module+0x55/0x1e5
[  735.826629]  ? load_module+0x2088/0x26b0
[  735.826633]  ? __symbol_put+0x30/0x30
[  735.826639]  ? SYSC_finit_module+0x80/0xb0
[  735.826644]  ? entry_SYSCALL_64_fastpath+0x1e/0xad

I guess it is _not_ related, OTOH there is sg_pool_index() involved...


Juergen
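
For anyone reading this thread later: the error convention Chris
describes above (the mapping call returns 0 on failure, and the caller
turns that into ENOMEM) can be sketched in user space roughly as below.
This is only an illustration under stated assumptions, not the i915 or
swiotlb code. The names mock_pool, mock_dma_map_sg() and
mock_prepare_pages() are hypothetical, and the pool size used when
trying it out is made up, since the thread does not say how large the
bounce buffer actually was.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the bounce-buffer pool: a plain byte budget. */
struct mock_pool {
    size_t free_bytes;
};

/*
 * Mimics only the return convention discussed in the thread: the number
 * of mapped entries on success, 0 on failure. On failure nothing is
 * taken from the pool, so there is nothing for the caller to unmap.
 */
int mock_dma_map_sg(struct mock_pool *pool, const size_t *seg_len, int nents)
{
    size_t total = 0;
    int i;

    assert(nents > 0);

    for (i = 0; i < nents; i++)
        total += seg_len[i];

    if (total > pool->free_bytes)
        return 0;               /* the "buffer is full" case */

    pool->free_bytes -= total;
    return nents;
}

/* Caller side: a 0 from the mapping step is reported upwards as -ENOMEM. */
int mock_prepare_pages(struct mock_pool *pool, const size_t *seg_len, int nents)
{
    if (mock_dma_map_sg(pool, seg_len, nents) == 0)
        return -ENOMEM;

    return 0;
}
```

Calling mock_prepare_pages() with a single 1630208-byte segment (the
size from the first log line) against a smaller pool returns -ENOMEM and
leaves the pool untouched, while a request that fits returns 0. The
point of the sketch is only that the failure is visible to the caller
and that a failed mapping must not be unmapped afterwards.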