From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933359AbcFPC5z (ORCPT ); Wed, 15 Jun 2016 22:57:55 -0400
Received: from LGEAMRELO13.lge.com ([156.147.23.53]:53636 "EHLO
	lgeamrelo13.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932570AbcFPC5y (ORCPT );
	Wed, 15 Jun 2016 22:57:54 -0400
X-Original-SENDERIP: 156.147.1.151
X-Original-MAILFROM: minchan@kernel.org
X-Original-SENDERIP: 165.244.98.76
X-Original-MAILFROM: minchan@kernel.org
X-Original-SENDERIP: 10.177.223.161
X-Original-MAILFROM: minchan@kernel.org
Date: Thu, 16 Jun 2016 11:58:00 +0900
From: Minchan Kim
To: Sergey Senozhatsky
CC: Andrew Morton , , , Vlastimil Babka , , Hugh Dickins ,
	John Einar Reitan , Jonathan Corbet , Joonsoo Kim ,
	Konstantin Khlebnikov , Mel Gorman , Naoya Horiguchi ,
	Rafael Aquini , Rik van Riel , Sergey Senozhatsky , ,
	Gioh Kim , Chan Gyun Jeong , Sangseok Lee , Kyeongdon Kim ,
	Chulmin Kim
Subject: Re: [PATCH v7 00/12] Support non-lru page migration
Message-ID: <20160616025800.GO17127@bbox>
References: <1464736881-24886-1-git-send-email-minchan@kernel.org>
	<20160615075909.GA425@swordfish>
	<20160615231248.GI17127@bbox>
	<20160616024827.GA497@swordfish>
MIME-Version: 1.0
In-Reply-To: <20160616024827.GA497@swordfish>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-MIMETrack: Itemize by SMTP Server on LGEKRMHUB03/LGE/LG Group(Release
	8.5.3FP6|November 21, 2013) at 2016/06/16 11:57:50,
	Serialize by Router on LGEKRMHUB03/LGE/LG Group(Release
	8.5.3FP6|November 21, 2013) at 2016/06/16 11:57:50,
	Serialize complete at 2016/06/16 11:57:50
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 16, 2016 at 11:48:27AM +0900, Sergey Senozhatsky wrote:
> Hi,
>
> On (06/16/16 08:12), Minchan Kim wrote:
> > > [ 315.146533] kasan: CONFIG_KASAN_INLINE enabled
> > > [ 315.146538] kasan: GPF could be caused by NULL-ptr deref or user memory access
> > > [ 315.146546] general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > > [ 315.146576] Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel r8169 i2c_i801 mii snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd soundcore lpc_ich mfd_core processor sch_fq_codel sd_mod hid_generic usbhid hid ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common
> > > [ 315.146785] CPU: 3 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160614-dbg-00004-ga1c2cbc-dirty #488
> > > [ 315.146841] task: ffff8800bfaf2900 ti: ffff880112468000 task.ti: ffff880112468000
> > > [ 315.146859] RIP: 0010:[] [] zs_page_migrate+0x355/0xaa0 [zsmalloc]
> >
> > Thanks for the report!
> >
> > zs_page_migrate+0x355? Could you tell me what line is it?
> >
> > It seems to be related to obj_to_head.
>
> reproduced. a bit different call stack this time. but the problem is
> still the same.
>
> zs_compact()
> ...
> 6371: e8 00 00 00 00          callq  6376
> 6376: 0f 0b                   ud2
> 6378: 48 8b 95 a8 fe ff ff    mov    -0x158(%rbp),%rdx
> 637f: 4d 8d 74 24 78          lea    0x78(%r12),%r14
> 6384: 4c 89 ee                mov    %r13,%rsi
> 6387: 4c 89 e7                mov    %r12,%rdi
> 638a: e8 86 c7 ff ff          callq  2b15
> 638f: 41 89 c5                mov    %eax,%r13d
> 6392: 4c 89 f0                mov    %r14,%rax
> 6395: 48 c1 e8 03             shr    $0x3,%rax
> 6399: 8a 04 18                mov    (%rax,%rbx,1),%al
> 639c: 84 c0                   test   %al,%al
> 639e: 0f 85 f2 02 00 00       jne    6696
> 63a4: 41 8b 44 24 78          mov    0x78(%r12),%eax
> 63a9: 41 0f af c7             imul   %r15d,%eax
> 63ad: 41 01 c5                add    %eax,%r13d
> 63b0: 4c 89 f0                mov    %r14,%rax
> 63b3: 48 c1 e8 03             shr    $0x3,%rax
> 63b7: 48 01 d8                add    %rbx,%rax
> 63ba: 48 89 85 88 fe ff ff    mov    %rax,-0x178(%rbp)
> 63c1: 41 81 fd ff 0f 00 00    cmp    $0xfff,%r13d
> 63c8: 0f 87 1a 03 00 00       ja     66e8
> 63ce: 49 63 f5                movslq %r13d,%rsi
> 63d1: 48 03 b5 98 fe ff ff    add    -0x168(%rbp),%rsi
> 63d8: 48 8b bd a8 fe ff ff    mov    -0x158(%rbp),%rdi
> 63df: e8 67 d9 ff ff          callq  3d4b
> 63e4: a8 01                   test   $0x1,%al
> 63e6: 0f 84 d9 02 00 00       je     66c5
> 63ec: 48 83 e0 fe             and    $0xfffffffffffffffe,%rax
> 63f0: bf 01 00 00 00          mov    $0x1,%edi
> 63f5: 48 89 85 b0 fe ff ff    mov    %rax,-0x150(%rbp)
> 63fc: e8 00 00 00 00          callq  6401
> 6401: 48 8b 85 b0 fe ff ff    mov    -0x150(%rbp),%rax

RAX: 2065676162726166, so rax is total garbage, I think.
It means obj_to_head returned garbage because the offset from
get_first_obj_offset was utter crap: (page_idx / class->pages_per_zspage)
was totally wrong.

> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 6408: f0 0f ba 28 00          lock btsl $0x0,(%rax)
> > Could you test with [zsmalloc: keep first object offset in struct page]
> > in mmotm?
>
> sure, I can. will it help, tho? we have a race condition here I think.

I guess the root cause is in get_first_obj_offset.
Please test with it.

Thanks!
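
One quick way to sanity-check the "rax is garbage" reading is to look at the register value as bytes rather than as a pointer. The sketch below is only an illustration: decode_le64 is a hypothetical userspace helper, not anything from zsmalloc. It prints the eight bytes of a 64-bit value in x86 (little-endian) memory order. Assuming the reported RAX is the value left after the `and $0xfffffffffffffffe,%rax` at 63ec, and given that the `test $0x1,%al` / `je` pair at 63e4 only falls through when bit 0 is set, the word originally read would have been the same value with bit 0 set. That looks like printable text, which would fit page contents being read where a handle was expected. This is an inference from the disassembly, not something stated in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): write the 8 bytes of `value`
 * into `out` in little-endian order, i.e. the order they occupy in x86
 * memory. Non-printable bytes become '.'. `out` must hold 9 bytes.
 */
static void decode_le64(uint64_t value, char out[9])
{
	for (size_t i = 0; i < 8; i++) {
		unsigned char byte = (unsigned char)(value >> (8 * i));

		out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
	}
	out[8] = '\0';
}
```

Calling decode_le64(0x2065676162726166ULL, buf) fills buf with "farbage " (with a trailing space), and the same value with bit 0 set decodes to "garbage ". A real kernel pointer such as ffff8800bfaf2900 would decode to mostly non-printable bytes instead.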