From mboxrd@z Thu Jan 1 00:00:00 1970
From: Minchan Kim
Subject: Re: [PATCH v7 00/12] Support non-lru page migration
Date: Thu, 16 Jun 2016 13:47:10 +0900
Message-ID: <20160616044710.GP17127__42000.9812027731$1466052448$gmane$org@bbox>
References: <1464736881-24886-1-git-send-email-minchan@kernel.org>
 <20160615075909.GA425@swordfish>
 <20160615231248.GI17127@bbox>
 <20160616024827.GA497@swordfish>
 <20160616025800.GO17127@bbox>
 <20160616042343.GA516@swordfish>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20160616042343.GA516@swordfish>
Content-Disposition: inline
Sender: virtualization-bounces@lists.linux-foundation.org
Errors-To: virtualization-bounces@lists.linux-foundation.org
To: Sergey Senozhatsky
Cc: Rik van Riel, Sergey Senozhatsky, Naoya Horiguchi, Jonathan Corbet,
 Chan Gyun Jeong, Rafael Aquini, Hugh Dickins,
 linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
 virtualization@lists.linux-foundation.org, John Einar Reitan,
 linux-mm@kvack.org, Chulmin Kim, Gioh Kim, Konstantin Khlebnikov,
 Sangseok Lee, Andrew Morton, Kyeongdon Kim, Joonsoo Kim,
 Vlastimil Babka, Mel Gorman
List-Id: virtualization@lists.linuxfoundation.org

On Thu, Jun 16, 2016 at 01:23:43PM +0900, Sergey Senozhatsky wrote:
> On (06/16/16 11:58), Minchan Kim wrote:
> [..]
> > RAX: 2065676162726166 so rax is totally garbage, I think.
> > It means obj_to_head returns garbage because get_first_obj_offset is
> > utter crab because (page_idx / class->pages_per_zspage) was totally
> > wrong.
> >
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > 6408: f0 0f ba 28 00 lock btsl $0x0,(%rax)
> > >
> > >
> > > > Could you test with [zsmalloc: keep first object offset in struct page]
> > > > in mmotm?
> > >
> > > sure, I can. will it help, tho? we have a race condition here I think.
> >
> > I guess root cause is caused by get_first_obj_offset.
>
> sounds reasonable.
>
> > Please test with it.
>
>
> this is what I'm getting with the [zsmalloc: keep first object offset in struct page]
> applied: "count:0 mapcount:-127". which may be not related to zsmalloc at this point.
>
> kernel: BUG: Bad page state in process khugepaged pfn:101db8
> kernel: page:ffffea0004076e00 count:0 mapcount:-127 mapping: (null) index:0x1

Hm, it seems to be a double free.

It doesn't happen if you disable zram? IOW, it seems to be related to
zsmalloc migration?

How easily can you reproduce it? Could you bisect it?

> kernel: flags: 0x8000000000000000()
> kernel: page dumped because: nonzero mapcount
> kernel: Modules linked in: lzo zram zsmalloc mousedev coretemp hwmon crc32c_intel snd_hda_codec_realtek i2c_i801 snd_hda_codec_generic r8169 mii snd_hda_intel snd_hda_codec snd_hda_core acpi_cpufreq snd_pcm snd_timer snd soundcore lpc_ich processor mfd_core sch_fq_codel sd_mod hid_generic usb
> kernel: CPU: 3 PID: 38 Comm: khugepaged Not tainted 4.7.0-rc3-next-20160615-dbg-00005-gfd11984-dirty #491
> kernel: 0000000000000000 ffff8801124c73f8 ffffffff814d69b0 ffffea0004076e00
> kernel: ffffffff81e658a0 ffff8801124c7420 ffffffff811e9b63 0000000000000000
> kernel: ffffea0004076e00 ffffffff81e658a0 ffff8801124c7440 ffffffff811e9ca9
> kernel: Call Trace:
> kernel: [] dump_stack+0x68/0x92
> kernel: [] bad_page+0x158/0x1a2
> kernel: [] free_pages_check_bad+0xfc/0x101
> kernel: [] free_hot_cold_page+0x135/0x5de
> kernel: [] __free_pages+0x67/0x72
> kernel: [] release_freepages+0x13a/0x191
> kernel: [] compact_zone+0x845/0x1155
> kernel: [] ? compaction_suitable+0x76/0x76
> kernel: [] compact_zone_order+0xe0/0x167
> kernel: [] ? compact_zone+0x1155/0x1155
> kernel: [] try_to_compact_pages+0x2f1/0x648
> kernel: [] ? try_to_compact_pages+0x2f1/0x648
> kernel: [] ? compaction_zonelist_suitable+0x3a6/0x3a6
> kernel: [] ? get_page_from_freelist+0x2c0/0x133c
> kernel: [] __alloc_pages_direct_compact+0xea/0x30d
> kernel: [] ? get_page_from_freelist+0x133c/0x133c
> kernel: [] ? drain_all_pages+0x1d6/0x205
> kernel: [] __alloc_pages_nodemask+0x143d/0x16b6
> kernel: [] ? debug_show_all_locks+0x226/0x226
> kernel: [] ? warn_alloc_failed+0x24c/0x24c
> kernel: [] ? finish_wait+0x1a4/0x1b0
> kernel: [] ? lock_acquire+0xec/0x147
> kernel: [] ? _raw_spin_unlock_irqrestore+0x3b/0x5c
> kernel: [] ? _raw_spin_unlock_irqrestore+0x47/0x5c
> kernel: [] ? finish_wait+0x1a4/0x1b0
> kernel: [] khugepaged+0x1d4/0x484f
> kernel: [] ? hugepage_vma_revalidate+0xef/0xef
> kernel: [] ? finish_task_switch+0x3de/0x484
> kernel: [] ? _raw_spin_unlock_irq+0x27/0x45
> kernel: [] ? trace_hardirqs_on_caller+0x3d2/0x492
> kernel: [] ? prepare_to_wait_event+0x3f7/0x3f7
> kernel: [] ? __schedule+0xa4d/0xd16
> kernel: [] kthread+0x252/0x261
> kernel: [] ? hugepage_vma_revalidate+0xef/0xef
> kernel: [] ? kthread_create_on_node+0x377/0x377
> kernel: [] ret_from_fork+0x1f/0x40
> kernel: [] ? kthread_create_on_node+0x377/0x377
> -- Reboot --
>
> -ss