From: Michal Hocko <mhocko@kernel.org> To: "Yan, Zheng" <zyan@redhat.com> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, viro@zeniv.linux.org.uk, jlayton@redhat.com, stable@vger.kernel.org Subject: Re: [PATCH] mm: save/restore current->journal_info in handle_mm_fault Date: Thu, 14 Dec 2017 14:30:41 +0100 [thread overview] Message-ID: <20171214133041.GO16951@dhcp22.suse.cz> (raw) In-Reply-To: <20171214105527.5885-1-zyan@redhat.com> On Thu 14-12-17 18:55:27, Yan, Zheng wrote: > We recently got an Oops report: > > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: jbd2__journal_start+0x38/0x1a2 > [...] > Call Trace: > ext4_page_mkwrite+0x307/0x52b > _ext4_get_block+0xd8/0xd8 > do_page_mkwrite+0x6e/0xd8 > handle_mm_fault+0x686/0xf9b > mntput_no_expire+0x1f/0x21e > __do_page_fault+0x21d/0x465 > dput+0x4a/0x2f7 > page_fault+0x22/0x30 > copy_user_generic_string+0x2c/0x40 > copy_page_to_iter+0x8c/0x2b8 > generic_file_read_iter+0x26e/0x845 > timerqueue_del+0x31/0x90 > ceph_read_iter+0x697/0xa33 [ceph] > hrtimer_cancel+0x23/0x41 > futex_wait+0x1c8/0x24d > get_futex_key+0x32c/0x39a > __vfs_read+0xe0/0x130 > vfs_read.part.1+0x6c/0x123 > handle_mm_fault+0x831/0xf9b > __fget+0x7e/0xbf > SyS_read+0x4d/0xb5 > > ceph_read_iter() uses current->journal_info to pass context info to > ceph_readpages(). Because ceph_readpages() needs to know if its caller > has already gotten capability of using page cache (distinguish read > from readahead/fadvise). ceph_read_iter() set current->journal_info, > then calls generic_file_read_iter(). > > In above Oops, page fault happened when copying data to userspace. > Page fault handler called ext4_page_mkwrite(). Ext4 code read > current->journal_info and assumed it is journal handle. > > I checked other filesystems, btrfs probably suffers similar problem > for its readpage. (page fault happens when write() copies data from > userspace memory and the memory is mapped to a file in btrfs. > verify_parent_transid() can be called during readpage) > > Cc: stable@vger.kernel.org > Signed-off-by: "Yan, Zheng" <zyan@redhat.com> I am not an FS expert so (ab)using journal_info for unrelated purposes might be acceptable in general but hooking into the generic PF path like this is just too ugly to live. Can this be limited to a FS code so that not everybody has to pay additional cycles? With a big fat warning that (ab)users might want to find a better way to comunicate their internal stuff. > --- > mm/memory.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/mm/memory.c b/mm/memory.c > index a728bed16c20..db2a50233c49 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -4044,6 +4044,7 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address, > unsigned int flags) > { > int ret; > + void *old_journal_info; > > __set_current_state(TASK_RUNNING); > > @@ -4065,11 +4066,24 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address, > if (flags & FAULT_FLAG_USER) > mem_cgroup_oom_enable(); > > + /* > + * Fault can happen when filesystem A's read_iter()/write_iter() > + * copies data to/from userspace. Filesystem A may have set > + * current->journal_info. If the userspace memory is MAP_SHARED > + * mapped to a file in filesystem B, we later may call filesystem > + * B's vm operation. Filesystem B may also want to read/set > + * current->journal_info. > + */ > + old_journal_info = current->journal_info; > + current->journal_info = NULL; > + > if (unlikely(is_vm_hugetlb_page(vma))) > ret = hugetlb_fault(vma->vm_mm, vma, address, flags); > else > ret = __handle_mm_fault(vma, address, flags); > > + current->journal_info = old_journal_info; > + > if (flags & FAULT_FLAG_USER) { > mem_cgroup_oom_disable(); > /* > -- > 2.13.6 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Michal Hocko SUSE Labs
WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org> To: "Yan, Zheng" <zyan@redhat.com> Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, viro@zeniv.linux.org.uk, jlayton@redhat.com, stable@vger.kernel.org Subject: Re: [PATCH] mm: save/restore current->journal_info in handle_mm_fault Date: Thu, 14 Dec 2017 14:30:41 +0100 [thread overview] Message-ID: <20171214133041.GO16951@dhcp22.suse.cz> (raw) In-Reply-To: <20171214105527.5885-1-zyan@redhat.com> On Thu 14-12-17 18:55:27, Yan, Zheng wrote: > We recently got an Oops report: > > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: jbd2__journal_start+0x38/0x1a2 > [...] > Call Trace: > ext4_page_mkwrite+0x307/0x52b > _ext4_get_block+0xd8/0xd8 > do_page_mkwrite+0x6e/0xd8 > handle_mm_fault+0x686/0xf9b > mntput_no_expire+0x1f/0x21e > __do_page_fault+0x21d/0x465 > dput+0x4a/0x2f7 > page_fault+0x22/0x30 > copy_user_generic_string+0x2c/0x40 > copy_page_to_iter+0x8c/0x2b8 > generic_file_read_iter+0x26e/0x845 > timerqueue_del+0x31/0x90 > ceph_read_iter+0x697/0xa33 [ceph] > hrtimer_cancel+0x23/0x41 > futex_wait+0x1c8/0x24d > get_futex_key+0x32c/0x39a > __vfs_read+0xe0/0x130 > vfs_read.part.1+0x6c/0x123 > handle_mm_fault+0x831/0xf9b > __fget+0x7e/0xbf > SyS_read+0x4d/0xb5 > > ceph_read_iter() uses current->journal_info to pass context info to > ceph_readpages(). Because ceph_readpages() needs to know if its caller > has already gotten capability of using page cache (distinguish read > from readahead/fadvise). ceph_read_iter() set current->journal_info, > then calls generic_file_read_iter(). > > In above Oops, page fault happened when copying data to userspace. > Page fault handler called ext4_page_mkwrite(). Ext4 code read > current->journal_info and assumed it is journal handle. > > I checked other filesystems, btrfs probably suffers similar problem > for its readpage. (page fault happens when write() copies data from > userspace memory and the memory is mapped to a file in btrfs. > verify_parent_transid() can be called during readpage) > > Cc: stable@vger.kernel.org > Signed-off-by: "Yan, Zheng" <zyan@redhat.com> I am not an FS expert so (ab)using journal_info for unrelated purposes might be acceptable in general but hooking into the generic PF path like this is just too ugly to live. Can this be limited to a FS code so that not everybody has to pay additional cycles? With a big fat warning that (ab)users might want to find a better way to comunicate their internal stuff. > --- > mm/memory.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/mm/memory.c b/mm/memory.c > index a728bed16c20..db2a50233c49 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -4044,6 +4044,7 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address, > unsigned int flags) > { > int ret; > + void *old_journal_info; > > __set_current_state(TASK_RUNNING); > > @@ -4065,11 +4066,24 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address, > if (flags & FAULT_FLAG_USER) > mem_cgroup_oom_enable(); > > + /* > + * Fault can happen when filesystem A's read_iter()/write_iter() > + * copies data to/from userspace. Filesystem A may have set > + * current->journal_info. If the userspace memory is MAP_SHARED > + * mapped to a file in filesystem B, we later may call filesystem > + * B's vm operation. Filesystem B may also want to read/set > + * current->journal_info. > + */ > + old_journal_info = current->journal_info; > + current->journal_info = NULL; > + > if (unlikely(is_vm_hugetlb_page(vma))) > ret = hugetlb_fault(vma->vm_mm, vma, address, flags); > else > ret = __handle_mm_fault(vma, address, flags); > > + current->journal_info = old_journal_info; > + > if (flags & FAULT_FLAG_USER) { > mem_cgroup_oom_disable(); > /* > -- > 2.13.6 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-12-14 13:30 UTC|newest] Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-12-14 10:55 [PATCH] mm: save/restore current->journal_info in handle_mm_fault Yan, Zheng 2017-12-14 10:55 ` Yan, Zheng 2017-12-14 13:30 ` Michal Hocko [this message] 2017-12-14 13:30 ` Michal Hocko 2017-12-14 13:43 ` Jan Kara 2017-12-14 13:43 ` Jan Kara 2017-12-14 14:30 ` Yan, Zheng 2017-12-14 14:30 ` Yan, Zheng 2017-12-14 16:53 ` Jan Kara 2017-12-14 16:53 ` Jan Kara 2017-12-15 1:17 ` Yan, Zheng 2017-12-15 1:17 ` Yan, Zheng 2017-12-15 10:33 ` Jan Kara 2017-12-15 10:33 ` Jan Kara 2017-12-14 20:48 ` Andreas Dilger
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20171214133041.GO16951@dhcp22.suse.cz \ --to=mhocko@kernel.org \ --cc=akpm@linux-foundation.org \ --cc=ceph-devel@vger.kernel.org \ --cc=jlayton@redhat.com \ --cc=linux-btrfs@vger.kernel.org \ --cc=linux-ext4@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=stable@vger.kernel.org \ --cc=viro@zeniv.linux.org.uk \ --cc=zyan@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.