From: Juergen Gross <jgross@suse.com>
To: Steven Haigh <netwiz@crc.id.au>, glenn@rimuhosting.com
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com>,
	xen-devel@lists.xen.org
Subject: Re: null domains after xl destroy
Date: Wed, 3 May 2017 15:38:32 +0200
Message-ID: <398304f9-205c-7451-e95b-c4a17e7a8a0c@suse.com>
In-Reply-To: <b7b4977a-a7ff-f86b-a3ec-f65389eb5c63@crc.id.au>

On 03/05/17 12:45, Steven Haigh wrote:
> Just wanted to give this a little nudge now people seem to be back on
> deck...

Things seem to be more complicated than I thought.

There are clearly paths leading to use-after-free scenarios, e.g. the
one in the backtrace below:

xen_blkbk_remove() frees be regardless of the return value of
xen_blkif_disconnect(). If xen_blkif_disconnect() returns -EBUSY, it
will be called again when the I/O still in progress completes, leading
to a call of xenbus_unmap_ring_vfree(blkif->be->dev, ...) with be
already freed...
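
To make the ordering problem concrete, here is a minimal userspace
sketch in C. It is only a model under stated assumptions, not the
driver code and not the pending patch: the structs, the freed flag
(standing in for kfree(be)), the uaf_detected counter and all function
names are hypothetical. It contrasts a remove path that ignores -EBUSY
with one that waits for outstanding I/O before freeing.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; names only loosely mirror xen-blkback. */
struct backend {
    int freed;              /* set to 1 where the driver would kfree(be) */
};

struct blkif {
    struct backend *be;     /* back-pointer used while unmapping the ring */
    int inflight;           /* I/O requests still in progress */
    int ring_mapped;        /* 1 while the shared ring is mapped */
};

static int uaf_detected;    /* counts accesses to an already freed backend */

/* Returns -EBUSY while I/O is outstanding, otherwise unmaps the ring. */
static int blkif_disconnect(struct blkif *blkif)
{
    if (blkif->inflight > 0)
        return -EBUSY;
    if (blkif->ring_mapped) {
        /* The driver would dereference blkif->be->dev at this point. */
        if (blkif->be->freed)
            uaf_detected++;
        blkif->ring_mapped = 0;
    }
    return 0;
}

/* Completion of one request; the last one retries the disconnect. */
static void io_complete(struct blkif *blkif)
{
    assert(blkif->inflight > 0);
    if (--blkif->inflight == 0)
        (void)blkif_disconnect(blkif);
}

/* Problematic ordering: the result of the disconnect is ignored. */
static void remove_ignoring_ebusy(struct backend *be, struct blkif *blkif)
{
    (void)blkif_disconnect(blkif);
    be->freed = 1;
}

/* Alternative ordering: drain outstanding I/O, then disconnect and free. */
static void remove_after_drain(struct backend *be, struct blkif *blkif)
{
    while (blkif->inflight > 0)
        io_complete(blkif);     /* stand-in for waiting on the I/O */
    (void)blkif_disconnect(blkif);
    be->freed = 1;
}

/* Runs one remove with a single request in flight; returns the UAF count. */
int run_scenario(int drain_first)
{
    struct backend be = { .freed = 0 };
    struct blkif blkif = { .be = &be, .inflight = 1, .ring_mapped = 1 };

    uaf_detected = 0;
    if (drain_first)
        remove_after_drain(&be, &blkif);
    else
        remove_ignoring_ebusy(&be, &blkif);

    while (blkif.inflight > 0)  /* late completion after remove returned */
        io_complete(&blkif);
    return uaf_detected;
}
```

In this model run_scenario(0) reports one access to the freed backend,
while run_scenario(1) reports none, because the ring is unmapped before
the free happens.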

As Roger already said: this is a complete mess.

Working on a patch now...

BTW: Glenn, the debug patch isn't important any longer. It was just
meant to locate the problem, which is now known.


Juergen

> 
> On 01/05/17 10:55, Glenn Enright wrote:
>> On 19/04/17 22:09, Juergen Gross wrote:
>>> On 19/04/17 09:16, Roger Pau Monné wrote:
>>>> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>>>>> On 19/04/17 03:02, Glenn Enright wrote:
>>>>>> Thanks Juergen. I applied that to our 4.9.23 dom0 kernel, which still
>>>>>> shows the issue. When replicating the leak I now see this trace (via
>>>>>> dmesg). Hopefully that is useful.
>>>>>>
>>>>>> Please note, I'm going to be offline next week, but am keen to keep on
>>>>>> with this; it may just be a while before I follow up, is all.
>>>>>>
>>>>>> Regards, Glenn
>>>>>> http://rimuhosting.com
>>>>>>
>>>>>>
>>>>>> ------------[ cut here ]------------
>>>>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>>>>> xen_blkbk_remove+0x138/0x140
>>>>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>>>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>>>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>>>>> iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>>>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>>>>> i2c_core e1000e ptp pps_core i3000_edac edac_core raid1 sd_mod ahci
>>>>>> libahci floppy dm_mirror dm_region_hash dm_log dm_mod
>>>>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>>>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>>>>> Call Trace:
>>>>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>>>>> ---[ end trace ee097287c9865a62 ]---
>>>>>
>>>>> Konrad, Roger,
>>>>>
>>>>> this was triggered by a debug patch in xen_blkbk_remove():
>>>>>
>>>>>     if (be->blkif)
>>>>> -        xen_blkif_disconnect(be->blkif);
>>>>> +        WARN_ON(xen_blkif_disconnect(be->blkif));
>>>>>
>>>>> So I guess we need something like xen_blk_drain_io() in case of calls to
>>>>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>>>>> sites of xen_blkif_disconnect() or in this function depending on a new
>>>>> boolean parameter indicating it should wait for outstanding I/Os).
>>>>>
>>>>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>>>>> add further problems...
>>>>
>>>> Hello,
>>>>
>>>> Thanks for debugging this. The easiest solution seems to be to replace the
>>>> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
>>>> xen_blk_drain_io instead, and to make xen_blkif_disconnect return void (to
>>>> prevent further issues like this one).
>>>
>>> Glenn,
>>>
>>> can you please try the attached patch (in dom0)?
>>>
>>>
>>> Juergen
>>>
>>
>> (resending with full CC list)
>>
>> I'm back. After testing unfortunately I'm still seeing the leak. The
>> below trace is with the debug patch applied as well under 4.9.25. It
>> looks very similar to me. I am still able to replicate this reliably.
>>
>> Regards, Glenn
>> http://rimuhosting.com
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:511
>> xen_blkbk_remove+0x138/0x140
>> Modules linked in: ebt_ip xen_pciback xen_netback xen_gntalloc
>> xen_gntdev xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>> iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw i2c_i801 i2c_smbus
>> i2c_core sg e1000e ptp pps_core i3000_edac edac_core raid1 sd_mod ahci
>> libahci floppy dm_mirror dm_region_hash dm_log dm_mod
>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.25-1.el6xen.x86_64 #1
>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>  ffffc90040cfbb98 ffffffff8136b76f 0000000000000013 0000000000000000
>>  0000000000000000 0000000000000000 ffffc90040cfbbe8 ffffffff8108007d
>>  ffffea0000141720 000001ff41334434 ffff880000000001 ffff88004d3aedc0
>> Call Trace:
>>  [<ffffffff8136b76f>] dump_stack+0x67/0x98
>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>  [<ffffffff814ec0a8>] xen_blkbk_remove+0x138/0x140
>>  [<ffffffff81449b07>] xenbus_dev_remove+0x47/0xa0
>>  [<ffffffff814bd2b4>] __device_release_driver+0xb4/0x160
>>  [<ffffffff814bd38d>] device_release_driver+0x2d/0x40
>>  [<ffffffff814bc2b4>] bus_remove_device+0x124/0x190
>>  [<ffffffff814b9682>] device_del+0x112/0x210
>>  [<ffffffff81448423>] ? xenbus_read+0x53/0x70
>>  [<ffffffff814b97a2>] device_unregister+0x22/0x60
>>  [<ffffffff814eda9d>] frontend_changed+0xad/0x4c0
>>  [<ffffffff81449e67>] xenbus_otherend_changed+0xc7/0x140
>>  [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>  [<ffffffff8144a2f0>] frontend_changed+0x10/0x20
>>  [<ffffffff81447b0c>] xenwatch_thread+0x9c/0x140
>>  [<ffffffff810bffb0>] ? woken_wake_function+0x20/0x20
>>  [<ffffffff816ed98a>] ? schedule+0x3a/0xa0
>>  [<ffffffff816f1486>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>  [<ffffffff810c0c6d>] ? complete+0x4d/0x60
>>  [<ffffffff81447a70>] ? split+0xf0/0xf0
>>  [<ffffffff810a0535>] kthread+0xe5/0x100
>>  [<ffffffff810a051d>] ? kthread+0xcd/0x100
>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>  [<ffffffff816f1bc5>] ret_from_fork+0x25/0x30
>> ---[ end trace ea3a48c80e4ad79d ]---
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> https://lists.xen.org/xen-devel
> 


