xen-devel.lists.xenproject.org archive mirror
From: "Jürgen Groß" <jgross@suse.com>
To: SeongJae Park <sjpark@amazon.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org,
	konrad.wilk@oracle.com, pdurrant@amazon.com,
	linux-kernel@vger.kernel.org, SeongJae Park <sj38.park@gmail.com>,
	xen-devel@lists.xenproject.org, roger.pau@citrix.com
Subject: Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
Date: Tue, 17 Dec 2019 09:16:47 +0100	[thread overview]
Message-ID: <f9f686ce-aeca-0947-5b2b-91e1d0c183dd@suse.com> (raw)
In-Reply-To: <20191217075932.4516-1-sjpark@amazon.com>

On 17.12.19 08:59, SeongJae Park wrote:
> On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß" <jgross@suse.com> wrote:
> 
>> On 16.12.19 20:48, SeongJae Park wrote:
>>> On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
>>>
>>>> On 16.12.19 17:15, SeongJae Park wrote:
>>>>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park <sjpark@amazon.com> wrote:
>>>>>
>>>>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park <sjpark@amazon.com> wrote:
>>>>>>
>>>>>>> From: SeongJae Park <sjpark@amazon.de>
>>>>>>>
>>>>> [...]
>>>>>>> --- a/drivers/block/xen-blkback/xenbus.c
>>>>>>> +++ b/drivers/block/xen-blkback/xenbus.c
>>>>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
>>>>>>>     }
>>>>>>>     
>>>>>>>     
>>>>>>> +/* Once a memory pressure is detected, squeeze free page pools for a while. */
>>>>>>> +static unsigned int buffer_squeeze_duration_ms = 10;
>>>>>>> +module_param_named(buffer_squeeze_duration_ms,
>>>>>>> +		buffer_squeeze_duration_ms, int, 0644);
>>>>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
>>>>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is detected");
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Callback received when the memory pressure is detected.
>>>>>>> + */
>>>>>>> +static void reclaim_memory(struct xenbus_device *dev)
>>>>>>> +{
>>>>>>> +	struct backend_info *be = dev_get_drvdata(&dev->dev);
>>>>>>> +
>>>>>>> +	be->blkif->buffer_squeeze_end = jiffies +
>>>>>>> +		msecs_to_jiffies(buffer_squeeze_duration_ms);
>>>>>>
>>>>>> This callback might race with 'xen_blkbk_probe()'.  The race could result in
>>>>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it links
>>>>>> 'be' to the 'dev'.  Please _don't merge_ this patch now!
>>>>>>
>>>>>> I will do more tests and share the results.  Meanwhile, if you have any
>>>>>> opinions, please let me know.
>>>
>>> I reduced system memory and attached a bunch of devices in a short time so
>>> that memory pressure occurs while device attachments are ongoing.  Under
>>> these circumstances, I was able to see the race.
>>>
>>>>>
>>>>> Not only '->blkif', but 'be' itself could also be NULL.  As similar
>>>>> concurrency issues could exist in other drivers in their own ways, I suggest
>>>>> changing the reclaim callback ('->reclaim_memory') to be called for each
>>>>> driver instead of each device.  Then, each driver would be able to deal with
>>>>> its concurrency issues by itself.
>>>>
>>>> Hmm, I don't like that. This would need to be changed back in case we
>>>> add per-guest quota.
>>>
>>> Extending this callback in that way would still not be too hard.  We could use
>>> the argument to the callback.  I would keep the argument of the callback as
>>> 'struct device *', and would add a comment saying that a 'NULL' value of the
>>> argument means every device.  As an example, xenbus would pass a
>>> NULL-terminated array of the device pointers that need to free their resources.
>>>
>>> After seeing this race, I am now also thinking it could be better to delegate
>>> detailed control of each device to its driver, as some drivers have some
>>> complicated and unique relations with their devices.
>>>
>>>>
>>>> Wouldn't a get_device() before calling the callback and a put_device()
>>>> afterwards avoid that problem?
>>>
>>> I didn't use the reference count manipulation operations because other similar
>>> parts also didn't.  But, if there is no implicit reference count guarantee, it
>>> seems those operations are indeed necessary.
>>>
>>> That said, as get/put operations only adjust the reference count, those will
>>> not make the callback wait until the linking of the 'backend' and 'blkif' to
>>> the device (xen_blkbk_probe()) is finished.  Thus, the race could still happen.
>>> Or, am I missing something?
>>
>> No, I think we need a xenbus lock per device which will need to be
>> taken in xen_blkbk_probe(), xenbus_dev_remove() and while calling the
>> callback.
> 
> I also agree that locking should be used in the end.  But, as each driver manages
> its devices and resources in its own way, it could have its own unique race
> conditions.  And each unique race condition might have its own efficient
> way of synchronizing it.  Therefore, I think the synchronization should be done
> by each driver, not by xenbus, and thus we should make the callback be called
> per-driver.

xenbus controls creation and removal of devices, so applying locking
at xenbus level is the right thing to do in order to avoid races with
device removal.

In case a backend has further synchronization requirements those have to
be handled at backend level, of course.

In the end you'll need the xenbus level locking anyway in order to avoid
a race when the last backend specific device is just being removed when
the callback is about to be called for that device. Or you'd need to
call try_module_get() before calling into each backend...


Juergen
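
For illustration only, here is a minimal userspace sketch of the per-device
locking pattern discussed above: one lock per device, taken in probe, in
remove, and around the reclaim callback. It is not kernel code and not the
patch that was eventually merged. The names model_device, reclaim_lock,
model_probe(), model_remove() and model_reclaim() are hypothetical, a pthread
mutex stands in for whatever lock xenbus would use, and the structures are
simplified stand-ins for the real backend_info and blkif.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption, not the
 * real definitions). */
struct blkif {
	unsigned long buffer_squeeze_end;
};

struct backend_info {
	struct blkif *blkif;
};

/* Hypothetical model of a xenbus device carrying a per-device lock. */
struct model_device {
	pthread_mutex_t reclaim_lock;	/* hypothetical per-device lock */
	struct backend_info *drvdata;	/* NULL until probe has finished */
};

static void model_device_init(struct model_device *dev)
{
	pthread_mutex_init(&dev->reclaim_lock, NULL);
	dev->drvdata = NULL;
}

/* Probe: link 'be' and 'blkif' to the device while holding the lock, so
 * the callback can never observe a half-initialized backend. */
static void model_probe(struct model_device *dev, struct backend_info *be,
			struct blkif *blkif)
{
	pthread_mutex_lock(&dev->reclaim_lock);
	be->blkif = blkif;
	dev->drvdata = be;
	pthread_mutex_unlock(&dev->reclaim_lock);
}

/* Remove: unlink the backend under the same lock. */
static void model_remove(struct model_device *dev)
{
	pthread_mutex_lock(&dev->reclaim_lock);
	dev->drvdata = NULL;
	pthread_mutex_unlock(&dev->reclaim_lock);
}

/* Caller side of the callback: hold the lock and skip devices that are
 * not (or no longer) fully set up.  Returns 1 if the squeeze deadline
 * was updated, 0 if the device was skipped. */
static int model_reclaim(struct model_device *dev, unsigned long now,
			 unsigned long duration)
{
	int ran = 0;

	pthread_mutex_lock(&dev->reclaim_lock);
	if (dev->drvdata && dev->drvdata->blkif) {
		dev->drvdata->blkif->buffer_squeeze_end = now + duration;
		ran = 1;
	}
	pthread_mutex_unlock(&dev->reclaim_lock);
	return ran;
}
```

In this model, a reclaim request that arrives before probe or after remove is
simply skipped instead of dereferencing NULL. Any further synchronization a
backend needs would still have to be handled inside the backend itself.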

