From: Mike Christie <mchristi@redhat.com>
To: Xiubo Li <lixiubo@cmss.chinamobile.com>, nab@linux-iscsi.org
Cc: agrover@redhat.com, iliastsi@arrikto.com, namei.unix@gmail.com,
sheng@yasker.org, linux-scsi@vger.kernel.org,
target-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
Jianfei Hu <hujianfei@cmss.chinamobile.com>
Subject: Re: [PATCH v6 2/2] tcmu: Add global data block pool support
Date: Mon, 1 May 2017 13:40:32 -0500 [thread overview]
Message-ID: <59078120.3000200@redhat.com> (raw)
In-Reply-To: <25e7247f-116f-aaa5-e895-b503c0a893f9@cmss.chinamobile.com>
On 04/30/2017 06:29 AM, Xiubo Li wrote:
> [...]
>>> +static struct page *tcmu_try_get_block_page(struct tcmu_dev *udev,
>>> uint32_t dbi)
>>> +{
>>> + struct page *page;
>>> + int ret;
>>> +
>>> + mutex_lock(&udev->cmdr_lock);
>>> + page = tcmu_get_block_page(udev, dbi);
>>> + if (likely(page)) {
>>> + mutex_unlock(&udev->cmdr_lock);
>>> + return page;
>>> + }
>>> +
>>> + /*
>>> + * Normally it shouldn't be here:
>>> + * Only when the userspace has touched the blocks which
>>> + * are out of the tcmu_cmd's data iov[], and will return
>>> + * one zeroed page.
>>
>> Is it a userspace bug when this happens? Do you know when it is
>> occcuring?
> Since the UIO will map the whole ring buffer to the user space at the
> beginning, and the userspace is allowed and legal to access any block
> within the limits of the mapped ring area.
>
> But actually when this happens, it normally will be one bug of the
> userspace. Without this checking the kernel will output many page fault
> dump traces.
>
> Maybe here outputing some warning message is a good idea, and will be
> easy to debug for userspace.
Yeah.
>
>
> [...]
>>> @@ -1388,6 +1509,81 @@ static ssize_t tcmu_cmd_time_out_store(struct
>>> config_item *item, const char *pag
>>> .tb_dev_attrib_attrs = NULL,
>>> };
>>> +static int unmap_thread_fn(void *data)
>>> +{
>>> + struct tcmu_dev *udev;
>>> + loff_t off;
>>> + uint32_t start, end, block;
>>> + struct page *page;
>>> + int i;
>>> +
>>> + while (1) {
>>> + DEFINE_WAIT(__wait);
>>> +
>>> + prepare_to_wait(&unmap_wait, &__wait, TASK_INTERRUPTIBLE);
>>> + schedule();
>>> + finish_wait(&unmap_wait, &__wait);
>>> +
>>> + mutex_lock(&root_udev_mutex);
>>> + list_for_each_entry(udev, &root_udev, node) {
>>> + mutex_lock(&udev->cmdr_lock);
>>> +
>>> + /* Try to complete the finished commands first */
>>> + tcmu_handle_completions(udev);
>>> +
>>> + /* Skip the udevs waiting the global pool or in idle */
>>> + if (udev->waiting_global || !udev->dbi_thresh) {
>>> + mutex_unlock(&udev->cmdr_lock);
>>> + continue;
>>> + }
>>> +
>>> + end = udev->dbi_max + 1;
>>> + block = find_last_bit(udev->data_bitmap, end);
>>> + if (block == udev->dbi_max) {
>>> + /*
>>> + * The last bit is dbi_max, so there is
>>> + * no need to shrink any blocks.
>>> + */
>>> + mutex_unlock(&udev->cmdr_lock);
>>> + continue;
>>> + } else if (block == end) {
>>> + /* The current udev will goto idle state */
>>> + udev->dbi_thresh = start = 0;
>>> + udev->dbi_max = 0;
>>> + } else {
>>> + udev->dbi_thresh = start = block + 1;
>>> + udev->dbi_max = block;
>>> + }
>>> +
>>> + /* Here will truncate the data area from off */
>>> + off = udev->data_off + start * DATA_BLOCK_SIZE;
>>> + unmap_mapping_range(udev->inode->i_mapping, off, 0, 1);
>>> +
>>> + /* Release the block pages */
>>> + for (i = start; i < end; i++) {
>>> + page = radix_tree_delete(&udev->data_blocks, i);
>>> + if (page) {
>>> + __free_page(page);
>>> + atomic_dec(&global_db_count);
>>> + }
>>> + }
>>> + mutex_unlock(&udev->cmdr_lock);
>>> + }
>>> +
>>> + /*
>>> + * Try to wake up the udevs who are waiting
>>> + * for the global data pool.
>>> + */
>>> + list_for_each_entry(udev, &root_udev, node) {
>>> + if (udev->waiting_global)
>>> + wake_up(&udev->wait_cmdr);
>>> + }
>>
>> To avoid starvation, I think you want a second list/fifo that holds the
>> watiers. In tcmu_get_empty_block if the list is not empty, record how
>> many pages we needed and then add the device to the list and wait in
>> tcmu_queue_cmd_ring.
>>
>> Above if we freed enough pages for the device at head then wake up the
>> device.
>>
>> I think you also need a wake_up call in the completion path in case the
>> initial call could not free enough pages. It could probably check if the
>> completion was going to free enough pages for a waiter and then call
>> wake.
>>
> Yes, I meant to introduce this later after this series to not let the
> patches too
> complex to review.
>
> If you agree I will do this later, or in V7 series ?
Yeah, I am ok with adding it after the initial patches go in.
next prev parent reply other threads:[~2017-05-01 18:40 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-26 6:25 [PATCH v6 0/2] tcmu: Dynamic growing data area support lixiubo
2017-04-26 6:25 ` [PATCH v6 1/2] tcmu: Add dynamic growing data area feature support lixiubo
2017-04-30 5:48 ` Mike Christie
2017-04-30 10:22 ` [PATCH v6 1/2] tcmu: Add dynamic growing data area featuresupport Xiubo Li
2017-05-01 18:37 ` Mike Christie
2017-04-26 6:25 ` [PATCH v6 2/2] tcmu: Add global data block pool support lixiubo
2017-04-26 10:07 ` kbuild test robot
2017-04-30 7:31 ` Mike Christie
2017-04-30 11:29 ` Xiubo Li
2017-05-01 18:40 ` Mike Christie [this message]
2017-05-02 5:26 ` Nicholas A. Bellinger
2017-05-02 5:25 ` [PATCH v6 0/2] tcmu: Dynamic growing data area support Nicholas A. Bellinger
2017-05-02 9:25 ` 答复: " lixiubo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=59078120.3000200@redhat.com \
--to=mchristi@redhat.com \
--cc=agrover@redhat.com \
--cc=hujianfei@cmss.chinamobile.com \
--cc=iliastsi@arrikto.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=lixiubo@cmss.chinamobile.com \
--cc=nab@linux-iscsi.org \
--cc=namei.unix@gmail.com \
--cc=sheng@yasker.org \
--cc=target-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).