From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752053AbeA2N5i (ORCPT <rfc822;w@1wt.eu>);
        Mon, 29 Jan 2018 08:57:38 -0500
Received: from mga14.intel.com ([192.55.52.115]:38337 "EHLO mga14.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751374AbeA2N5g (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 29 Jan 2018 08:57:36 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,431,1511856000";
   d="scan'208";a="13916223"
Subject: Re: [PATCH arm/aspeed/ast2500 v2] ipmi: add an Aspeed KCS IPMI BMC
 driver
To: minyard@acm.org, joel@jms.id.au, avifishman70@gmail.com,
        openbmc@lists.ozlabs.org, openipmi-developer@lists.sourceforge.net,
        linux-kernel@vger.kernel.org
Cc: andriy.shevchenko@intel.com
References: <1516810009-16353-1-git-send-email-haiyue.wang@linux.intel.com>
 <f20da565-6644-3ea4-823b-a808cf613958@gmail.com>
 <76454c8b-b8ab-06a5-297c-83bab32c86a7@linux.intel.com>
 <f6a94b94-da37-7dee-afab-85ec83afb92c@acm.org>
From: "Wang, Haiyue" <haiyue.wang@linux.intel.com>
Message-ID: <d12b53d2-9364-bc60-565f-a07a9dd138bc@linux.intel.com>
Date: Mon, 29 Jan 2018 21:57:33 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.5.2
MIME-Version: 1.0
In-Reply-To: <f6a94b94-da37-7dee-afab-85ec83afb92c@acm.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 2018-01-26 22:48, Corey Minyard wrote:
> On 01/26/2018 12:08 AM, Wang, Haiyue wrote:
>>
>>
>> On 2018-01-25 01:48, Corey Minyard wrote:
>>> On 01/24/2018 10:06 AM, Haiyue Wang wrote:
>>>> The KCS (Keyboard Controller Style) interface is used to perform 
>>>> in-band
>>>> IPMI communication between a server host and its BMC (BaseBoard 
>>>> Management
>>>> Controllers).
>>>>
>>>> This driver exposes the KCS interface on ASpeed SOCs (AST2400 and 
>>>> AST2500)
>>>> as a character device. Such SOCs are commonly used as BMCs and this 
>>>> driver
>>>> implements the BMC side of the KCS interface.
>>>>
>>>> Signed-off-by: Haiyue Wang <haiyue.wang@linux.intel.com>
>>>>
>>>> ---
>>>> v1->v2
>>>>
>>>> - Divide the driver into two parts, one handles the BMC KCS IPMI 
>>>> 2.0 state;
>>>>    the other handles the BMC KCS controller such as AST2500 IO 
>>>> accessing.
>>>> - Use the spin lock APIs to handle the device file operations and 
>>>> BMC chip
>>>>    IRQ inferface for accessing the same KCS BMC data structure.
>>>> - Enhanced the phases handling of the KCS BMC.
>>>> - Unified the IOCTL definition for IPMI BMC, it will be used by KCS 
>>>> and BT.
>>>>
>>>> ---
>>>>
>>>> +
>>>> +static void kcs_bmc_handle_data(struct kcs_bmc *kcs_bmc)
>>>> +{
>>>> +    u8 data;
>>>> +
>>>> +    switch (kcs_bmc->phase) {
>>>> +    case KCS_PHASE_WRITE:
>>>> +        set_state(kcs_bmc, WRITE_STATE);
>>>> +
>>>> +        /* set OBF before reading data */
>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>> +
>>>> +        if (kcs_bmc->data_in_idx < KCS_MSG_BUFSIZ)
>>>> +            kcs_bmc->data_in[kcs_bmc->data_in_idx++] =
>>>> +                        read_data(kcs_bmc);
>
> I missed this earlier, you need to issue a length error if the data is 
> too large.
>
>>>> +        break;
>>>> +
>>>> +    case KCS_PHASE_WRITE_END:
>>>> +        set_state(kcs_bmc, READ_STATE);
>>>> +
>>>> +        if (kcs_bmc->data_in_idx < KCS_MSG_BUFSIZ)
>>>> +            kcs_bmc->data_in[kcs_bmc->data_in_idx++] =
>>>> +                        read_data(kcs_bmc);
>>>> +
>>>> +        kcs_bmc->phase = KCS_PHASE_WAIT_READ;
>>>> +        if (kcs_bmc->running) {
>>>
>>> Why do you only do this when running is set?  It won't hurt anything 
>>> if it's not
>>> set.  As it is, you have a race if something opens the device while 
>>> this code
>>> runs.
>>>
>>> Also, don't set the state to wait read until the "write" has 
>>> finished (userland has
>>> read the data out of the buffer.  More on that later.
>>>
>> Understood.
>>>> +            kcs_bmc->data_in_avail = true;
>>>> +            wake_up_interruptible(&kcs_bmc->queue);
>>>> +        }
>>>> +        break;
>>>> +
>>>> +    case KCS_PHASE_READ:
>>>> +        if (kcs_bmc->data_out_idx == kcs_bmc->data_out_len)
>>>> +            set_state(kcs_bmc, IDLE_STATE);
>>>> +
>>>> +        data = read_data(kcs_bmc);
>>>> +        if (data != KCS_CMD_READ_BYTE) {
>>>> +            set_state(kcs_bmc, ERROR_STATE);
>>>> +            write_data(kcs_bmc, KCS_ZERO_DATA);
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        if (kcs_bmc->data_out_idx == kcs_bmc->data_out_len) {
>>>> +            write_data(kcs_bmc, KCS_ZERO_DATA);
>>>> +            kcs_bmc->phase = KCS_PHASE_IDLE;
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        write_data(kcs_bmc,
>>>> + kcs_bmc->data_out[kcs_bmc->data_out_idx++]);
>>>> +        break;
>>>> +
>>>> +    case KCS_PHASE_ABORT_ERROR1:
>>>> +        set_state(kcs_bmc, READ_STATE);
>>>> +
>>>> +        /* Read the Dummy byte */
>>>> +        read_data(kcs_bmc);
>>>> +
>>>> +        write_data(kcs_bmc, kcs_bmc->error);
>>>> +        kcs_bmc->phase = KCS_PHASE_ABORT_ERROR2;
>>>> +        break;
>>>> +
>>>> +    case KCS_PHASE_ABORT_ERROR2:
>>>> +        set_state(kcs_bmc, IDLE_STATE);
>>>> +
>>>> +        /* Read the Dummy byte */
>>>> +        read_data(kcs_bmc);
>>>> +
>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>> +        kcs_bmc->phase = KCS_PHASE_IDLE;
>>>> +
>>>> +        break;
>>>> +
>>>> +    default:
>>>> +        set_state(kcs_bmc, ERROR_STATE);
>>>> +
>>>> +        /* Read the Dummy byte */
>>>> +        read_data(kcs_bmc);
>>>> +
>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>> +        break;
>>>> +    }
>>>> +}
>>>> +
>>>> +static void kcs_bmc_handle_command(struct kcs_bmc *kcs_bmc)
>>>> +{
>>>> +    u8 cmd;
>>>> +
>>>> +    set_state(kcs_bmc, WRITE_STATE);
>>>> +
>>>> +    /* Dummy data to generate OBF */
>>>> +    write_data(kcs_bmc, KCS_ZERO_DATA);
>>>> +
>>>> +    cmd = read_data(kcs_bmc);
>>>
>>> Shouldn't you check the phase in all the cases below and do error
>>> handling if the phase isn't correct?
>>>
>>> Similar thing if the device here isn't open.  You need to handle
>>> that gracefully.
>>>
>>> Also, you should remove data_in_avail and data_in_idx setting from
>>> here, for reasons I will explain later.
>>>
>> If host software sends the data twice such as a retry before the 
>> BMC's IPMI service starts,
>> then the two IPMI requests will be merged into one, if not clear 
>> data_in_idx after receving
>> KCS_CMD_WRITE_START. Most of the states are driven by host software 
>> (SMS). :(
>
> True, but what if the host issues WRITE_START or a WRITE_END while 
> this driver is in read
> state?  The spec is unclear on this, but it really only makes sense 
> for the host to issue
> WRITE_START in idle stat and WRITE_END in write state.  IMHO it should 
> go to error
> state.  You might make the case that a WRITE_START anywhere restarts 
> the transaction,
> but the feel of the error state machine kind of goes against that. 
> WRITE_END is definitely
> wrong anywhere but write state.
>
> I just found the following in the spec (section 9.12):
>
>    Thus, since the interface will allow a command transfer to be
>    started or restarted
>    at any time when the input buffer is empty, software could elect to
>    simply retry
>    the command upon detecting an error condition, or issue a ‘known good’
>    command in order to clear ERROR_STATE
>
> So a WRITE_START anywhere is ok.  A WRITE_END in the wrong state 
> should probably
> still go to error state.  This means the user needs to be able to 
> handle a write error at
> any time.  It also means it's very important to make sure the user 
> does a read before
> doing a write.  If the host re-issues a WRITE_START and writes a new 
> command
> between the time the use reads the data and writes the response, the 
> response would
> be for the wrong command.
>
>>>> +    switch (cmd) {
>>>> +    case KCS_CMD_WRITE_START:
>>>> +        kcs_bmc->data_in_avail = false;
>>>> +        kcs_bmc->data_in_idx   = 0;
>>>> +        kcs_bmc->phase         = KCS_PHASE_WRITE;
>>>> +        kcs_bmc->error         = KCS_NO_ERROR;
>>>> +        break;
>>>> +
>>>> +    case KCS_CMD_WRITE_END:
>>>> +        kcs_bmc->phase = KCS_PHASE_WRITE_END;
>>>> +        break;
>>>> +
>>>> +    case KCS_CMD_ABORT:
>>>> +        if (kcs_bmc->error == KCS_NO_ERROR)
>>>> +            kcs_bmc->error = KCS_ABORTED_BY_COMMAND;
>>>> +
>>>> +        kcs_bmc->phase = KCS_PHASE_ABORT_ERROR1;
>>>> +        break;
>>>> +
>>>> +    default:
>>>> +        kcs_bmc->error = KCS_ILLEGAL_CONTROL_CODE;
>>>> +        set_state(kcs_bmc, ERROR_STATE);
>>>> +        write_data(kcs_bmc, kcs_bmc->error);
>>>> +        kcs_bmc->phase = KCS_PHASE_ERROR;
>>>> +        break;
>>>> +    }
>>>> +}
>>>> +
>>>> +int kcs_bmc_handle_event(struct kcs_bmc *kcs_bmc)
>>>> +{
>>>> +    unsigned long flags;
>>>> +    int ret = 0;
>>>> +    u8 status;
>>>> +
>>>> +    spin_lock_irqsave(&kcs_bmc->lock, flags);
>>>> +
>>>> +    status = read_status(kcs_bmc) & (KCS_STATUS_IBF | 
>>>> KCS_STATUS_CMD_DAT);
>>>> +
>>>> +    switch (status) {
>>>> +    case KCS_STATUS_IBF | KCS_STATUS_CMD_DAT:
>>>> +        kcs_bmc_handle_command(kcs_bmc);
>>>> +        break;
>>>> +
>>>> +    case KCS_STATUS_IBF:
>>>> +        kcs_bmc_handle_data(kcs_bmc);
>>>> +        break;
>>>> +
>>>> +    default:
>>>> +        ret = -1;
>>>> +        break;
>>>> +    }
>>>> +
>>>> +    spin_unlock_irqrestore(&kcs_bmc->lock, flags);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +EXPORT_SYMBOL(kcs_bmc_handle_event);
>>>> +
>>>> +static inline struct kcs_bmc *file_kcs_bmc(struct file *filp)
>>>> +{
>>>> +    return container_of(filp->private_data, struct kcs_bmc, miscdev);
>>>> +}
>>>> +
>>>> +static int kcs_bmc_open(struct inode *inode, struct file *filp)
>>>> +{
>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>> +    int ret = 0;
>>>> +
>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    if (!kcs_bmc->running) {
>>>> +        kcs_bmc->running       = 1;
>>>> +        kcs_bmc->phase         = KCS_PHASE_IDLE;
>>>> +        kcs_bmc->data_in_avail = false;
>>>
>>> If you do everything right, setting the phase and data_in_avail 
>>> should not
>>> be necessary here.
>>>
>>>> +    } else {
>>>> +        ret = -EBUSY;
>>>> +    }
>>>> +
>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static unsigned int kcs_bmc_poll(struct file *filp, poll_table *wait)
>>>> +{
>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>> +    unsigned int mask = 0;
>>>> +
>>>> +    poll_wait(filp, &kcs_bmc->queue, wait);
>>>> +
>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    if (kcs_bmc->data_in_avail)
>>>> +        mask |= POLLIN;
>>>> +
>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    return mask;
>>>> +}
>>>> +
>>>> +static ssize_t kcs_bmc_read(struct file *filp, char *buf,
>>>> +                size_t count, loff_t *offset)
>>>> +{
>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>> +    ssize_t ret = -EAGAIN;
>>>> +
>>>
>>> This function still has some issues.
>>>
>>> You can't call copy_to_user() with a spinlock held or interrupts 
>>> disabled.
>>> To handle readers, you probably need a separate mutex.
>>>
>>> Also, this function can return -EAGAIN even if O_NONBLOCK is not set if
>>> kcs_bmc->data_in_avail changes between when you wait on the event
>>> and when you check it under the lock.
>>>
>>> You also clear data_in_avail even if the copy_to_user() fails, which is
>>> wrong.
>>>
>>> I believe the best way to handle this would be to have the spinlock
>>> protect the inner workings of the state machine and a mutex handle
>>> copying data out, setting/clearing the running flag (thus a mutex
>>> instead of spinlock in open and release) and the ioctl settings (except
>>> for abort where you will need to grab the spinlock).
>>>
>>> After the wait event below, grab the mutex.  If data is not available
>>> and O_NONBLOCK is not set, drop the mutex and retry. Otherwise
>>> this is the only place (besides release) that sets data_in_avail to 
>>> false.
>>> Do the copy_to_user(), grab the spinlock, clear data_in_avail and
>>> data_in_idx, then release the lock and mutex.  If you are really
>>> adventurous you can do this without grabbing the lock using
>>> barriers, but it's probably not necessary here.
>>>
>
> With the state machine being able to be restarted at any time, you need
> something a little different here.  You still need the mutex to handle
> multiple readers and the copy.  I think the function should be something
> like:
>
Since KCS is not a multi-reader protocol from BMC's view, you makes 
things complex. :-)
>    static ssize_t kcs_bmc_read(struct file *filp, char *buf,
>                     size_t count, loff_t *offset)
>    {
>         struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>         ssize_t ret;
>         bool avail;
>         size_t data_size;
>         u8 *data;
>
>         data = kmalloc(KCS_MSG_BUFSIZ, GFP_KERNEL);
>         if (!data)
>             return -ENOMEM;
>
>    retry:
>         ret = -EAGAIN;
>         if (!(filp->f_flags & O_NONBLOCK))
>             wait_event_interruptible(kcs_bmc->queue,
>                          kcs_bmc->data_in_avail);
>
>         mutex_lock(&kcs_bmc->read_mutex);
>
>         spin_lock_irq(&kcs_bmc->lock);
>         avail = kcs_bmc->data_in_avail;
>         if (avail) {
>             memcpy(data, kcs_bmc->data_in, kcs_bmc->data_in_idx);
>             data_size = kcs_bmc->data_in_idx;
>         }
>         spin_unlock_irq(&kcs_bmc->lock);
>
>         if (!avail) {
>             if (filp->f_flags & O_NONBLOCK)
>                 goto out_mutex_unlock;
>             mutex_unlock(&kcs_bmc->read_mutex);
>             goto retry;
>         }
>
>         if (count < data_size) {
>             ret = -EOVERFLOW;
>              ? I'm not sure about the error, but userspace needs to know.
>             goto out_mutex_unlock;
>         }
>
>         if (!copy_to_user(buf, data, data_size)) {
>             ret = -EFAULT;
>             goto out_mutex_unlock;
>         }
>
>         ret = data_size;
>
>         spin_lock_irq(&kcs_bmc->lock);
>
>         if (kcs_bmc->phase != KCS_PHASE_WRITE_END_DONE)
>             /* Something aborted or restarted the state machine. */
>             ? Maybe restart if O_NONBLOCK is not set and -EAGAIN if it 
> is?
>             ret = -EIO;
>         } else {
>             kcs_bmc->phase = KCS_PHASE_WAIT_READ;
>             kcs_bmc->data_in_avail = false;
>             kcs_bmc->data_in_idx = 0;
>         }
>
>         spin_unlock_irq(&kcs_bmc->lock);
>
>    out_mutex_unlock:
>         mutex_unlock(&kcs_bmc->read_mutex);
>
>         kfree(data);
>
>         return ret;
>    }
> Note that I added a state, KCS_PHASE_WRITE_END_DONE, which would be
> set after the final byte from the host is received.  You want the read 
> here
> done before you can do the write below to avoid the race I talked about.
>
> There is a local copy made of the data.  What you *never* want to happen
> here is for the state machine to start processing a new write command
> while the data is being copied.  It could result in corrupt data being 
> read
> and some random operation being done by the BMC.
>
> If you want to avoid the local copy, it could be done, but it's more 
> complex.
>
>>>> +    if (!(filp->f_flags & O_NONBLOCK))
>>>> +        wait_event_interruptible(kcs_bmc->queue,
>>>> +                     kcs_bmc->data_in_avail);
>>>> +
>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    if (kcs_bmc->data_in_avail) {
>>>> +        kcs_bmc->data_in_avail = false;
>>>> +
>>>> +        if (count > kcs_bmc->data_in_idx)
>>>> +            count = kcs_bmc->data_in_idx;
>>>> +
>>>> +        if (!copy_to_user(buf, kcs_bmc->data_in, count))
>>>> +            ret = count;
>>>> +        else
>>>> +            ret = -EFAULT;
>>>> +    }
>>>> +
>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static ssize_t kcs_bmc_write(struct file *filp, const char *buf,
>>>> +                 size_t count, loff_t *offset)
>>>> +{
>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>> +    ssize_t ret = count;
>>>> +
>>>> +    if (count < 1 || count > KCS_MSG_BUFSIZ)
>>>> +        return -EINVAL;
>>>> +
>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    if (kcs_bmc->phase == KCS_PHASE_WAIT_READ) {
>>>> +        if (copy_from_user(kcs_bmc->data_out, buf, count)) {
>>>> +            spin_unlock_irq(&kcs_bmc->lock);
>>>> +            return -EFAULT;
>>>> +        }
>>>> +
>>>> +        kcs_bmc->phase = KCS_PHASE_READ;
>>>> +        kcs_bmc->data_out_idx = 1;
>>>> +        kcs_bmc->data_out_len = count;
>>>> +        write_data(kcs_bmc, kcs_bmc->data_out[0]);
>>>> +    } else if (kcs_bmc->phase == KCS_PHASE_READ) {
>>>> +        ret = -EBUSY;
>>>> +    } else {
>>>> +        ret = -EINVAL;
>>>
>>> Is there a reason you return -EINVAL here?  Why not just -EBUSY in all
>>> cases?  Is there something that userland will need to do differently?
>>>
>>>> +    }
>>>> +
>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static long kcs_bmc_ioctl(struct file *filp, unsigned int cmd,
>>>> +              unsigned long arg)
>>>> +{
>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>> +    long ret = 0;
>>>> +
>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    switch (cmd) {
>>>> +    case IPMI_BMC_IOCTL_SET_SMS_ATN:
>>>> +        update_status_bits(kcs_bmc, KCS_STATUS_SMS_ATN,
>>>> +                        KCS_STATUS_SMS_ATN);
>>>> +        break;
>>>> +
>>>> +    case IPMI_BMC_IOCTL_CLEAR_SMS_ATN:
>>>> +        update_status_bits(kcs_bmc, KCS_STATUS_SMS_ATN,
>>>> +                        0);
>>>> +        break;
>>>> +
>>>> +    case IPMI_BMC_IOCTL_FORCE_ABORT:
>>>> +        set_state(kcs_bmc, ERROR_STATE);
>>>> +        read_data(kcs_bmc);
>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>> +
>>>> +        kcs_bmc->phase = KCS_PHASE_ERROR;
>>>> +        kcs_bmc->data_in_avail = false;
>>>> +        break;
>>>> +
>>>> +    default:
>>>> +        ret = -EINVAL;
>>>> +        break;
>>>> +    }
>>>> +
>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static int kcs_bmc_release(struct inode *inode, struct file *filp)
>>>> +{
>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>> +
>>>
>>> What happens if the device gets closed in the middle of a 
>>> transaction?  That's
>>> an important case to handle.  If something is in process, you need 
>>> to abort it.
>>>
>> The device just provides the read & write data, the transaction is 
>> handled in the KCS
>> controller's IRQ handler.
>
> From the spec, section 9.14:
>
>    The BMC must change the status to ERROR_STATE on any condition 
> where it
>    aborts a command transfer in progress.
>
> So you need to do something here.
>
In practice, we do this as spec said in ipmid, NOT in driver, driver 
can't handle anything, let's
make it simple, thanks!

Whenever the BMC is reset (from power-on or a hard reset), the State 
Bits are initialized to “11 - Error State”. Doing so
allows SMS to detect that the BMC has been reset and that any message in 
process has been terminated by the BMC.
>>>> + spin_lock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    kcs_bmc->running = 0;
>>>> +
>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>
>