From mboxrd@z Thu Jan 1 00:00:00 1970
From: Cornelia Huck
To: Jason Gunthorpe, Eric Farman
Cc: David Airlie, Tony Krowiak, Alex Williamson, Christian Borntraeger,
	Daniel Vetter, dri-devel@lists.freedesktop.org, Harald Freudenberger,
	Vasily Gorbik, Heiko Carstens, intel-gfx@lists.freedesktop.org,
	intel-gvt-dev@lists.freedesktop.org, Jani Nikula, Jason Herne,
	Joonas Lahtinen, kvm@vger.kernel.org, Kirti Wankhede,
	linux-s390@vger.kernel.org, Matthew Rosato, Peter Oberparleiter,
	Halil Pasic, Rodrigo Vivi, Vineeth Vijayan, Zhenyu Wang, Zhi Wang,
	Christoph Hellwig
Subject: Re: [PATCH v2 0/9] Move vfio_ccw to the new mdev API
In-Reply-To: <20210914133618.GD4065468@nvidia.com>
Organization: Red Hat GmbH
References: <0-v2-7d3a384024cf+2060-ccw_mdev_jgg@nvidia.com>
	<1e431e58465b86430d02d429c86c427f7088bf1f.camel@linux.ibm.com>
	<20210913192407.GZ2505917@nvidia.com>
	<6f55044373dea4515b831957981bbf333e03de59.camel@linux.ibm.com>
	<20210914133618.GD4065468@nvidia.com>
User-Agent: Notmuch/0.32.1 (https://notmuchmail.org)
Date: Fri, 17 Sep 2021 13:59:16 +0200
Message-ID: <87h7ejh0q3.fsf@redhat.com>
X-Mailing-List: linux-s390@vger.kernel.org

On Tue, Sep 14 2021, Jason Gunthorpe wrote:

> On Mon, Sep 13, 2021 at 04:31:54PM -0400, Eric Farman wrote:
>> > I rebased it and fixed it up here:
>> >
>> > https://github.com/jgunthorpe/linux/tree/vfio_ccw
>> >
>> > Can you try again?
>>
>> That does address the crash, but then why is it processing a BROKEN
>> event? Seems problematic.
>
> The stuff related to the NOT_OPER looked really wonky to me.
> I'm guessing this is the issue - not sure about the pmcw.ena either..

[I have still not been able to digest the whole series, sorry.]

>
> diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
> index 5ea392959c0711..0d4d4f425befac 100644
> --- a/drivers/s390/cio/vfio_ccw_fsm.c
> +++ b/drivers/s390/cio/vfio_ccw_fsm.c
> @@ -380,29 +380,19 @@ static void fsm_open(struct vfio_ccw_private *private,
>  	spin_unlock_irq(sch->lock);
>  }
>  
> -static void fsm_close(struct vfio_ccw_private *private,
> -		      enum vfio_ccw_event event)
> +static int flush_sch(struct vfio_ccw_private *private)
>  {
>  	struct subchannel *sch = private->sch;
>  	DECLARE_COMPLETION_ONSTACK(completion);
>  	int iretry, ret = 0;
>  
> -	spin_lock_irq(sch->lock);
> -	if (!sch->schib.pmcw.ena)
> -		goto err_unlock;
> -	ret = cio_disable_subchannel(sch);
> -	if (ret != -EBUSY)
> -		goto err_unlock;
> -
>  	iretry = 255;
>  	do {
> -
>  		ret = cio_cancel_halt_clear(sch, &iretry);
> -
>  		if (ret == -EIO) {
>  			pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n",
>  			       sch->schid.ssid, sch->schid.sch_no);
> -			break;
> +			return ret;

Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV
we should be done as well, as then the device is dead and we do not need
to disable it.

>  		}
>  
>  		/*
> @@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private,
>  		spin_unlock_irq(sch->lock);
>  
>  		if (ret == -EBUSY)
> -			wait_for_completion_timeout(&completion, 3*HZ);
> +			wait_for_completion_timeout(&completion, 3 * HZ);
>  
>  		private->completion = NULL;
>  		flush_workqueue(vfio_ccw_work_q);
>  		spin_lock_irq(sch->lock);
>  		ret = cio_disable_subchannel(sch);
>  	} while (ret == -EBUSY);
> +	return ret;
> +}
> +
> +static void fsm_close(struct vfio_ccw_private *private,
> +		      enum vfio_ccw_event event)
> +{
> +	struct subchannel *sch = private->sch;
> +	int ret;
> +
> +	spin_lock_irq(sch->lock);
> +	if (!sch->schib.pmcw.ena)
> +		goto err_unlock;
> +	ret = cio_disable_subchannel(sch);

cio_disable_subchannel() should be happy to disable an already disabled
subchannel, so I guess we can just walk through this and end up in CLOSED
state... unless entering with !ena actually indicates that we messed up
somewhere else in the state machine. I still need to find time to read
the patches.

> +	if (ret == -EBUSY)
> +		ret = flush_sch(private);
>  	if (ret)
>  		goto err_unlock;
>  	private->state = VFIO_CCW_STATE_CLOSED;
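
The two review remarks above (treating -ENODEV like "done" in the flush loop, and dropping the !ena early exit because disabling an already disabled subchannel should succeed) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct fake_sch`, `fake_disable()`, `fake_flush()` and `fake_close()` are hypothetical stand-ins, not kernel code; locking, the completion wait and the workqueue flush are left out; and mapping -ENODEV to a successful close is one possible reading of "we should be done", not something the thread confirms.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct subchannel; not the kernel type. */
struct fake_sch {
	bool enabled;     /* models sch->schib.pmcw.ena */
	int busy_rounds;  /* how many disable attempts still report -EBUSY */
	int chc_ret;      /* scripted result of the cancel/halt/clear step */
};

enum fake_state { FAKE_OPEN, FAKE_CLOSED };

/*
 * Models cio_disable_subchannel(). Assumption taken from the review
 * comment: disabling an already disabled subchannel simply succeeds.
 */
static int fake_disable(struct fake_sch *sch)
{
	if (!sch->enabled)
		return 0;
	if (sch->busy_rounds > 0) {
		sch->busy_rounds--;
		return -EBUSY;
	}
	sch->enabled = false;
	return 0;
}

/* Models the flush_sch() loop without locking or waiting. */
static int fake_flush(struct fake_sch *sch)
{
	int ret;

	do {
		ret = sch->chc_ret;	/* stands in for cio_cancel_halt_clear() */
		if (ret == -EIO)
			return ret;	/* could not quiesce: report the error */
		if (ret == -ENODEV)
			return 0;	/* assumed: device is dead, nothing to disable */
		ret = fake_disable(sch);
	} while (ret == -EBUSY);
	return ret;
}

/* Models fsm_close() without the !ena early exit. */
enum fake_state fake_close(struct fake_sch *sch)
{
	int ret;

	assert(sch != NULL);
	ret = fake_disable(sch);
	if (ret == -EBUSY)
		ret = fake_flush(sch);
	return ret ? FAKE_OPEN : FAKE_CLOSED;
}
```

In this model an already disabled subchannel reaches the closed state without a separate enabled check, a scripted -ENODEV ends the loop without a further disable attempt, and -EIO leaves the state unchanged. Whether entering close with !ena points at a bug elsewhere in the state machine is the open question from the review, and the sketch does not answer it.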