
rpmsg_virtio: fix rpmsg_virtio_get_tx_payload_buffer() error #614

Open

wyr-7 wants to merge 1 commit into main from remoteproc_virtio
Conversation

wyr-7 (Contributor) commented Sep 19, 2024

If rpmsg_virtio_notify_wait() returns RPMSG_SUCCESS, we don't call rpmsg_virtio_get_tx_buffer().

@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch 2 times, most recently from 0436d97 to 61a3d85 on September 19, 2024 08:41
@arnopo arnopo requested review from edmooring, arnopo and tnmysh October 3, 2024 10:05
@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 61a3d85 to 27c1457 on October 8, 2024 07:13
CV-Bowen (Contributor) commented Oct 8, 2024

@arnopo OK, the scenarios you described do indeed require these features. @wyr-7 Let's only keep commit 2.

@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch 2 times, most recently from 514afee to 7db6b5b on October 9, 2024 02:23
@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 7db6b5b to 8cbaffd on October 9, 2024 12:12
@arnopo arnopo requested a review from edmooring October 9, 2024 12:47
@@ -390,11 +390,9 @@ static void *rpmsg_virtio_get_tx_payload_buffer(struct rpmsg_device *rdev,
* use metal_sleep_usec() method by default.
*/
status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
if (status == RPMSG_EOPNOTSUPP) {
if (status != RPMSG_SUCCESS) {
Collaborator

Sorry, I missed this before...

If rpmsg_virtio_notify_wait() returns an error that is not RPMSG_EOPNOTSUPP, we call metal_sleep_usec(), but I suppose that is not the expected behavior. Indeed, rpmsg_virtio_notify_wait() has been implemented as an alternative to metal_sleep_usec().

What about just removing:

		} else if (status == RPMSG_SUCCESS) {
			break;

The loop would be:

while (1) {
	/* Lock the device to enable exclusive access to virtqueues */
	metal_mutex_acquire(&rdev->lock);
	rp_hdr = rpmsg_virtio_get_tx_buffer(rvdev, len, &idx);
	metal_mutex_release(&rdev->lock);
	if (rp_hdr || !tick_count)
		break;
	/*
	 * Try to use wait loop implemented in the virtio dispatcher and
	 * use metal_sleep_usec() method by default.
	 */
	status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
	if (status == RPMSG_EOPNOTSUPP) {
		metal_sleep_usec(RPMSG_TICKS_PER_INTERVAL);
		tick_count--;
	}
}

Contributor Author

@arnopo But when rpmsg_virtio_notify_wait() calls rvdev->notify_wait_cb, the return value is not limited to RPMSG_EOPNOTSUPP; there may be other error conditions. Should we continue trying to get the buffer?

Collaborator

It is a good question.
In the current version of the patch, we continue trying to get a buffer, but only after calling metal_sleep_usec(). That does not seem to work as expected: rpmsg_virtio_notify_wait() is used as an alternative to the basic metal_sleep_usec(), and metal_sleep_usec() should be called only if the rpmsg_virtio_notify_wait op is not supported.

I would be in favor of returning NULL in case of error (and I suspect that was the initial expectation):

while (1) {
	/* Lock the device to enable exclusive access to virtqueues */
	metal_mutex_acquire(&rdev->lock);
	rp_hdr = rpmsg_virtio_get_tx_buffer(rvdev, len, &idx);
	metal_mutex_release(&rdev->lock);
	if (rp_hdr || !tick_count)
		break;
	/*
	 * Try to use wait loop implemented in the virtio dispatcher and
	 * use metal_sleep_usec() method by default.
	 */
	status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
	if (status == RPMSG_EOPNOTSUPP) {
		metal_sleep_usec(RPMSG_TICKS_PER_INTERVAL);
		tick_count--;
	} else if (status != RPMSG_SUCCESS) {
		break;
	}
}

Contributor Author

@arnopo Yes, but I think we should fall back to metal_sleep_usec() when rpmsg_virtio_notify_wait() returns another error number (not RPMSG_EOPNOTSUPP), so that the rpmsg user can get the tx buffer as much as possible.

Collaborator

I cannot understand why we should call metal_sleep_usec(). rpmsg_virtio_notify_wait() is an alternative to metal_sleep_usec(): either the user customizes the wait, or they use the legacy one based on metal_sleep_usec(). I see no reason to use one and then the other, as this would just introduce extra delay on rpmsg_virtio_notify_wait() failure.

For me, rpmsg_virtio_notify_wait() should return either success when a notification is received or an error on timeout:

  • in case of success, loop to get a buffer;
  • in case of failure, leave the function, returning a NULL value.

Do you see a reason to loop again on failure?

Collaborator

@arnopo OK; in our case, we want to continue processing the remaining rx buffers in the rx virtqueue when we cannot get a tx buffer in the rpmsg_virtio_rx_callback() thread. This is the example code:

static int rptun_notify_wait(FAR struct rpmsg_device *rdev, uint32_t id)
{
  FAR struct rptun_priv_s *priv = (FAR struct rptun_priv_s *)
    metal_container_of(rdev, struct rpmsg_s, rdev);

  if (current thread PID != rpmsg virtio rx callback thread PID)  /* pseudocode */
    {
      /* Do not allow processing the rx buffers recursively if the
       * thread getting the tx payload buffer is not the rpmsg virtio
       * rx callback thread.
       */

      return -EAGAIN;
    }

  /* Wait for a new tx buffer */

  nxsem_tickwait(&priv->semtx, MSEC2TICK(RPTUN_TIMEOUT_MS));

  /* Process the remaining rx buffers in the rx virtqueues recursively */

  remoteproc_get_notification(xxx);

  return 0;
}

So we may return -EAGAIN, but maybe we can change this value to RPMSG_EOPNOTSUPP to work around this issue. I don't think that is the best way, though.

Thanks for the details.

If I understand correctly, your issue is reentrance: you would like to handle a new RX message while you are waiting for a TX buffer for a previous one.

Something I still do not understand is why you need to enter the loop that calls metal_sleep_usec() for that; to me this looks like a workaround for something else.

If your thread needs to sleep, what about calling nxsig_usleep() and returning 0 instead of returning -EAGAIN?

CV-Bowen (Contributor) commented Oct 18, 2024

@arnopo If the thread that called rpmsg_get_tx_payload_buffer(wait = true) is not the rx thread (the one that calls rpmsg_virtio_rx_callback()) and there is no tx buffer, rptun_notify_wait() will return -EAGAIN and rpmsg_get_tx_payload_buffer(wait = true) will return NULL immediately with the old code. But we actually want to wait until the tx buffer is returned by the remote, or until the timeout (15 s).

> If your thread needs to sleep, what about calling nxsig_usleep() and returning 0 instead of returning -EAGAIN?

Yes, we can return RPMSG_EOPNOTSUPP instead of -EAGAIN to work around this issue, but I think RPMSG_EOPNOTSUPP means the notify_wait() callback is not implemented, while we actually have implemented it. This is why I think returning RPMSG_EOPNOTSUPP in rptun_notify_wait() is not the best way to solve this issue.

Collaborator

That means we have something not well designed here. I prefer that we take the time to develop a proper fix rather than merge something that addresses your use case but is not clean.

From my perspective, we have to fix the rpmsg_virtio_notify_wait() API: we should pass a timeout value as a parameter.

	while (1) {
		/* Lock the device to enable exclusive access to virtqueues */
		metal_mutex_acquire(&rdev->lock);
		rp_hdr = rpmsg_virtio_get_tx_buffer(rvdev, len, &idx);
		metal_mutex_release(&rdev->lock);
		if (rp_hdr || !tick_count)
			break;

		/*
		 * Try to use wait loop implemented in the virtio dispatcher and
		 * use metal_sleep_usec() method by default.
		 */
-		status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
+		status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq, RPMSG_TICKS_PER_INTERVAL);
		if (status == RPMSG_EOPNOTSUPP) {
			metal_sleep_usec(RPMSG_TICKS_PER_INTERVAL);
			tick_count--;
		} else if (status != RPMSG_SUCCESS) {
			break;
		}
	}

I think it is too late to change the APIs for this release. If you are okay with returning RPMSG_EOPNOTSUPP instead of -EAGAIN as a workaround, I propose we go with that and integrate the following fix for this release:

	while (1) {
		/* Lock the device to enable exclusive access to virtqueues */
		metal_mutex_acquire(&rdev->lock);
		rp_hdr = rpmsg_virtio_get_tx_buffer(rvdev, len, &idx);
		metal_mutex_release(&rdev->lock);
		if (rp_hdr || !tick_count)
			break;

		/*
		 * Try to use wait loop implemented in the virtio dispatcher and
		 * use metal_sleep_usec() method by default.
		 */
		status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
		if (status == RPMSG_EOPNOTSUPP) {
			metal_sleep_usec(RPMSG_TICKS_PER_INTERVAL);
			tick_count--;
-		} else if (status == RPMSG_SUCCESS) {
+		} else if (status != RPMSG_SUCCESS) {
			break;
		}
	}

@edmooring , @tnmysh any opinion on that?

CV-Bowen (Contributor) commented Oct 18, 2024

@arnopo OK, it's good for me, but could you help update this PR? It's already very late in China and @wyr-7 has already rested. Or I can create a new PR to do this. What do you think?

CV-Bowen (Contributor) commented Oct 18, 2024

@arnopo I have created a new PR, #624, and it's up to you to decide whether to use the new PR or update the current one. Thanks.

arnopo (Collaborator) commented Oct 17, 2024

I would like to merge this one in the release, as it fixes an issue.

@wyr-7, could you address my comment before tomorrow's code freeze?
@edmooring @tnmysh, could you review it?

Thanks in Advance

@arnopo arnopo added this to the Release V2024.10 milestone Oct 17, 2024
tnmysh (Collaborator) commented Oct 17, 2024

LGTM.

edmooring (Contributor) left a comment

Looks good to go.

@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 8cbaffd to 938bc55 on October 21, 2024 06:31
@wyr-7 wyr-7 changed the title fix rpmsg_virtio_get_tx_payload_buffer possible errors not considered rpmsg_virtio: fix rpmsg_virtio_get_tx_payload_buffer() error Oct 21, 2024
@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 938bc55 to 3187e0b on October 21, 2024 06:33
If rpmsg_virtio_notify_wait returns RPMSG_SUCCESS, we don't call
rpmsg_virtio_get_tx_buffer.

Signed-off-by: Yongrong Wang <[email protected]>
Signed-off-by: Bowen Wang <[email protected]>
@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 3187e0b to e4cb837 on October 21, 2024 06:37
wyr-7 (Contributor Author) commented Oct 21, 2024

@arnopo @CV-Bowen Thanks, I have updated this PR with the latest discussion results.

@arnopo arnopo removed this from the Release V2024.10 milestone Oct 21, 2024
wyr-7 (Contributor Author) commented Oct 21, 2024

> Yes, something went wrong with this test; just ignore it for the moment, as it relies on the Zephyr main branch, which cannot be staged.

OK, thanks.

arnopo (Collaborator) commented Oct 21, 2024

I merged PR #624.

@CV-Bowen @wyr-7, you can update this one to change rpmsg_virtio_notify_wait() and other APIs to better meet your needs.

arnopo (Collaborator) left a comment

Needs to be updated according to #614 (comment).

This pull request has been marked as a stale pull request because it has been open (more than) 45 days with no activity.

@github-actions github-actions bot added the Stale label Dec 21, 2024