
rpmsg_virtio: fix rpmsg_virtio_get_tx_payload_buffer() error #614

Open

wyr-7 wants to merge 1 commit into main from remoteproc_virtio
Conversation

wyr-7 (Contributor) commented Sep 19, 2024

If rpmsg_virtio_notify_wait() returns RPMSG_SUCCESS, we don't call rpmsg_virtio_get_tx_buffer().

@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch 2 times, most recently from 0436d97 to 61a3d85 on September 19, 2024 08:41
@arnopo arnopo requested review from edmooring, arnopo and tnmysh October 3, 2024 10:05
@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 61a3d85 to 27c1457 on October 8, 2024 07:13
CV-Bowen (Contributor) commented Oct 8, 2024

@arnopo OK, the scenarios you described do indeed require these features. @wyr-7 Let's only keep commit 2.

@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch 2 times, most recently from 514afee to 7db6b5b on October 9, 2024 02:23
@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 7db6b5b to 8cbaffd on October 9, 2024 12:12
@arnopo arnopo requested a review from edmooring October 9, 2024 12:47
@@ -390,11 +390,9 @@ static void *rpmsg_virtio_get_tx_payload_buffer(struct rpmsg_device *rdev,
* use metal_sleep_usec() method by default.
*/
status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
if (status == RPMSG_EOPNOTSUPP) {
if (status != RPMSG_SUCCESS) {
Collaborator

Sorry, I missed this before...

If rpmsg_virtio_notify_wait() returns an error that is not RPMSG_EOPNOTSUPP, we call metal_sleep_usec(), but I suppose that is not the expected behavior. Indeed, rpmsg_virtio_notify_wait() has been implemented as an alternative to metal_sleep_usec().

What about just removing:

		} else if (status == RPMSG_SUCCESS) {
			break;

The loop would be:

while (1) {
	/* Lock the device to enable exclusive access to virtqueues */
	metal_mutex_acquire(&rdev->lock);
	rp_hdr = rpmsg_virtio_get_tx_buffer(rvdev, len, &idx);
	metal_mutex_release(&rdev->lock);
	if (rp_hdr || !tick_count)
		break;
	/*
	 * Try to use wait loop implemented in the virtio dispatcher and
	 * use metal_sleep_usec() method by default.
	 */
	status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
	if (status == RPMSG_EOPNOTSUPP) {
		metal_sleep_usec(RPMSG_TICKS_PER_INTERVAL);
		tick_count--;
	}
}

Contributor Author

@arnopo But when rpmsg_virtio_notify_wait() calls rvdev->notify_wait_cb, the return value is not limited to RPMSG_EOPNOTSUPP; there may be other error conditions. Should we continue trying to get the buffer?

Collaborator

It is a good question.
In the current version of the patch, we continue trying to get a buffer, but only after calling metal_sleep_usec(). That does not seem to work as expected: rpmsg_virtio_notify_wait() is used as an alternative to the basic metal_sleep_usec(), and metal_sleep_usec() should be called only if the rpmsg_virtio_notify_wait op is not supported.

I would be in favor of returning NULL in case of error (and I suspect that was the initial expectation):

while (1) {
	/* Lock the device to enable exclusive access to virtqueues */
	metal_mutex_acquire(&rdev->lock);
	rp_hdr = rpmsg_virtio_get_tx_buffer(rvdev, len, &idx);
	metal_mutex_release(&rdev->lock);
	if (rp_hdr || !tick_count)
		break;
	/*
	 * Try to use wait loop implemented in the virtio dispatcher and
	 * use metal_sleep_usec() method by default.
	 */
	status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
	if (status == RPMSG_EOPNOTSUPP) {
		metal_sleep_usec(RPMSG_TICKS_PER_INTERVAL);
		tick_count--;
	} else if (status != RPMSG_SUCCESS) {
		break;
	}
}

Contributor Author

@arnopo Yes, but I think we should fall back to metal_sleep_usec() when rpmsg_virtio_notify_wait() returns another error number (not RPMSG_EOPNOTSUPP), so that the rpmsg user can get the tx buffer as much as possible.

Collaborator

I cannot understand why we should call metal_sleep_usec(). rpmsg_virtio_notify_wait() is an alternative to metal_sleep_usec(): either the user customizes the wait, or they use the legacy one based on metal_sleep_usec(). I see no reason to use one and then the other, as this would just introduce extra delay on rpmsg_virtio_notify_wait() failure.

For me, rpmsg_virtio_notify_wait() should return either success when a notification is received or an error on timeout:

  • in case of success, loop to get a buffer;
  • in case of failure, leave the function, returning a NULL value.

Do you see a reason to loop again on failure?

Collaborator

@arnopo OK; in our case, we want to continue processing the remaining rx buffers in the rx virtqueue when we cannot get a tx buffer in the rpmsg_virtio_rx_callback() thread. This is the example code:

static int rptun_notify_wait(FAR struct rpmsg_device *rdev, uint32_t id)
{
  FAR struct rptun_priv_s *priv = (FAR struct rptun_priv_s *)
    metal_container_of(rdev, struct rpmsg_s, rdev);

  if (current thread PID != rpmsg virtio rx callback thread PID)  /* pseudocode */
    {
      /* Do not allow processing the rx buffers recursively if the
       * thread getting the tx payload buffer is not the rpmsg virtio
       * rx callback thread.
       */

      return -EAGAIN;
    }

  /* Wait for a new tx buffer */

  nxsem_tickwait(&priv->semtx, MSEC2TICK(RPTUN_TIMEOUT_MS));

  /* Process the remaining rx buffers in the rx virtqueues recursively */

  remoteproc_get_notification(xxx);

  return 0;
}

So we may return -EAGAIN, but maybe we can change this value to RPMSG_EOPNOTSUPP to work around this issue. I don't think that is the best way, though.

Thanks for the details.

If I understand correctly, your issue is reentrance: you would like to handle a new RX message while you are waiting for a TX buffer for a previous one.

Something I still do not understand is why you need to enter the loop that calls metal_sleep_usec() for that; to me this looks like a workaround for something else.

If your thread needs to sleep, what about calling nxsig_usleep() and returning 0 instead of returning -EAGAIN?

CV-Bowen (Contributor) commented Oct 18, 2024

@arnopo If the thread that called rpmsg_get_tx_payload_buffer(wait = true) is not the rx thread (the one that calls rpmsg_virtio_rx_callback()) and there is no tx buffer, rptun_notify_wait() will return -EAGAIN and rpmsg_get_tx_payload_buffer(wait = true) will return NULL immediately with the old code. But we actually want to wait until the tx buffer is returned by the remote, or until the timeout (15 s).

> If your thread needs to sleep, what about calling nxsig_usleep() and returning 0 instead of returning -EAGAIN?

Yes, we can return RPMSG_EOPNOTSUPP instead of -EAGAIN to work around this issue, but I think RPMSG_EOPNOTSUPP means the notify_wait() callback is not implemented, while we actually have implemented it. This is why I think returning RPMSG_EOPNOTSUPP in rptun_notify_wait() is not the best way to solve this issue.

Collaborator

That means we have something not well designed here. I prefer that we take the time to develop a proper fix rather than merge something that addresses your use case but is not clean.

From my perspective, we have to fix the rpmsg_virtio_notify_wait() API: we should pass a timeout value as a parameter.

	while (1) {
		/* Lock the device to enable exclusive access to virtqueues */
		metal_mutex_acquire(&rdev->lock);
		rp_hdr = rpmsg_virtio_get_tx_buffer(rvdev, len, &idx);
		metal_mutex_release(&rdev->lock);
		if (rp_hdr || !tick_count)
			break;

		/*
		 * Try to use wait loop implemented in the virtio dispatcher and
		 * use metal_sleep_usec() method by default.
		 */
-		status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
+		status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq, RPMSG_TICKS_PER_INTERVAL);
		if (status == RPMSG_EOPNOTSUPP) {
			metal_sleep_usec(RPMSG_TICKS_PER_INTERVAL);
			tick_count--;
		} else if (status != RPMSG_SUCCESS) {
			break;
		}
	}

I think it is too late to change the APIs for this release. If you are okay with returning RPMSG_EOPNOTSUPP instead of -EAGAIN as a workaround, I propose we go with that and integrate the following fix for this release:

	while (1) {
		/* Lock the device to enable exclusive access to virtqueues */
		metal_mutex_acquire(&rdev->lock);
		rp_hdr = rpmsg_virtio_get_tx_buffer(rvdev, len, &idx);
		metal_mutex_release(&rdev->lock);
		if (rp_hdr || !tick_count)
			break;

		/*
		 * Try to use wait loop implemented in the virtio dispatcher and
		 * use metal_sleep_usec() method by default.
		 */
		status = rpmsg_virtio_notify_wait(rvdev, rvdev->rvq);
		if (status == RPMSG_EOPNOTSUPP) {
			metal_sleep_usec(RPMSG_TICKS_PER_INTERVAL);
			tick_count--;
-		} else if (status == RPMSG_SUCCESS) {
+		} else if (status != RPMSG_SUCCESS) {
			break;
		}
	}

@edmooring , @tnmysh any opinion on that?

CV-Bowen (Contributor) commented Oct 18, 2024

@arnopo OK, it's good for me, but could you help update this PR? It's already very late in China and @wyr-7 has already rested. Or I can create a new PR to do this. What do you think?

CV-Bowen (Contributor) commented Oct 18, 2024

@arnopo I have created a new PR, #624, and it's up to you to decide whether to use the new PR or update the current one. Thanks.

arnopo (Collaborator) commented Oct 17, 2024

I would like to merge this one in the release, as it fixes an issue.

@wyr-7, could you address my comment before tomorrow's code freeze?
@edmooring @tnmysh, could you review it?

Thanks in Advance

@arnopo arnopo added this to the Release V2024.10 milestone Oct 17, 2024
tnmysh (Collaborator) commented Oct 17, 2024

LGTM.

edmooring (Contributor) left a comment

Looks good to go.

@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 8cbaffd to 938bc55 on October 21, 2024 06:31
@wyr-7 wyr-7 changed the title fix rpmsg_virtio_get_tx_payload_buffer possible errors not considered rpmsg_virtio: fix rpmsg_virtio_get_tx_payload_buffer() error Oct 21, 2024
@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 938bc55 to 3187e0b on October 21, 2024 06:33
If rpmsg_virtio_notify_wait returns RPMSG_SUCCESS, we don't call
rpmsg_virtio_get_tx_buffer.

Signed-off-by: Yongrong Wang <[email protected]>
Signed-off-by: Bowen Wang <[email protected]>
@wyr-7 wyr-7 force-pushed the remoteproc_virtio branch from 3187e0b to e4cb837 on October 21, 2024 06:37
wyr-7 (Contributor Author) commented Oct 21, 2024

@arnopo @CV-Bowen Thanks, I have updated this PR with the latest discussion results.

@arnopo arnopo removed this from the Release V2024.10 milestone Oct 21, 2024
wyr-7 (Contributor Author) commented Oct 21, 2024

> Yes, something went wrong with this test; just ignore it for the moment, as it relies on the Zephyr main branch, which cannot be staged.

OK, thanks.

arnopo (Collaborator) commented Oct 21, 2024

I merged PR #624.

@CV-Bowen @wyr-7, you can update this one to change rpmsg_virtio_notify_wait() and other APIs to better meet your needs.

arnopo (Collaborator) left a comment

Needs to be updated according to #614 (comment).

This pull request has been marked as a stale pull request because it has been open (more than) 45 days with no activity.

@github-actions github-actions bot added the Stale label Dec 21, 2024