Results better than reported in paper for UMA-VI #85

Open
Dylx9948 opened this issue Oct 30, 2023 · 35 comments

Comments

@Dylx9948

Hello, thank you so much for your work. This is a great VO system.

The ATE (after alignment) I get when comparing the estimated trajectory to the GT trajectory is noticeably better than what is reported in your paper. Have changes been made in the meantime to optimize the code? I'm using an RTX 4070 and thus a newer version of CUDA. Could this be the reason for the improvement?

The results I am getting vs. yours are:

conference-csc1 -> 0.2816 (vs. 0.5236)
conference-csc2 -> 0.1420 (vs. 0.1607)
third-floor-csc1 -> 0.1101 (vs. 0.1760)
third-floor-csc2 -> 0.1510 (vs. 0.1312)
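
For readers unfamiliar with the metric, here is a minimal sketch of an "ATE after alignment" computation. It assumes the estimated and ground-truth positions are already matched one-to-one as N x 3 NumPy arrays and uses a rigid (no scale) least-squares alignment. The function names `align_rigid` and `ate_rmse` are made up for this illustration; this is not necessarily the exact procedure used for the numbers above or in the paper.

```python
import numpy as np


def align_rigid(est, gt):
    """Least-squares rotation R and translation t such that R @ est_i + t ~ gt_i.

    Kabsch-style alignment without scale; est and gt are (N, 3) arrays
    whose rows are already matched by timestamp.
    """
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t


def ate_rmse(est, gt):
    """RMSE of the position error after rigid alignment of est onto gt."""
    R, t = align_rigid(est, gt)
    aligned = est @ R.T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```

Because the alignment is fitted to whichever poses were matched, the set of matched poses itself affects the reported value.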

Please also see the attached xy plot I extracted for the conference-csc2 sequence as a reference. As you'll see, both the start and end points are closer to the ground truth than shown in the figure in the paper.

[Attached image: xy_run0]

@xukuanHIT
Collaborator

@Dylx9948 Hi, we fixed several bugs in the program and modified some parameters, which might have resulted in changes to the results.

@Dylx9948
Author

@xukuanHIT Thank you for the response. That then explains this. Am I correct in saying that the algorithm is non-deterministic? It seems that results differ slightly between each run.

@xukuanHIT
Collaborator

Yes, there might be very small differences. However, we didn't introduce any randomness in the program, so this variation may stem from the nonlinear optimization.

@Dylx9948
Author

@xukuanHIT I understand. From the tests I have conducted, it seems as if the number of keyframes stored in the traj.txt file differs slightly (by about 4 to 5 keyframes) between runs. Is this also a result of the nonlinear optimisation?

@xukuanHIT
Collaborator

It's possible. The optimization can affect the pose estimation and the matching inliers, which are crucial factors in determining keyframes. Besides, the feature detection and matching could also introduce uncertainty. We haven't tracked whether feature matches are exactly the same in every run.
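
A quick way to check how the keyframe count varies is to count the pose lines in each run's output file. The sketch below assumes each traj.txt holds one pose per line with optional `#` comment lines; the helper names and the paths in the usage comment are hypothetical.

```python
from pathlib import Path


def count_poses_in_lines(lines):
    """Count non-empty lines that are not '#' comments (one pose per line assumed)."""
    return sum(
        1 for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )


def count_poses(path):
    """Count the poses stored in a single trajectory file."""
    with Path(path).open() as f:
        return count_poses_in_lines(f)


def summarize_runs(paths):
    """Return per-file pose counts and the spread (max - min) across runs."""
    counts = {str(p): count_poses(p) for p in paths}
    spread = max(counts.values()) - min(counts.values())
    return counts, spread


# Hypothetical usage, with one output directory per run:
# counts, spread = summarize_runs(["run0/traj.txt", "run1/traj.txt"])
```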

@Dylx9948
Author

Dylx9948 commented Nov 1, 2023

@xukuanHIT Yes that makes a lot of sense. Just to confirm, did you use the absolute trajectory metric to get the trajectory error?

@Dylx9948
Author

Dylx9948 commented Nov 1, 2023

@xukuanHIT I am asking because I have tried to extract results using the UMA-VI provided trajectory error calculator, but it gives very large errors when compared to what I get when using simple ATE, which is more in line with the paper. Was there any specific approach you used to evaluate UMA-VI since it only has a partial ground truth?

@Dylx9948
Author

Dylx9948 commented Nov 2, 2023

I have now used your included benchmarking script (that uses the Python Evo package) and the results have changed quite a lot. They are substantially larger than reported in the paper and vary widely (up to 0.6 m for conference-csc1).

Please see the results for 3 runs for each sequence below:

conference-csc1 -> 0.89109, 1.41234, 1.0693
conference-csc2 -> 0.20904, 0.2144, 0.08809
third-floor-csc1 -> 0.10277, 0.070105, 0.2143178
third-floor-csc2 -> 0.28674, 0.172337, 0.270786
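
To make the run-to-run spread easier to compare across sequences, the values listed above can be summarized per sequence. This is only a small sketch using the numbers from this comment; the `summarize` helper is made up for illustration, and three runs are too few for the standard deviation to be more than a rough indication.

```python
from statistics import mean, stdev

# RMSE ATE values copied from the three runs listed above.
runs = {
    "conference-csc1": [0.89109, 1.41234, 1.0693],
    "conference-csc2": [0.20904, 0.2144, 0.08809],
    "third-floor-csc1": [0.10277, 0.070105, 0.2143178],
    "third-floor-csc2": [0.28674, 0.172337, 0.270786],
}


def summarize(values):
    """Mean, sample standard deviation and range (max - min) of a list of runs."""
    return {
        "mean": mean(values),
        "std": stdev(values),
        "range": max(values) - min(values),
    }


for name, values in runs.items():
    s = summarize(values)
    print(f"{name}: mean={s['mean']:.3f} std={s['std']:.3f} range={s['range']:.3f}")
```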

@xukuanHIT
Collaborator

We just use the ATE metric in EVO. Can you compare the trajectories of these 3 runs and see if they are quite different? Besides, the keyframes are very sparse, so I think you can try "--t_max_diff 0.1" to find more alignment poses with EVO.
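
To illustrate why a larger timestamp tolerance can yield more matched poses, here is a simplified nearest-timestamp association. It is not EVO's implementation: `associate` is a made-up helper, it does not enforce one-to-one matching, and the tolerances in the usage comment are arbitrary example values.

```python
import bisect


def associate(ref_stamps, est_stamps, max_diff):
    """Pair each estimated timestamp with its nearest reference timestamp.

    A pair is kept only if the absolute time difference is <= max_diff.
    Returns a list of (ref_stamp, est_stamp) tuples.
    """
    ref_sorted = sorted(ref_stamps)
    pairs = []
    for t in est_stamps:
        i = bisect.bisect_left(ref_sorted, t)
        # The nearest reference stamp is either just before or just after t.
        candidates = ref_sorted[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda r: abs(r - t))
        if abs(nearest - t) <= max_diff:
            pairs.append((nearest, t))
    return pairs


# Example with made-up stamps: a looser tolerance keeps more pairs.
# associate([0.0, 1.0, 2.0], [0.005, 1.05, 2.5], 0.01)  -> one pair
# associate([0.0, 1.0, 2.0], [0.005, 1.05, 2.5], 0.1)   -> two pairs
```

With sparse keyframes and a partial ground truth, a tight tolerance can leave only a handful of pairs, so the alignment and the resulting ATE rest on very few poses.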

@Dylx9948
Author

Dylx9948 commented Nov 2, 2023

They are somewhat different. Please see attached two runs I just did for conference-csc1. First one has an ATE of 0.688 and the second 1.899. I will try changing the temporal alignment threshold now.

[Attached images: 0 688, 1 89]

@Dylx9948
Author

Dylx9948 commented Nov 2, 2023

With the updated temporal alignment parameter the results still seem to follow the same distribution. Just a quick test on conference-csc1 yields:

1.0810, 0.995, 1.6754

@Dylx9948
Author

Dylx9948 commented Nov 2, 2023

Here are the 3-dimensional plots where the differences can better be seen when positioned to be at the same viewpoint.
[Attached images: Screenshot 2023-11-02 at 10 49 49, Screenshot 2023-11-02 at 10 49 52]

@xukuanHIT
Collaborator

OK, it seems AirVO is non-deterministic on the UMA dataset. But I'm not sure what causes it now. We will try to find the reason in the following development. Have you found similar non-deterministic results on other datasets?

@Dylx9948
Author

Dylx9948 commented Nov 2, 2023

I have not been able to test on OIVIO as the dataset is no longer online. Do you have a copy of the dataset that I can use to test possibly? My email address is: [email protected]

@xukuanHIT
Collaborator

OK, I have uploaded the sequences to OneDrive.

@Dylx9948
Author

Dylx9948 commented Nov 2, 2023

@xukuanHIT Thank you, I will download them now and run the tests. Feedback should follow soon.

@Dylx9948
Author

Dylx9948 commented Nov 2, 2023

@xukuanHIT I have completed 20 runs of the TN-05-GV-01 sequence. The results are below. You will see that it is between two different sets of values. The lower RMSE occurs when 101/102 keyframes are produced and the higher when 103 keyframes are produced. There seems to be some perturbation due to randomness that causes the system to produce between 101 and 103 keyframes, leading to the non-deterministic nature of the results.

The order of the results is: RMSE ATE, Min ATE, Max ATE, Mean ATE, ATE Std, Median ATE:

ATE.TXT

@Dylx9948
Author

Dylx9948 commented Nov 3, 2023

Please see below the results for 10 runs for each of the OIVIO sequences. As you will see, some sequences have zero variability, while some still do vary slightly.

mn015gv01_ATE.txt
mn015gv02_ATE.txt
mn050gv01_ATE.txt
mn050gv02_ATE.txt
mn100gv01_ATE.txt
mn100gv02_ATE.txt
tn015gv01_ATE.txt
tn050gv01_ATE.txt
tn100gv01_ATE.txt

@Dylx9948
Author

Dylx9948 commented Nov 3, 2023

@xukuanHIT The results provided are only the RMSE ATE as reported in your paper. Each row is a new run.

@xukuanHIT
Collaborator

Thank you very much for your helpful results. I ran it a few times as well today, and it's true that the results differ between runs. We will try to find the reason. I also find that the results on my computer are different from yours, so the versions of CUDA, the driver and TensorRT may also have some impact on the final results.

@Dylx9948
Author

Dylx9948 commented Nov 3, 2023

@xukuanHIT It is very possible that the versions lead to different results. By how much do the results differ on your computer if I may ask?

@Dylx9948
Author

Dylx9948 commented Nov 3, 2023

Are you using a 30-series or 40-series Nvidia GPU?

@xukuanHIT
Collaborator

The results in the paper were produced on a server with 4 RTX 3090 GPUs. Now I'm using a PC with an RTX 4070 GPU. The library versions are as follows:

Nvidia driver: 525.116.04
CUDA: 11.8
TensorRT: 8.5.3.1

I have reset the OIVIO parameters to be the same as when I submitted the code. The new results on my PC are:
[Attached image: oivio]

@Dylx9948
Author

Dylx9948 commented Nov 3, 2023

@xukuanHIT Would it be possible to share those result files with me? I would just like to confirm that my evaluation toolbox output is correct.

What is your max_diff for Python Evo package for these datasets?

@xukuanHIT
Collaborator

Sure, they can be downloaded via this link. I use the default value of max_diff.

@Dylx9948
Author

Dylx9948 commented Nov 3, 2023

Thank you. I have also used the same so I am sure that it must be the software versions that lead to this difference. I will check now and provide feedback.

@Dylx9948
Author

Dylx9948 commented Nov 3, 2023

I have compared to my generated trajectory files and I can confirm that they do indeed slightly differ.

@ccc12134

ccc12134 commented Mar 6, 2024

Sorry to bother you, I am a beginner in SLAM and I really want to use the UMA-VI dataset for accuracy evaluation. However, I am not sure how to use the IMU data it provides. After running the results, I am unable to perform accuracy evaluation. Can you please teach me?

@Dylx9948
Author

Dylx9948 commented Mar 6, 2024

Hi @ccc12134, I can help you to sort out evaluation with the UMA-VI dataset.
Please send an email to [email protected] describing your current evaluation approach and I'll try my best to get you on the right track.

@ccc12134

ccc12134 commented Mar 8, 2024

Hello, I have already sent the question to you via email. I hope you can help me when you have free time. Thank you.

@ccc12134

Sorry to bother you again, can I send you the results of my operation? Can you help me evaluate the accuracy?

@Dylx9948
Author

Dylx9948 commented Mar 22, 2024 via email

Hi, sorry, yesterday was a public holiday where I live. Please do send me your results and the data you used for evaluation. That will make it easiest to help.

Best regards,
Dylan Brown

@ccc12134

I have already sent you my results, and I am very grateful for your help.

@dongdong-cai

Hi, I'm sorry to bother you. I'm a beginner in SLAM. I want to test the ATE metric on the OIVIO dataset, but the ground truth of this dataset only provides 3D positions without pose information. This prevents me from directly using the evo_ape tool. Could you please advise on how to test the ATE metric in this case? Thank you very much.

@xukuanHIT
Collaborator

@dongdong-cai Hi, we appended 0,0,0,1 to the ground truth trajectory and used Evo to evaluate only the translation error.
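
As a rough sketch of that conversion, the snippet below turns position-only rows into TUM-style rows by appending an identity quaternion in (qx, qy, qz, qw) order. It assumes each ground-truth row is `timestamp x y z`, separated by spaces or commas; the actual OIVIO file layout may differ, and the helper names are made up for this example.

```python
def to_tum_line(line):
    """Turn 'timestamp x y z' into 'timestamp x y z 0 0 0 1'.

    The appended 0 0 0 1 is the identity quaternion in TUM order (qx qy qz qw).
    """
    fields = line.replace(",", " ").split()
    if len(fields) != 4:
        raise ValueError(f"expected 'timestamp x y z', got: {line!r}")
    return " ".join(fields + ["0", "0", "0", "1"])


def convert(lines):
    """Convert all data rows, skipping blank lines and '#' comments."""
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        out.append(to_tum_line(stripped))
    return out
```

Since the orientation is a placeholder, only the translation error computed from such a file is meaningful.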
