Results better than reported in paper for UMA-VI #85
@Dylx9948 Hi, we fixed several bugs in the program and modified some parameters, which might have resulted in changes to the results. |
@xukuanHIT Thank you for the response. That then explains this. Am I correct in saying that the algorithm is non-deterministic? It seems that results differ slightly between each run. |
Yes, there might be very small differences. However, we didn't deliberately introduce any randomness into the program, so this kind of uncertainty may stem from the nonlinear optimization. |
@xukuanHIT I understand. From the tests I have conducted, it seems as if the number of keyframes stored in the traj.txt file differs slightly (by about 4 to 5 keyframes) between runs. Is this also a result of the nonlinear optimisation? |
It's possible. The optimization can affect the pose estimation and the matching inliers, which are crucial factors in determining keyframes. Besides, the feature detection and matching could also introduce uncertainty. We haven't tracked whether feature matches are exactly the same in every run. |
@xukuanHIT Yes that makes a lot of sense. Just to confirm, did you use the absolute trajectory metric to get the trajectory error? |
@xukuanHIT I am asking because I have tried to extract results using the trajectory error calculator provided with UMA-VI, but it gives very large errors compared to what I get with a simple ATE computation, which is more in line with the paper. Was there any specific approach you used to evaluate UMA-VI, since it only has a partial ground truth? |
I have now used your included benchmarking script (that uses the Python Evo package) and the results have changed quite a lot. They are substantially larger than reported in the paper and vary widely (up to 0.6 m for conference-csc1). Please see the results for 3 runs for each sequence below: conference-csc1 -> 0.89109, 1.41234, 1.0693 |
We just use the ATE metric in Evo. Can you compare the trajectories of these 3 runs and see if they are quite different? Besides, the keyframes are very sparse, so I think you can try "--t_max_diff 0.1" to find more alignment poses with Evo. |
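For reference, a minimal sketch of what that evaluation looks like with Evo's Python API. The file names are placeholders, both trajectories are assumed to be in TUM format, and the larger `max_diff` corresponds to the `--t_max_diff 0.1` suggestion above:

```python
# Sketch of an ATE evaluation with the evo Python API.
# "gt.txt" and "traj.txt" are placeholder file names; both are
# assumed to be in TUM format (timestamp tx ty tz qx qy qz qw).
from evo.core import metrics, sync
from evo.tools import file_interface

traj_ref = file_interface.read_tum_trajectory_file("gt.txt")
traj_est = file_interface.read_tum_trajectory_file("traj.txt")

# Associate poses by timestamp; a larger max_diff (e.g. 0.1 s) finds
# more matched poses when the keyframes are sparse.
traj_ref, traj_est = sync.associate_trajectories(traj_ref, traj_est, max_diff=0.1)

# SE(3) alignment (no scale correction), then translation-only APE, i.e. ATE.
traj_est.align(traj_ref, correct_scale=False)
ape = metrics.APE(metrics.PoseRelation.translation_part)
ape.process_data((traj_ref, traj_est))
print("ATE RMSE:", ape.get_statistic(metrics.StatisticsType.rmse))
```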
With the updated temporal alignment parameter the results still seem to follow the same distribution. Just a quick test on conference-csc1 yields: 1.0810, 0.995, 1.6754 |
OK, it seems AirVO is non-deterministic on the UMA dataset, but I'm not sure what causes it yet. We will try to find the reason in future development. Have you found similar non-deterministic results on other datasets? |
I have not been able to test on OIVIO as the dataset is no longer online. Do you have a copy of the dataset that I can use to test possibly? My email address is: [email protected] |
OK, I have uploaded the sequences to OneDrive. |
@xukuanHIT Thank you, I will download them now and run the tests. Feedback should follow soon. |
@xukuanHIT I have completed 20 runs of the TN-05-GV-01 sequence. The results are below. You will see that they cluster into two distinct sets of values. The lower RMSE occurs when 101/102 keyframes are produced and the higher when 103 keyframes are produced. There seems to be some perturbation due to randomness that causes the system to produce between 101 and 103 keyframes, leading to the non-deterministic nature of the results. The order of the results is: RMSE ATE, Min ATE, Max ATE, Mean ATE, ATE Std, Median ATE: |
Please see below the results for 10 runs for each of the OIVIO sequences. As you will see, some sequences have zero variability, while others still vary slightly. mn015gv01_ATE.txt |
@xukuanHIT The results provided are only the RMSE ATE as reported in your paper. Each row is a new run. |
Thank you very much for your helpful results. I also ran it a few times today, and it's true that the results differ. We will try to find the reason. I also find that the results on my computer are different from yours; perhaps the versions of CUDA, the driver, and TensorRT also have some impact on the final results. |
@xukuanHIT It is very possible that the versions lead to different results. By how much do the results differ on your computer if I may ask? |
Are you using a 30-series or 40-series Nvidia GPU? |
@xukuanHIT Would it be possible to share those result files with me? I would just like to confirm that my evaluation toolbox output is correct. What is your max_diff value for the Python Evo package on these datasets? |
Sure, they can be downloaded via this link. I use the default value of max_diff. |
Thank you. I have also used the same, so I am sure that it must be the software versions that lead to this difference. I will check now and provide feedback. |
I have compared them to my generated trajectory files and I can confirm that they do indeed differ slightly. |
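For anyone wanting to quantify such run-to-run differences, a quick sketch that compares the translation columns of two trajectory files. The file names are placeholders, and both files are assumed to be in TUM format:

```python
# Sketch: compare two trajectory files from different runs.
# Assumes TUM format (timestamp tx ty tz qx qy qz qw);
# "traj_run1.txt" and "traj_run2.txt" are placeholder names.
import numpy as np

a = np.loadtxt("traj_run1.txt")
b = np.loadtxt("traj_run2.txt")

if a.shape != b.shape:
    print(f"Different keyframe counts: {a.shape[0]} vs {b.shape[0]}")
else:
    # Per-pose Euclidean distance between the translation columns.
    diff = np.linalg.norm(a[:, 1:4] - b[:, 1:4], axis=1)
    print(f"max diff: {diff.max():.6f} m, mean diff: {diff.mean():.6f} m")
```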
Sorry to bother you, I am a beginner in SLAM and I really want to use the UMA-VI dataset for accuracy evaluation. However, I am not sure how to use the IMU data it provides, and after obtaining my results I am unable to perform the accuracy evaluation. Could you please advise me? |
Hi @ccc12134 , I can help you to sort out evaluation with the UMA-VI dataset. |
Hello, I have already sent the question to you via email. I hope you can help me when you have free time. Thank you. |
Sorry to bother you again, but may I send you my results? Could you help me evaluate the accuracy? |
Hi, sorry, yesterday was a public holiday where I live.
Please do send me your results and the data you used for evaluation. That will make it the easiest to help.
Best regards
Dylan Brown
|
I have already sent you my results, and I am very grateful for your help. |
Hi, I'm sorry to bother you. I'm a beginner in SLAM. I want to test the ATE metric on the OIVIO dataset, but the ground truth of this dataset only provides 3D positions without pose information. This prevents me from directly using the evo_ape tool. Could you please advise on how to test the ATE metric in this case? Thank you very much. |
@dongdong-cai Hi, we appended the identity quaternion 0,0,0,1 to each pose of the ground truth trajectory and used Evo to evaluate only the translation error. |
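A minimal sketch of that padding step (file names are placeholders; the ground truth is assumed to be lines of "timestamp tx ty tz"):

```python
# Sketch: pad a position-only ground truth (timestamp tx ty tz) with the
# identity quaternion qx qy qz qw = 0 0 0 1 so that evo accepts it as a
# TUM-format trajectory. File names are placeholders.
with open("gt_positions.txt") as fin, open("gt_tum.txt", "w") as fout:
    for line in fin:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fout.write(line + " 0 0 0 1\n")
```

Since the rotations are all identity, only the translation part of the error is meaningful, which matches evo_ape's default pose relation.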
Hello, thank you so much for your work. This is a great VO system.
The results I get when comparing the GT trajectory to the estimated trajectory using ATE (after alignment) are noticeably better than what has been reported in your paper. Have changes been made in the meantime to optimize the code? I'm using an RTX 4070 and thus a newer version of CUDA. Could this be the reason for the improvement?
The results I am getting vs. yours are:
conference-csc1 -> 0.2816 (vs. 0.5236)
conference-csc2 -> 0.1420 (vs. 0.1607)
third-floor-csc1 -> 0.1101 (vs. 0.1760)
third-floor-csc2 -> 0.1510 (vs. 0.1312)
Please also see attached the xy plot I have extracted for the conference-csc2 sequence as a reference. As you'll see, both the start and end points are closer to the ground truth than those shown in the figure in the paper as well.
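A minimal sketch of how such an xy plot can be produced with Evo's plotting helpers (placeholder file names; trajectories assumed to be TUM-format and associated/aligned beforehand as in the earlier snippet):

```python
# Sketch: xy plot of ground truth vs. estimated trajectory using evo's
# plotting helpers. "gt.txt" and "traj.txt" are placeholder file names.
import matplotlib.pyplot as plt
from evo.tools import file_interface, plot

traj_ref = file_interface.read_tum_trajectory_file("gt.txt")
traj_est = file_interface.read_tum_trajectory_file("traj.txt")

fig = plt.figure()
ax = plot.prepare_axis(fig, plot.PlotMode.xy)
plot.traj(ax, plot.PlotMode.xy, traj_ref, style="--", color="gray", label="ground truth")
plot.traj(ax, plot.PlotMode.xy, traj_est, color="blue", label="estimate")
plt.legend()
plt.show()
```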