Validating flowMC Results on Unlabeled Data #177
Hi there, I'm running some analysis on unlabeled data and I'm wondering how I can make sure the results are accurate. Since there's no ground truth or labeled information, it's tricky to evaluate the performance of the analysis.

Comments
@kazewong We are using flowMC for our project, and it gives us results quickly. However, we need to learn how to choose the values of its tuning parameters to make sure we are getting optimal results. In ML, we can look at training and validation loss plots for decision-making, but in the case of flowMC, each parameter combination changes the results. Is there a way to gain confidence in our results, for instance plots or reference values to look at?
We would appreciate it if you could explain, or point us to resources to read, before we make decisions about parameter selection. Different parameter choices give us different solutions, which leaves us uncertain about the results.
Without a general sense of the geometry of the target space, it is quite difficult to give detailed advice on how to tune flowMC. Different hyperparameters should give you different results; in the end, that's why we have the hyperparameters. That said, there are a couple of numbers one can check while tuning:
- The local acceptance rate should sit roughly in the 0.2-0.8 range.
- The global acceptance rate multiplied by the number of global steps should be at least 1.
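For reference, here is a minimal sketch of the knobs this kind of tuning usually touches, written as a plain settings dictionary. The names follow flowMC's documented Sampler keyword arguments, but the values are purely illustrative and the exact API has changed between releases, so check them against the version you have installed:

```python
# Illustrative only: flowMC-style tuning knobs as a plain dict.
# Verify the exact keyword names against your installed flowMC version.
sampler_settings = {
    "n_chains": 1000,         # parallel chains; more chains give the NF more training data
    "n_local_steps": 100,     # local-sampler steps per loop (local exploration)
    "n_global_steps": 100,    # NF-proposal steps per loop (global jumps)
    "n_loop_training": 20,    # loops in the training phase
    "n_loop_production": 20,  # loops in the production phase
    "n_epochs": 30,           # NF training epochs per loop
    "learning_rate": 1e-2,    # NF optimizer learning rate
    "batch_size": 10000,      # NF training batch size
}
```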
@kazewong, by unlabeled data I meant the actual GWTC data. We are running our pipelines on synthetic data, and they are working well. We wanted a systematic way, or a threshold on some quantity, with which we can build confidence in the analysis that has been done. I also wanted to raise one issue: on a single GPU we cannot run the analysis with large settings. On a GPU with 16 GB of VRAM we can set at most 60 global steps, otherwise the program crashes. You mentioned in this comment that flowMC does not utilize multiple GPUs. We bump into this problem most of the time and there is nothing we can do about it. I wanted to know whether scaling flowMC to utilize multiple GPUs is a priority for you.
The effective sample size could be a useful measure: https://python.arviz.org/en/stable/examples/plot_ess_evolution.html Note that all the diagnostics I know of can only flag that there is an issue; none of them will give you a sense of how "correct" you are, i.e. how biased the result is. On the note of multiple GPUs: if you are reaching your memory limit with the parameters mentioned above, it may be worth looking into how to compress the memory footprint before going to multiple GPUs, for example by downsampling the number of samples for each event. Jax does have distributed computing capability, but it is not the highest priority for flowMC right now. I also doubt scaling to multiple GPUs would be an efficient solution to your problem, especially since you may have to deal with inter-GPU communication optimization.
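As a rough illustration of that diagnostic, here is a minimal sketch of running ArviZ's ESS (and R-hat) on a stack of chains. `production_chains` is a placeholder for whatever array of shape (n_chains, n_steps, n_dim) you pull out of your flowMC run; it is not a flowMC attribute name:

```python
import numpy as np
import arviz as az

# production_chains is a placeholder: posterior samples with
# shape (n_chains, n_steps, n_dim) from the production phase.
chains = np.asarray(production_chains)

idata = az.convert_to_inference_data(chains)
print(az.ess(idata))    # bulk effective sample size per dimension
print(az.rhat(idata))   # split R-hat, a standard convergence check

# The linked example plots how the ESS grows as more draws are used:
az.plot_ess(idata, kind="evolution")
```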
I have thought of the same thing! Do update us if you get to it.
I was talking about the multi-device parallelism that is explained in JAX's tutorial on SPMD multi-device parallelism.
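For concreteness, here is a tiny SPMD-style sketch in the spirit of that tutorial, using `jax.pmap` to run the same function on per-device shards. The `log_prob` function is just a stand-in, not anything from flowMC:

```python
import jax
import jax.numpy as jnp

# Stand-in per-sample function; in practice this would be your likelihood.
def log_prob(x):
    return -0.5 * jnp.sum(x**2, axis=-1)

n_devices = jax.device_count()
# One shard of 1000 points (11-dimensional) per device.
x = jnp.ones((n_devices, 1000, 11))
# The same program runs on every device, each on its own shard.
log_p = jax.pmap(log_prob)(x)
print(log_p.shape)  # (n_devices, 1000)
```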
I tried to write custom kernels in CUDA and Pallas, and it was overcomplicating a simple problem.
Custom kernels in CUDA and Pallas probably won't solve the problem. The goal of a custom kernel is usually to reduce round trips to and from GPU memory, but ultimately, if your data is larger than the memory the GPU has, a kernel won't fix that. I am aware of Jax's capability in distributed computing, both within a node and across nodes. The main reason I suggest compressing the memory footprint is that using multiple GPUs is not without cost: communication between GPUs takes time, and it is usually the biggest bottleneck in almost every application. If your GPU has 16 GB of RAM and you are bottlenecked by memory, having, say, 4 GPUs only gets you 4 times as far. Unless you have a large GPU cluster, investing time in distributed computation here may not solve your problem. That said, it might be good for flowMC to be able to handle distributed workflows at some point, but this should probably come only after the API has really stabilized.
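On the "compress the memory footprint first" suggestion, here is a minimal sketch of what downsampling per-event posterior samples could look like. The function name, shapes, and sample counts are all illustrative, not part of flowMC:

```python
import jax
import jax.numpy as jnp

def downsample_event_samples(key, samples, n_keep=5000):
    """Randomly thin one event's posterior samples.

    samples: array of shape (n_samples, n_dim); returns (n_keep, n_dim).
    Purely illustrative; adapt the shapes and n_keep to your pipeline.
    """
    idx = jax.random.choice(key, samples.shape[0], shape=(n_keep,), replace=False)
    return samples[idx]

# Example usage with a hypothetical list of per-event sample arrays.
key = jax.random.PRNGKey(0)
event_samples = [jnp.ones((50_000, 11)) for _ in range(3)]  # placeholder data
keys = jax.random.split(key, len(event_samples))
thinned = [downsample_event_samples(k, s) for k, s in zip(keys, event_samples)]
print([s.shape for s in thinned])  # [(5000, 11), (5000, 11), (5000, 11)]
```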
An observation I wanted to share, although I don't know the reason behind this behaviour: the memory footprint increases drastically (by GBs) when we slightly increase the number of global steps. This does not happen with the number of local steps.
I understood the local acceptance rate should be around 0.2-0.8, but could you please explain the second check? |
The idea behind "global acceptance rate × number of global steps should be at least 1" is that the global proposal generates completely uncorrelated samples, should they be accepted, so it is useful to have at least 1 effective sample per cycle. We are working on formalizing this and improving the tuning based on it.
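A quick sketch of how one might compute both checks from recorded acceptance indicators. `local_accs` and `global_accs` are placeholders for arrays of shape (n_chains, n_steps); the exact way to pull these out of a flowMC Sampler depends on the version you use:

```python
import numpy as np

# Placeholders: per-step acceptance indicators (0/1) or rates,
# each with shape (n_chains, n_steps). Fake data for illustration.
local_accs = np.random.rand(100, 50) < 0.5
global_accs = np.random.rand(100, 50) < 0.05

local_rate = np.mean(local_accs)
global_rate = np.mean(global_accs)
n_global_steps = global_accs.shape[1]

print(f"local acceptance rate:  {local_rate:.3f}  (aim for roughly 0.2-0.8)")
print(f"global acceptance rate: {global_rate:.3f}")
# The check discussed above: expected uncorrelated samples per cycle.
print(f"global_rate * n_global_steps = {global_rate * n_global_steps:.2f}  (want >= 1)")
```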