Roadmap
MariusDanner edited this page Mar 25, 2020 · 6 revisions
Though the pipeline is fully usable as-is, there are some features in our backlog that could serve as ideas for future enhancements. Here they are, in no particular order:
- Backend
- C++ as execution environment. This would involve creating an equivalent to `mpci_utils.r` for C++ and setting up a Dockerfile according to the requirements laid out here.
- Once C++ support is implemented, it would be nice to add a faster, parallelized version of PC from this repo. Ideally, one would first implement discrete conditional independence tests for this project and embed it as a linked library.
- It would be nice to implement support for prior knowledge using the `fixed_edges` and `fixed_gaps` parameters of pcalg. This would include the ability to create a new experiment based on the annotations of a validated experiment. To do so, EdgeInformations marked `missing` or `approved` need to be converted to `fixed_edges`, and those marked `declined` to `fixed_gaps`.
- To allow a different type of prior knowledge, it might be interesting to group nodes together. This would allow custom edge orientation rules, e.g., if one group of nodes cannot be the effect of another group because it always occurs earlier in time.
- There is an existing endpoint that picks out notable edges according to edge weight; finding notable paths would be interesting as well.
- One could periodically save intermediate results for long computations. For example, run pcalg with m.max=1 and return the intermediate result, then run it with m.max=2 and the already known fixed_gaps, and so on.
- Merge the redundant features of `is_ground_truth` and `EdgeInformation`. EdgeInformation offers more functionality, so it might make sense to remove the `is_ground_truth` property of `Edges` and integrate the Ground Truth upload and the comparison metrics into the EdgeInformation feature.
- The caching of dataset metadata could be done as properties of the dataset model.
- Make scheduling more advanced. Currently, sequential jobs can only run if no other job is running anywhere in the environment (Kubernetes namespace). This has two drawbacks: even when there is more than one server, only one job can run at a time, and jobs from other namespaces may still be running on the same server. Scheduling should therefore be server-specific, ideally with a way to block a server across all namespaces.
- Frontend
- When viewing datasets, one could display a preview of the observation matrix by loading the first N rows of a dataset into a table element. It might then make sense to let users group columns (nodes) together, to help them navigate the graph exploration and define prior knowledge (edge orientation rules).
- When displaying interventions on relationships with bidirectional confounders, one valid set of parents within the equivalence class is selected to perform the intervention. One could think of ways to display all of these sets intuitively, so that a separate intervention can be performed for each.
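For the prior-knowledge item, converting validated annotations into constraints could look roughly like the Python sketch below. It assumes annotations arrive as hypothetical `(from_node, to_node, status)` tuples and builds symmetric p×p boolean matrices, which is the shape pcalg's `pc()` accepts for its `fixedEdges` and `fixedGaps` arguments. Handing the matrices over to R is left out.

```python
from typing import Iterable

# Hypothetical annotation record: (from_node, to_node, status), where status is
# "missing", "approved" or "declined" as described in the roadmap item.
Annotation = tuple[str, str, str]
Matrix = list[list[bool]]


def annotations_to_prior_knowledge(
    nodes: list[str], annotations: Iterable[Annotation]
) -> tuple[Matrix, Matrix]:
    """Convert edge annotations into (fixed_edges, fixed_gaps) boolean matrices."""
    index = {name: i for i, name in enumerate(nodes)}
    p = len(nodes)
    fixed_edges = [[False] * p for _ in range(p)]
    fixed_gaps = [[False] * p for _ in range(p)]

    for source, target, status in annotations:
        i, j = index[source], index[target]
        if status in ("missing", "approved"):
            matrix = fixed_edges  # the edge must be kept in the skeleton
        elif status == "declined":
            matrix = fixed_gaps  # the edge must be absent from the skeleton
        else:
            raise ValueError(f"unknown annotation status: {status!r}")
        # The constraints concern the undirected skeleton, so keep them symmetric.
        matrix[i][j] = matrix[j][i] = True

    # An edge cannot be both forced and forbidden.
    for i in range(p):
        for j in range(p):
            if fixed_edges[i][j] and fixed_gaps[i][j]:
                raise ValueError(f"conflicting annotations for {nodes[i]} - {nodes[j]}")

    return fixed_edges, fixed_gaps
```

Raising on conflicting annotations is a design choice of this sketch; a real implementation might instead let the most recent annotation win.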
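The node-grouping idea can be sketched as follows, assuming groups are given as a hypothetical list ordered by time (earliest first). Any orientation from a later group into an earlier one is forbidden, and that alone can decide the direction of an otherwise undirected edge. The function names are illustrative only.

```python
from itertools import permutations
from typing import Optional


def forbidden_orientations(groups: list[list[str]]) -> set[tuple[str, str]]:
    """Return the directed pairs (cause, effect) that contradict the group order.

    groups[k] holds the nodes of the k-th group, ordered from earliest to latest;
    a node in a later group is assumed unable to cause a node in an earlier one.
    """
    tier = {node: k for k, group in enumerate(groups) for node in group}
    return {
        (cause, effect)
        for cause, effect in permutations(tier, 2)
        if tier[cause] > tier[effect]
    }


def orient(edge: tuple[str, str], forbidden: set[tuple[str, str]]) -> Optional[tuple[str, str]]:
    """Orient an undirected edge if exactly one of its directions is forbidden."""
    a, b = edge
    if (a, b) in forbidden and (b, a) not in forbidden:
        return (b, a)
    if (b, a) in forbidden and (a, b) not in forbidden:
        return (a, b)
    return None  # the group order says nothing about this edge
```

Nodes within the same group get no constraint, so edges between them are left for the regular orientation rules.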
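For notable paths, one possible approach (an assumption, not an existing feature) is to enumerate simple directed paths between two nodes and rank them by their weakest edge, so a path is only as notable as its least supported link. Edges are passed as a hypothetical `{(cause, effect): weight}` dict; `max_length` bounds the search, because the number of simple paths can grow exponentially.

```python
import heapq

Edges = dict[tuple[str, str], float]  # hypothetical: (cause, effect) -> edge weight


def notable_paths(edges: Edges, source: str, target: str, k: int = 3, max_length: int = 6):
    """Return up to k (score, path) pairs, scoring each path by its weakest edge."""
    adjacency: dict[str, list[tuple[str, float]]] = {}
    for (u, v), weight in edges.items():
        adjacency.setdefault(u, []).append((v, weight))

    found: list[tuple[float, list[str]]] = []

    def dfs(node: str, path: list[str], bottleneck: float) -> None:
        if node == target:
            found.append((bottleneck, list(path)))
            return
        if len(path) - 1 >= max_length:  # path already has max_length edges
            return
        for neighbour, weight in adjacency.get(node, []):
            if neighbour not in path:  # only follow simple paths
                path.append(neighbour)
                dfs(neighbour, path, min(bottleneck, weight))
                path.pop()

    dfs(source, [source], float("inf"))
    return heapq.nlargest(k, found, key=lambda item: item[0])
```

Other scores (sum or product of weights) would fit the same skeleton by changing how `bottleneck` is accumulated.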
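The incremental m.max idea can be sketched as a loop that raises the maximum conditioning-set size one level at a time and feeds every edge removed so far back in as a fixed gap. `run_pc` is a hypothetical callable standing in for the actual pcalg call and returns the remaining skeleton as a set of undirected edges; `save` is a hypothetical hook for persisting checkpoints.

```python
from typing import Callable, Iterable

Edge = frozenset  # an undirected edge, e.g. frozenset({"a", "b"})
# Hypothetical stand-in for the pcalg call: (m_max, fixed_gaps) -> remaining skeleton.
RunPC = Callable[[int, frozenset], set]
SaveHook = Callable[[int, set], None]


def run_incrementally(run_pc: RunPC, all_edges: Iterable[Edge], max_level: int, save: SaveHook) -> set:
    """Raise m_max one level at a time, checkpointing the skeleton after each run."""
    all_edges = set(all_edges)
    fixed_gaps: frozenset = frozenset()
    skeleton = all_edges
    for m_max in range(1, max_level + 1):
        skeleton = run_pc(m_max, fixed_gaps)
        save(m_max, skeleton)  # persist the intermediate result
        # Edges removed at a lower level are not re-added, so pass them on as fixed gaps.
        fixed_gaps = frozenset(all_edges - skeleton)
    return skeleton
```

Each run still re-tests the surviving edges at the lower levels, so this trades some repeated work for the ability to resume or inspect partial results.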
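Server-specific scheduling essentially needs one lock per server instead of one per namespace. The in-memory registry below is a hypothetical sketch of that rule only; in a real deployment the state would have to live somewhere every namespace can see (a Kubernetes `Lease` object per node might be one option), which this sketch does not attempt.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServerLocks:
    """At most one sequential job per server, regardless of its namespace."""

    # server name -> (namespace, job id) of the sequential job currently running there
    running: dict[str, tuple[str, str]] = field(default_factory=dict)
    mutex: threading.Lock = field(default_factory=threading.Lock)

    def try_acquire(self, server: str, namespace: str, job_id: str) -> bool:
        with self.mutex:
            if server in self.running:
                return False  # blocked, even if the running job is in another namespace
            self.running[server] = (namespace, job_id)
            return True

    def release(self, server: str, job_id: str) -> None:
        with self.mutex:
            # Only the job that holds the server may release it.
            if server in self.running and self.running[server][1] == job_id:
                del self.running[server]

    def pick_server(self, servers: list[str], namespace: str, job_id: str) -> Optional[str]:
        """Reserve and return the first free server, or None if the job has to wait."""
        for server in servers:
            if self.try_acquire(server, namespace, job_id):
                return server
        return None
```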
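Loading only the first N rows for the dataset preview could rely on the standard Python DB-API cursor methods `execute`, `description` and `fetchmany`. `preview_dataset` and its `load_query` argument are hypothetical names, and sqlite3 is used only to keep the sketch self-contained. Some database drivers buffer the full result client-side on `execute`, so adding a `LIMIT` to the query may be preferable there.

```python
import sqlite3
from typing import Any


def preview_dataset(connection: sqlite3.Connection, load_query: str, n: int = 20) -> tuple[list[str], list[tuple[Any, ...]]]:
    """Return the column names and the first n rows produced by load_query."""
    cursor = connection.cursor()
    try:
        cursor.execute(load_query)
        # cursor.description holds one 7-tuple per column; the first item is its name.
        columns = [column[0] for column in cursor.description]
        rows = cursor.fetchmany(n)  # fetch only n rows instead of the whole result
    finally:
        cursor.close()
    return columns, rows
```

The column names returned here could also serve as the basis for letting users group columns (nodes) in the frontend.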