
TripleStoreKnowledgeBase vs TripleStore #451

Open
Demirrr opened this issue Oct 16, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@Demirrr
Member

Demirrr commented Oct 16, 2024

Shall we combine these classes into one?

@Demirrr Demirrr added the question Further information is requested label Oct 16, 2024
@Demirrr
Member Author

Demirrr commented Oct 17, 2024

KnowledgeBase performs the following computation during initialization:

        self.ind_set = init_individuals_from_concepts(include_implicit_individuals,
                                                      reasoner=self.reasoner,
                                                      ontology=self.ontology,
                                                      individuals_per_concept=(self.individuals(i) for i in
                                                                               self.get_concepts()))

(see here)
This means that we compute the individuals of every concept at initialization, which becomes a bottleneck on large knowledge graphs.
Since TripleStoreKnowledgeBase inherits from KnowledgeBase, we cannot use it as-is on large knowledge graphs.
TripleStore, on the other hand, works like a charm, but it does not implement a few functions that KnowledgeBase implements.

We need to think about bringing them together. @alkidbaci @LckyLke

@Demirrr Demirrr added the enhancement New feature or request label Oct 17, 2024
@alkidbaci
Collaborator

Yes, I agree that a refactoring is needed here and that we should have a single triple store KB class. The combined class should ideally still inherit from KnowledgeBase, so that we can use our CEL algorithms without any extra changes, and it needs to be as fast as TripleStore. It's been a while since we implemented these classes, but there are some clear differences in their structure. I suggest first evaluating the possible merging points and then trying to come up with a merging solution that satisfies all objectives. I can take a closer look at this after I finish some other tasks, but @LckyLke, please feel free to take this one if you think of a solution before me.

@alkidbaci
Collaborator

We should also consider finding a solution for #446.

@LckyLke
Collaborator

LckyLke commented Oct 17, 2024

Is there any reason why ind_set is not simply initialized like this?

self.ind_set = frozenset(self.ontology.individuals_in_signature())

@Demirrr
Member Author

Demirrr commented Oct 17, 2024

> Is there any reason why the ind_set is not just simply initialized like this?:
>
> self.ind_set = frozenset(self.ontology.individuals_in_signature())

Not that I am aware of 😀
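To make the difference between the two initialization strategies concrete, here is a minimal sketch using a hypothetical in-memory `ToyStore` (not an ontolearn class). It assumes that each `individuals(concept)` call stands for one reasoner or SPARQL round trip. The per-concept strategy mirrors the generator passed to `init_individuals_from_concepts`; the other is the one-liner suggested above. In this toy the two sets coincide by construction, but only the first issues one query per concept.

```python
from itertools import chain


class ToyStore:
    """Hypothetical in-memory stand-in for a reasoner / triple store that counts queries."""

    def __init__(self, instances_by_concept: dict[str, set[str]]):
        self._instances = instances_by_concept
        self.queries = 0

    def get_concepts(self):
        return iter(self._instances)

    def individuals(self, concept: str):
        # Assumption: each call is one (potentially expensive) instance query.
        self.queries += 1
        return iter(self._instances[concept])

    def individuals_in_signature(self):
        # Assumption: a single query returning every named individual.
        self.queries += 1
        return chain.from_iterable(self._instances.values())


def ind_set_per_concept(store: ToyStore) -> frozenset:
    # Mirrors the current approach: one instance query per concept.
    return frozenset(chain.from_iterable(store.individuals(c) for c in store.get_concepts()))


def ind_set_from_signature(store: ToyStore) -> frozenset:
    # The suggested one-liner: a single call, independent of the number of concepts.
    return frozenset(store.individuals_in_signature())


# 1000 hypothetical concepts C0..C999, each with two overlapping instances.
data = {f"C{i}": {f"ind{i}", f"ind{i + 1}"} for i in range(1000)}
per_concept_store, signature_store = ToyStore(data), ToyStore(data)
eager_set = ind_set_per_concept(per_concept_store)
signature_set = ind_set_from_signature(signature_store)
print(per_concept_store.queries, signature_store.queries, eager_set == signature_set)
```

On a real knowledge graph the cost of the first strategy grows with the number of classes, which is the bottleneck described at the top of this issue.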

@LckyLke
Collaborator

LckyLke commented Oct 23, 2024

To understand the codebase of these classes better, I wrote a small script that visualizes the common and unique methods of the classes, so we can identify possible merging points.
Maybe this is helpful for someone else as well :)
class_hierarchy_enhanced.gv.pdf
(You have to scroll down for the relevant part.)
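The script itself is not included in the issue. The following is a minimal sketch of the same idea using only the standard library `inspect` module; `KBLike` and `StoreLike` are hypothetical stand-ins, and with ontolearn installed one could pass the real KnowledgeBase and TripleStore classes instead. Inherited methods are included, because `inspect.getmembers` looks at every attribute reported by `dir()`.

```python
import inspect


def public_methods(cls: type) -> set[str]:
    """Names of public plain functions (instance or static methods) on a class, including inherited ones."""
    return {name for name, _ in inspect.getmembers(cls, predicate=inspect.isfunction)
            if not name.startswith("_")}


def compare_methods(a: type, b: type) -> dict[str, set[str]]:
    """Split the method names of two classes into common and class-specific groups."""
    ma, mb = public_methods(a), public_methods(b)
    return {
        "common": ma & mb,
        f"only_{a.__name__}": ma - mb,
        f"only_{b.__name__}": mb - ma,
    }


# Hypothetical stand-ins for KnowledgeBase and TripleStore.
class KBLike:
    def individuals(self): ...
    def get_all_sub_concepts(self, concept): ...
    def concept_len(self, ce): ...


class StoreLike:
    def individuals(self): ...
    def get_all_sub_concepts(self, concept, direct=True): ...
    def query(self, sparql): ...


for group, names in compare_methods(KBLike, StoreLike).items():
    print(group, sorted(names))
```

Note that a shared name only means a method exists in both classes; as the next comment shows, methods with the same name can still behave differently.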

@LckyLke
Collaborator

LckyLke commented Oct 24, 2024

Also, some functions with the same name in these classes operate differently:

If you look at get_all_sub_concepts from TripleStoreKnowledgeBase for instance:

def get_all_sub_concepts(self, concept: OWLClassExpression) -> Iterable[OWLClassExpression]:
    assert isinstance(concept, OWLClass)
    yield from self.reasoner.sub_classes(concept, direct=False)

and from TripleStore:

def get_all_sub_concepts(self, concept: OWLClass, direct=True):
    yield from self.reasoner.subconcepts(concept, direct)

One hard-codes direct=False and does not let you change it; the other defaults to direct=True but lets the caller override it.
So which behaviour should we keep when merging?
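One possible compromise, sketched here under assumptions rather than as a decided design: keep the overridable `direct` parameter from TripleStore, but default it to `False` as in TripleStoreKnowledgeBase, so callers that relied on the latter keep getting all (transitive) subclasses. `ToyHierarchy` is a hypothetical stand-in for the reasoner, holding a dict of direct subclasses.

```python
from collections.abc import Iterator


class ToyHierarchy:
    """Hypothetical stand-in for a reasoner: maps a class name to its direct subclasses."""

    def __init__(self, direct_subs: dict[str, list[str]]):
        self._direct = direct_subs

    def get_all_sub_concepts(self, concept: str, direct: bool = False) -> Iterator[str]:
        if direct:
            # Only the immediate children of `concept`.
            yield from self._direct.get(concept, [])
            return
        # All descendants via depth-first traversal; `seen` guards against cycles.
        seen: set[str] = set()
        stack = list(self._direct.get(concept, []))
        while stack:
            sub = stack.pop()
            if sub in seen:
                continue
            seen.add(sub)
            yield sub
            stack.extend(self._direct.get(sub, []))


h = ToyHierarchy({"Person": ["Parent", "Child"], "Parent": ["Mother", "Father"]})
print(sorted(h.get_all_sub_concepts("Person")))               # all descendants
print(sorted(h.get_all_sub_concepts("Person", direct=True)))  # direct children only
```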

@alkidbaci
Collaborator

I have been thinking about this, and here is my proposed solution:

First Part

We have the class ontolearn.abstracts.AbstractKnowledgeBase, which KnowledgeBase inherits from.
Since we don't want TripleStore to inherit from KnowledgeBase, because their implementations should differ (for example, TripleStore should not use a cache), we make TripleStore inherit from AbstractKnowledgeBase instead.
However, AbstractKnowledgeBase declares very few abstract methods (this is not a problem until the Third Part). That means we must populate AbstractKnowledgeBase with additional abstract methods (with names copied from KnowledgeBase). This class should contain only those methods that must be implemented by both KnowledgeBase and TripleStore.
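A minimal sketch of what "populating" the abstract class means in Python's `abc` terms. The class below is an illustration, not the current content of ontolearn.abstracts.AbstractKnowledgeBase, and the two method names are illustrative picks from this discussion; the real list would come from comparing KnowledgeBase and TripleStore. The point is that a backend which forgets one of the shared methods fails at construction time instead of deep inside a CEL run.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable


class AbstractKnowledgeBase(ABC):
    """Sketch: only the methods that every knowledge-base backend must provide."""

    @abstractmethod
    def individuals(self, concept=None) -> Iterable[str]:
        """All individuals, or the instances of `concept` if given."""

    @abstractmethod
    def get_all_sub_concepts(self, concept, direct: bool = False) -> Iterable[str]:
        """Subclasses of `concept`."""


class IncompleteStore(AbstractKnowledgeBase):
    # Implements only one of the two abstract methods.
    def individuals(self, concept=None):
        return iter(())


try:
    IncompleteStore()
except TypeError as err:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    print("cannot instantiate:", err)
```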

Second Part

With that structure in place, we can then merge TripleStoreKnowledgeBase with TripleStore. To do that, we remove one of them, let's say TripleStoreKnowledgeBase, and make TripleStore inherit from AbstractKnowledgeBase. Methods that do not conflict (the extra methods, so to speak) are moved from TripleStoreKnowledgeBase to TripleStore. We keep TripleStoreReasoner and TripleStoreOntology, which will be set as the values of the reasoner and ontology attributes of the new TripleStore class. The next step is removing TripleStoreReasonerOntology: every method of that class can be moved to one of TripleStore, TripleStoreReasoner, or TripleStoreOntology.
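A minimal sketch of the resulting composition, assuming an in-memory dict of asserted types in place of a real SPARQL endpoint. TripleStoreOntology, TripleStoreReasoner, and TripleStore reuse the names from the proposal, but their constructors and the `instances` and `individuals_in_signature` methods shown here are hypothetical simplifications; the actual classes issue SPARQL queries.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable


class AbstractKnowledgeBase(ABC):
    @abstractmethod
    def individuals(self, concept=None) -> Iterable[str]: ...


class TripleStoreOntology:
    """Hypothetical: holds asserted (explicit) knowledge only."""

    def __init__(self, types: dict[str, set[str]]):
        self.types = types  # individual -> asserted class names

    def individuals_in_signature(self) -> Iterable[str]:
        return iter(self.types)


class TripleStoreReasoner:
    """Hypothetical: answers instance queries over the ontology."""

    def __init__(self, ontology: TripleStoreOntology):
        self.ontology = ontology

    def instances(self, concept: str) -> Iterable[str]:
        return (i for i, classes in self.ontology.types.items() if concept in classes)


class TripleStore(AbstractKnowledgeBase):
    """The merged class: delegates to its `ontology` and `reasoner` attributes."""

    def __init__(self, types: dict[str, set[str]]):
        self.ontology = TripleStoreOntology(types)
        self.reasoner = TripleStoreReasoner(self.ontology)

    def individuals(self, concept=None) -> Iterable[str]:
        if concept is None:
            return self.ontology.individuals_in_signature()
        return self.reasoner.instances(concept)


kb = TripleStore({"anna": {"Person", "Mother"}, "bob": {"Person"}})
print(sorted(kb.individuals()), sorted(kb.individuals("Mother")))
```

Keeping the reasoner and ontology as attributes gives TripleStore the same shape as KnowledgeBase, which is what lets code written against the abstract interface stay backend-agnostic.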

Third Part

Currently, some of our CEL algorithms accept only KnowledgeBase as an argument, while others accept KnowledgeBase | TripleStore. We need a standard that makes it easy to use the CEL algorithms with every possible implementation of AbstractKnowledgeBase. Since the newly designed TripleStore inherits from AbstractKnowledgeBase, we change the type of this argument in the CEL algorithms to AbstractKnowledgeBase; then either KnowledgeBase or TripleStore can be used freely in all of them.
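A minimal sketch of a learner typed against the abstract base rather than a concrete class. `ToyLearner`, `DictKB`, and `TripleListKB` are hypothetical stand-ins (for a CEL algorithm, KnowledgeBase, and TripleStore respectively), and the F1-based selection is only a toy scoring rule. The learner never names a concrete backend, so a single `isinstance` check replaces the `KnowledgeBase | TripleStore` union.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable


class AbstractKnowledgeBase(ABC):
    @abstractmethod
    def individuals(self, concept=None) -> Iterable[str]: ...


class DictKB(AbstractKnowledgeBase):
    """Stand-in for KnowledgeBase: individual -> set of class names."""

    def __init__(self, types: dict[str, set[str]]):
        self.types = types

    def individuals(self, concept=None):
        return (i for i, cs in self.types.items() if concept is None or concept in cs)


class TripleListKB(AbstractKnowledgeBase):
    """Stand-in for TripleStore: a list of (subject, 'rdf:type', class) triples."""

    def __init__(self, triples: list[tuple[str, str, str]]):
        self.triples = triples

    def individuals(self, concept=None):
        return {s for s, p, o in self.triples if p == "rdf:type" and (concept is None or o == concept)}


class ToyLearner:
    """Hypothetical CEL algorithm that depends only on the abstract interface."""

    def __init__(self, knowledge_base: AbstractKnowledgeBase):
        if not isinstance(knowledge_base, AbstractKnowledgeBase):
            raise TypeError(f"expected an AbstractKnowledgeBase, got {type(knowledge_base).__name__}")
        self.kb = knowledge_base

    def best_concept(self, candidates: list[str], positives: set[str]) -> str:
        # Pick the candidate whose instances best match the positives (F1 score).
        def f1(concept: str) -> float:
            instances = set(self.kb.individuals(concept))
            tp = len(instances & positives)
            if tp == 0:
                return 0.0
            precision, recall = tp / len(instances), tp / len(positives)
            return 2 * precision * recall / (precision + recall)

        return max(candidates, key=f1)


types = {"anna": {"Person", "Mother"}, "bob": {"Person", "Father"}, "carl": {"Person"}}
triples = [(s, "rdf:type", c) for s, cs in types.items() for c in cs]
for kb in (DictKB(types), TripleListKB(triples)):
    print(type(kb).__name__, ToyLearner(kb).best_concept(["Person", "Mother", "Father"], {"anna"}))
```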

Other remarks

I am currently handling some comments and TODOs in the ontolearn.knowledge_base.py module, which need to be taken care of either way, whether or not this suggestion is put into action. If we decide to go with my suggestion, I will not be able to implement it until my vacation is over. Nevertheless, I would like to focus on this as soon as I am back at work, to solve this problem once and for all ^_^ .

I would like to know your opinions and of course if you have any other suggestions.

P.S.: I may work on this over the next few Mondays (since I have not taken vacation on Mondays) if @Demirrr gives me the green light to continue with it and there is no higher-priority work to do.

@Demirrr
Member Author

Demirrr commented Dec 10, 2024

First Part

Agreed!

Second Part

Mostly agreed. Currently, TripleStoreKnowledgeBase does not work on large knowledge graphs.
This issue stems from the following inheritance:

class TripleStoreKnowledgeBase(KnowledgeBase):

Therefore, I do not think that TripleStoreKnowledgeBase should play any role in our plan for the time being.
Hence, you can do whatever you like with it :)

Third Part

Agreed!

Plan

  • Write AbstractKnowledgeBase in abstracts.py. This abstract class declares the abstract methods that both KnowledgeBase and TripleStore must implement.
  • TripleStore inherits from AbstractKnowledgeBase

For the sake of efficient programming and easier testing, we can also do the following:

The constructor of TripleStore should accept path: str and url: str. If path is given, we can use rdflib to read the given RDF knowledge graph into memory. The advantage is that we can still use SPARQL (see the tutorial). This way, we can write unit tests for TripleStore without having to manage a triple store instance (although that is not much effort).
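A minimal sketch of such a constructor, under these assumptions: exactly one of `path` or `url` is given; the local mode uses rdflib's `Graph.parse` and `Graph.query`; the remote mode talks to a SPARQL 1.1 endpoint over HTTP using only the standard library. The class and its `query` method are illustrative, not the current TripleStore API.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class TripleStore:
    """Sketch of the proposed constructor: either a local RDF file or a remote SPARQL endpoint."""

    def __init__(self, path: Optional[str] = None, url: Optional[str] = None):
        # Assumption: exactly one backend is configured.
        if (path is None) == (url is None):
            raise ValueError("Provide exactly one of `path` (local RDF file) or `url` (SPARQL endpoint).")
        self.url = url
        self.graph = None
        if path is not None:
            # rdflib is only needed for the local, in-memory mode.
            from rdflib import Graph
            self.graph = Graph()
            self.graph.parse(path)  # recent rdflib versions guess the format from the file extension

    def query(self, sparql: str) -> list[dict[str, str]]:
        """Run a SELECT query and return rows as {variable: value} dicts."""
        if self.graph is not None:
            result = self.graph.query(sparql)
            return [{str(var): str(row[var]) for var in result.vars if row[var] is not None}
                    for row in result]
        # SPARQL 1.1 Protocol: POST a form-encoded `query` and ask for JSON results.
        request = Request(
            self.url,
            data=urlencode({"query": sparql}).encode(),
            headers={"Accept": "application/sparql-results+json"},
        )
        with urlopen(request) as response:
            bindings = json.load(response)["results"]["bindings"]
        return [{var: cell["value"] for var, cell in row.items()} for row in bindings]
```

Importing rdflib inside the `path` branch keeps it an optional dependency for users who only connect to a remote endpoint, while unit tests can point `path` at a small local file and run the same SPARQL queries.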

@Demirrr Demirrr removed the question Further information is requested label Dec 10, 2024

3 participants