How do I get the graph on untrained data? #615

lnajman · 2021-03-12T15:30:50Z

lnajman
Mar 12, 2021

I want to use UMAP jointly with a classifier that needs a graph as input. This works well on training data, since it is straightforward to get the graph from the mapper. For test data, however, I only get the embedding of the points, not the graph. I have seen that you can select transform_mode = 'graph' to get a graph, but I do not understand what the output graph on the test data is, or how it relates to the graph computed on the training data.
I found transform_mode by reading the code, but I cannot find any documentation for it.
Is there documentation that explains both what UMAP does to generalize to new data, and how to get the graph it already computes at that stage, so that I do not have to compute it myself?
Ideally, I would like to obtain a graph that contains both training and test data, but I am not sure whether that is what the current code does.

lmcinnes · 2021-03-12T20:40:00Z

lmcinnes
Mar 12, 2021
Maintainer

The current approach builds a graph whose vertices are the training samples; it can be extracted as the graph_ attribute. The transform method with transform_mode = "graph" returns a sparse matrix in the same format, but just for the transformed data: each column corresponds to a training sample, and each row corresponds to a sample that was passed to transform. Each entry is the weight of the edge between the test sample for its row and the training sample for its column.

The important point here is the contrast between train and transformed samples: train samples can have edges joining them, but transform samples only have edges to training samples. If you want to get an adjacency matrix for the combined train and test data then you can effectively do:

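import scipy.sparse

# mapper must have been created with transform_mode="graph",
# otherwise transform returns an embedding rather than a graph
train_graph = mapper.graph_               # shape (n_train, n_train)
test_graph = mapper.transform(test_data)  # shape (n_test, n_train)
n_test = test_graph.shape[0]
full_adjacency = scipy.sparse.vstack(
    [
        scipy.sparse.hstack([train_graph, test_graph.transpose()]),
        scipy.sparse.hstack([test_graph, scipy.sparse.csr_matrix((n_test, n_test))]),
    ]
)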

All we are doing here is putting together all the blocks. If the train graph matrix is A, and the test graph matrix is B then we are simply constructing the block-wise matrix

---------------------
|           |       |
|           |       |
|     A     |   B'  |
|           |       |
|           |       |
---------------------
|           |       |
|     B     |   0   |
|           |       |
---------------------

where B' is the transpose of B and 0 is the all-zero matrix (since there are no edges among the test samples).
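
For anyone who wants to try the block layout without fitting a model first, here is a minimal sketch on toy matrices, assuming NumPy and SciPy are installed. It uses scipy.sparse.bmat rather than the hstack/vstack calls above (a swapped-in helper that should give the same layout), and the values in A and B are made up; they only stand in for mapper.graph_ and the transform_mode = "graph" output.

```python
import numpy as np
import scipy.sparse

# Toy stand-ins (made-up weights): 3 training samples, 2 test samples.
A = scipy.sparse.csr_matrix(np.array([
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]))  # train-train edges, shape (3, 3)
B = scipy.sparse.csr_matrix(np.array([
    [0.9, 0.0, 0.3],
    [0.0, 0.7, 0.6],
]))  # test-train edges, shape (2, 3)

# None stands for an all-zero block; bmat infers its shape
# from the other blocks in the same row and column.
full_adjacency = scipy.sparse.bmat([[A, B.T], [B, None]], format="csr")

n_train = A.shape[0]
# Rows/columns 0..n_train-1 are training samples, the rest are test samples.
test_test_block = full_adjacency[n_train:, n_train:]
```

Here test_test_block has no stored entries, which matches the point that test samples are only connected to training samples.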

2 replies

lnajman Mar 13, 2021
Author

Thanks for the quick and very clear answer.
I have just one small question: for a given test/transform sample, how many edges to training samples do you compute?
Is this a fixed number, maybe just the closest one?

lmcinnes Mar 13, 2021
Maintainer

It should be exactly n_neighbors edges from each test sample.
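
One way to check this on your own output is to count the stored entries in each row of the test graph. This is a small sketch, assuming the graph is a SciPy CSR matrix with test samples as rows and training samples as columns, as described earlier in the thread. The matrix below is a made-up example with a hypothetical n_neighbors of 2, not real UMAP output.

```python
import numpy as np
import scipy.sparse

n_neighbors = 2  # hypothetical setting for this toy example

# Made-up test graph: 3 test samples (rows) x 4 training samples (columns).
test_graph = scipy.sparse.csr_matrix(np.array([
    [0.9, 0.0, 0.4, 0.0],
    [0.0, 1.0, 0.0, 0.2],
    [0.0, 0.6, 0.7, 0.0],
]))

# Number of stored entries (edges) in each row, i.e. per test sample.
edges_per_test_sample = test_graph.getnnz(axis=1)

# Column indices of the training samples linked to test sample 0,
# together with the corresponding edge weights.
first_row = test_graph[0]
neighbors_of_first = first_row.indices
weights_of_first = first_row.data
```

Note that getnnz counts stored entries, so any explicitly stored zero weights would be included in the count.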

vishnu1729 · 2024-05-16T22:42:14Z

vishnu1729
May 16, 2024

Can we only get a graph for the test data if we use transform_mode = "graph" while fitting the model? I need the embeddings for train and test as well as the graph for the test data. Kindly let me know how I can get both of these.

0 replies

lmcinnes · 2024-05-16T23:42:05Z

lmcinnes
May 16, 2024
Maintainer

Unfortunately yes, that's the case right now. Sorry.


1 reply

vishnu1729 May 17, 2024

Thanks! So I think I will have to use two separate UMAP models: one for graphs and one for embeddings. I appreciate the input.
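
A rough sketch of that two-model workaround, assuming umap-learn is installed and that transform_mode is accepted by the UMAP constructor as discussed above. The function name, the returned dictionary keys, and the umap_kwargs passthrough are made up for illustration; this is not an official recipe.

```python
def fit_graph_and_embedding_models(train_data, test_data, **umap_kwargs):
    """Fit two UMAP models on the same data: one for embeddings, one for graphs.

    Hypothetical helper; umap_kwargs (e.g. n_neighbors, random_state) are
    passed unchanged to both models so they stay as comparable as possible.
    """
    import umap  # umap-learn; imported lazily so the sketch loads without it

    # Default transform mode: transform() returns embedding coordinates.
    embedder = umap.UMAP(**umap_kwargs).fit(train_data)

    # Graph mode: transform() returns a sparse test-to-train graph instead.
    grapher = umap.UMAP(transform_mode="graph", **umap_kwargs).fit(train_data)

    return {
        "train_embedding": embedder.embedding_,
        "test_embedding": embedder.transform(test_data),
        "train_graph": grapher.graph_,
        "test_graph": grapher.transform(test_data),
    }
```

Fitting twice roughly doubles the training cost, and the two models are not guaranteed to find identical neighbors, so it is probably worth passing the same random_state to both.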
