Experimental setup for comparison with your work #3

Open
tommasocarraro opened this issue Nov 23, 2020 · 10 comments

@tommasocarraro

Dear researchers,
I'm Tommaso Carraro and I have been working on context-aware recommender systems for one year. I read your paper and found it very interesting. I would like to reproduce your experiments and I have some questions:

  1. In 4.1.3 you said that for Last.fm and MovieLens the latest transaction of each user is held out for testing and the remaining data is treated as the training set. However, it seems that this split was applied to MovieLens but not to Last.fm. The file test.csv contains 12265 rows, but Last.fm has 1000 users, as reported in Table 2. According to the described procedure, test.csv should therefore contain only 1000 interactions.
    I know you took the same dataset as the CFM GitHub repository, and you seem to describe the evaluation procedure as in the CFM paper. I previously asked the CFM researchers the same questions but received no answer.
    Moreover, since MovieLens is not available on the CFM GitHub, you applied the splitting procedure explained in the CFM paper in order to reproduce their experiments, and I think you applied it correctly. So my question is: "Why is there this inconsistency between the two datasets? Why did you not use the same procedure for Last.fm too?"

  2. In 4.1.3 you said you used the leave-one-out evaluation protocol. Let us define m as the number of ratings of a user. By leave-one-out, do you mean that for each test item you feed the user's m-1 training interactions to the network and compute the metrics based on the position of the test item in the recommended list?

Thank you very much,
Tommaso Carraro
P.S. I also tried to reproduce the CFM experiments using their code, but the loss went to NaN after about 15 epochs of training.

@chenchongthu
Owner

Hi, thanks for your interest in our work!

For the first question, as you have mentioned, we used the same Last.fm dataset as the CFM paper for an objective comparison. The structure of the Last.fm dataset is: user context, item context, rating, timestamp. The user context consists of the user ID and the last music ID that the user has listened to within 90 minutes. The item context consists of the music ID and the artist ID. It seems that the authors of CFM treated every piece of music that a user listened to within 90 minutes as a different user context, so there are 12265 user contexts for 1000 users in test.csv. We did not apply the MovieLens procedure to Last.fm because we wanted to make a fair comparison with CFM.
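To make the difference in row counts concrete, here is a minimal sketch of a "hold out the latest interaction" split where the grouping key is a parameter. It is not the CFM or ENSFM preprocessing code; the function `hold_out_latest`, the field names (`user`, `last_item`, `item`, `timestamp`) and the toy rows are all assumptions made up for illustration. Grouping by user yields one test row per user, while grouping by user context (user ID plus last item) yields one test row per context, which is how a test file can end up with more rows than users.

```python
from collections import defaultdict

def hold_out_latest(interactions, key_fn):
    """Hold out the latest interaction (by timestamp) of every group.

    interactions: list of dicts that each carry a 'timestamp' field.
    key_fn: maps a row to its grouping key (a user, or a user context).
    Returns (train_rows, test_rows).
    """
    groups = defaultdict(list)
    for row in interactions:
        groups[key_fn(row)].append(row)

    train, test = [], []
    for rows in groups.values():
        ordered = sorted(rows, key=lambda r: r["timestamp"])
        train.extend(ordered[:-1])   # everything but the latest row
        test.append(ordered[-1])     # the latest row of this group
    return train, test

# Hypothetical toy data: 2 users, 3 distinct (user, last_item) contexts.
toy = [
    {"user": 0, "last_item": 10, "item": 11, "timestamp": 1},
    {"user": 0, "last_item": 10, "item": 12, "timestamp": 2},
    {"user": 0, "last_item": 12, "item": 13, "timestamp": 3},
    {"user": 0, "last_item": 12, "item": 14, "timestamp": 4},
    {"user": 1, "last_item": 20, "item": 21, "timestamp": 5},
    {"user": 1, "last_item": 20, "item": 22, "timestamp": 6},
]

# One held-out row per user.
train_u, test_u = hold_out_latest(toy, key_fn=lambda r: r["user"])
# One held-out row per user context.
train_c, test_c = hold_out_latest(
    toy, key_fn=lambda r: (r["user"], r["last_item"])
)

print(len(test_u), len(test_c))  # prints: 2 3
```

With the per-user key the test set size equals the number of users; with the per-context key it equals the number of distinct contexts.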

For the second question, we also used the same evaluation procedure as in the CFM paper. After training the model, for each user context (12265 user contexts for Last.fm), the metrics are computed from the position of the test item context in the ranked list, and the final metrics are the average over all user contexts. You can also refer to our code.
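The procedure described above can be sketched as follows. This is an illustrative sketch, not the repository's evaluation code: the helper names (`rank_of_target`, `hr_at_k`, `ndcg_at_k`, `evaluate`) and the toy scores are assumptions. Each test user context contributes one ranked list and one held-out target; the metric is computed from the target's rank and then averaged over all contexts.

```python
import math

def rank_of_target(scores, target):
    """0-based rank of the target when items are sorted by descending score.

    Ties are resolved in favour of the target (only strictly higher
    scores count as ranked above it); this is a simplifying assumption.
    """
    target_score = scores[target]
    return sum(1 for s in scores.values() if s > target_score)

def hr_at_k(rank, k):
    """Hit ratio for a single held-out item: 1 if it is in the top k."""
    return 1.0 if rank < k else 0.0

def ndcg_at_k(rank, k):
    """NDCG for a single held-out item.

    With one relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 2) when the item is in the top k, and 0 otherwise.
    """
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0

def evaluate(contexts, k):
    """Average HR@k and NDCG@k over all test user contexts.

    contexts: list of (scores, target) pairs, one per user context,
    where scores maps each candidate item to its predicted score.
    """
    ranks = [rank_of_target(scores, target) for scores, target in contexts]
    hr = sum(hr_at_k(r, k) for r in ranks) / len(ranks)
    ndcg = sum(ndcg_at_k(r, k) for r in ranks) / len(ranks)
    return hr, ndcg

# Hypothetical toy example: two user contexts scored over three items.
toy_contexts = [
    ({"a": 0.9, "b": 0.5, "c": 0.1}, "a"),  # target ranked first
    ({"a": 0.9, "b": 0.5, "c": 0.1}, "c"),  # target ranked third
]
hr2, ndcg2 = evaluate(toy_contexts, k=2)
hr3, ndcg3 = evaluate(toy_contexts, k=3)
```

For Last.fm the list passed to `evaluate` would have one entry per user context rather than one per user.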

@tommasocarraro
Author

tommasocarraro commented Nov 24, 2020

Hi, thank you very much for your answer!

Could you explain the procedure you used for the MovieLens dataset? I ask because you said you used different procedures for Last.fm and MovieLens, whereas the paper says you used the same one.
Did you take one test interaction for each user context? Or did you take one test interaction for each user independently of the context?

Thank you.

@tommasocarraro
Author

Hi, I have another important question.
In the paper you said you used Hit Ratio (hr@k) and NDCG (ndcg@k) as evaluation metrics. However, in your code you computed recall@k and ndcg@k. Specifically, you used the code available at VAE_CF to compute these metrics. This code computes the recall@k and not the hr@k. In fact, hit ratio is 1 if the target item is in the top-k recommended list, 0 otherwise.
The code available at CFM computes hr@k correctly.

Did you report recall or hit ratio in Table 3?

Thank you!

@chenchongthu
Owner

Hi, thanks for your question!

We use the leave-one-out evaluation protocol; under this setting, recall@k is equal to hit@k.
Recall is the fraction of the target items that are successfully retrieved, so it is also 1 if the target item is in the top-k recommended list and 0 otherwise, since there is only one target item in the test set.
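A small sketch of why the two metrics coincide under leave-one-out. The functions below are illustrative and not copied from VAE_CF or CFM; the normalization by `min(k, number of relevant items)` is an assumption about how a truncated recall of this kind is typically defined. With exactly one relevant item the denominator is 1, so recall@k is either 0 or 1 and matches hit@k.

```python
def recall_at_k(ranked_items, relevant, k):
    """Truncated recall: hits in the top k over min(k, number of relevant items)."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / min(k, len(relevant))

def hit_at_k(ranked_items, target, k):
    """Hit ratio for a single target: 1 if it appears in the top k, else 0."""
    return 1.0 if target in ranked_items[:k] else 0.0

# Hypothetical ranked list of four items.
ranked = ["x", "y", "z", "w"]

# With one held-out item, the two metrics agree for every target and every k.
agree = all(
    recall_at_k(ranked, {target}, k) == hit_at_k(ranked, target, k)
    for target in ranked
    for k in range(1, len(ranked) + 1)
)

# With two held-out items, recall can take fractional values.
two_item_recall = recall_at_k(ranked, {"x", "w"}, 2)
```

The equivalence only holds when each test case has a single target item; with several held-out items per case, recall@k takes fractional values and the two metrics differ.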

@tommasocarraro
Author

Hi, thank you for your answer, you are correct.

Could I ask another question?
Could you kindly provide the datasets without preprocessing? I need your exact split, but with the raw context fields instead of the encoded features.
For example, for Frappe I would like the following structure:

0 0 1 morning sunday weekend unknown free sunny United States 0

In fact, I'm not able to derive these fields from the features listed in the csv files. I ask this because the model I'm trying to compare with yours is really different and requires very different pre-processing. For example, it doesn't require the features.
Thank you!

@chenchongthu
Owner

In fact, we downloaded the processed Frappe and Last.fm datasets directly from the CFM GitHub, so we don't have the datasets without preprocessing either. Maybe you can ask the authors of CFM for help.

@tommasocarraro
Author

Thank you very much for the fast reply.
I'm getting in touch with the CFM authors.

However, I think you could provide me with the MovieLens dataset. I'm sure you have a copy that is already split but not yet preprocessed.

Could you send me this dataset?

Thank you!

@chenchongthu
Owner

Of course! What's your email address?

@tommasocarraro
Author

Thank you very much!

E-mail: [email protected]

Could you provide me with some information regarding the pre-processing you used? You can write this information directly in the e-mail.

Thank you!

@chenchongthu
Owner

Ok, we have sent the email. If you have any questions, please feel free to ask me.
