Experimental setup for comparison with your work #3
Comments
Hi, thanks for your interest in our work! For the first question: as you mentioned, we used the same Last.fm dataset as the CFM paper for an objective comparison. The structure of the Last.fm dataset is: user context, item context, rating, timestamp. The user context is described by the user ID and the ID of the last piece of music the user listened to within the previous 90 minutes. The item context consists of the music ID and the artist ID. It seems that the authors of CFM treated every piece of music the user listened to within 90 minutes as a different user context, so there are 12265 user contexts for 1000 users in test.csv. The reason we did not use the same procedure for Last.fm as for MovieLens is that we wanted a fair comparison with CFM. For the second question, we also used the same evaluation procedure as in the CFM paper: after training the model, for each user context (12265 user contexts for Last.fm), the metrics are computed from the position of the test item context, and the final metrics are averaged over all user contexts. You can also refer to our code.
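For reference, here is a minimal sketch of that averaging step; the scoring call and data layout are illustrative assumptions, not the actual code in this repository:

```python
import numpy as np

def evaluate(model, test_user_contexts, k=10):
    """For every user context in test.csv, rank the held-out item context
    among all candidate items, compute the metrics from its position, and
    average over all user contexts (12265 for Last.fm)."""
    hits, ndcgs = [], []
    for user_ctx, true_item in test_user_contexts:
        scores = model.score_all_items(user_ctx)                    # hypothetical scoring call
        rank = int(np.argsort(-scores).tolist().index(true_item))   # 0-based position of the test item
        hits.append(1.0 if rank < k else 0.0)                       # HR@k
        ndcgs.append(1.0 / np.log2(rank + 2) if rank < k else 0.0)  # NDCG@k
    return float(np.mean(hits)), float(np.mean(ndcgs))
```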
Hi, thank you very much for your answer! Could you explain the procedure you used for the MovieLens dataset? I ask this because you said you used a different procedure for Last.fm and MovieLens, but in the paper you said you used the same one. Thank you.
Hi, I have another important question. Did you report recall or hit ratio in Table 3? Thank you!
Hi, thanks for your question! We use the leave-one-out evaluation protocol; under this setting, recall@k is equal to hit@k.
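To spell out why: under leave-one-out there is exactly one relevant item per user context, so recall@k = (relevant items in the top-k) / (total relevant items) reduces to hit@k. A tiny illustration (hypothetical ranks, not values from the paper):

```python
def hit_at_k(rank, k):
    # 1 if the single held-out item appears in the top-k, else 0
    return 1.0 if rank < k else 0.0

def recall_at_k(relevant_ranks, k):
    # fraction of relevant items that appear in the top-k
    return sum(r < k for r in relevant_ranks) / len(relevant_ranks)

# Under leave-one-out each user context has exactly one relevant item,
# so the two metrics coincide:
assert recall_at_k([3], k=10) == hit_at_k(3, k=10) == 1.0
assert recall_at_k([15], k=10) == hit_at_k(15, k=10) == 0.0
```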
Hi, thank you for your answer, you are correct. Could I ask another question? Take a row such as "0 0 1 morning sunday weekend unknown free sunny United States 0": I'm not able to derive these fields from the features listed in the csv files. I ask this because the model I'm trying to compare with yours is really different and requires very different pre-processing. For example, it doesn't require these features.
In fact, we downloaded the processed Frappe and Last.fm datasets directly from the CFM GitHub repository, so we also don't have the datasets without preprocessing. Maybe you can ask the authors of CFM for help.
Thank you very much for the fast reply. However, I think you could provide me with the MovieLens dataset: I'm sure you have the version that is not preprocessed but already split. Could you provide me with this dataset? Thank you!
Of course! What's your email address?
Thank you very much! E-mail: [email protected]. Could you also provide me with some information regarding the pre-processing you used? You can write this information directly in the e-mail. Thank you!
Ok, we have sent the email. If you have any questions, please feel free to ask me. |
Dear researchers,
I'm Tommaso Carraro and I have been working on context-aware recommender systems for one year. I read your paper and found it very interesting. I would like to try to reproduce your experiments, and I have some questions:
In 4.1.3 you said that for Last.fm and MovieLens the latest transaction of each user is held out for testing and the remaining data is treated as the training set. However, it seems that you applied this split correctly for MovieLens but not for Last.fm. In fact, the file test.csv contains 12265 rows, while Last.fm has 1000 users, as reported in Table 2. According to the described procedure, test.csv should therefore contain only 1000 interactions.
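For clarity, this is the split I would expect from the description in 4.1.3 (a rough sketch; the file and column names are my guess at the processed csv layout, not your actual code):

```python
import pandas as pd

data = pd.read_csv("lastfm.csv")                          # hypothetical file/column names
latest_idx = data.groupby("user")["timestamp"].idxmax()   # latest transaction per user
test = data.loc[latest_idx]                               # -> 1000 rows for 1000 users
train = data.drop(latest_idx)                             # everything else
```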
I know you took the same dataset as the CFM GitHub repository. Also, it seems you described the same evaluation procedure as in the CFM paper. In the past, I tried to ask the CFM researchers the same questions, but I never received an answer.
Moreover, since MovieLens is not available in the CFM GitHub repository, you used the splitting procedure explained in the CFM paper in order to reproduce their experiments. I think you applied the right procedure. So, my question is: why is there this inconsistency between the two datasets? Why did you not use the same procedure for Last.fm too?
In 4.1.3 you said you used the leave-one-out evaluation protocol. Let us define m as the number of ratings of a user. By leave-one-out, do you mean that for each test item you feed the user's m-1 training interactions to the network and compute the metrics based on the position of the test item in the recommended list?
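To make my interpretation concrete, this is what I have in mind for a single user (the ranking call is a placeholder for whatever your model exposes, not your actual API):

```python
def evaluate_user(model, train_interactions, test_item, k=10):
    # The model sees only the user's m-1 training interactions and returns a
    # ranking over all items; the metric depends only on the position of the
    # single held-out test item in that list.
    ranking = model.rank_all_items(train_interactions)    # placeholder API
    position = ranking.index(test_item)                   # 0-based rank of the test item
    return 1.0 if position < k else 0.0                   # e.g. HR@k for this user
```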
Thank you very much,
Tommaso Carraro
P.S. I also tried to reproduce the CFM experiments using their code, but the loss went to NaN after about 15 epochs of training.