Experimental setup for comparison with your work #3
Comments
Hi, thanks for your interest in our work! For the first question: as you mentioned, we used the same Last.fm dataset as the CFM paper for an objective comparison. The structure of the Last.fm dataset is: user context, item context, rating, timestamp. The user context is described by the user ID and the ID of the last piece of music the user listened to within the previous 90 minutes. The item context consists of the music ID and the artist ID. It seems that the authors of CFM treated every piece of music the user listened to within 90 minutes as a different user context, so there are 12265 user contexts for 1000 users in test.csv. The reason we did not use the same procedure for Last.fm as for MovieLens is that we wanted a fair comparison with CFM. For the second question, we also used the same evaluation procedure as in the CFM paper: after training the model, for each user context (12265 user contexts for Last.fm), the metrics are computed from the position of the test item context, and the final metrics are averaged over all user contexts. You can also refer to our code.
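For reference, here is a minimal sketch of that averaging step; the scoring call and data layout are illustrative assumptions, not the actual code in this repository:

```python
import numpy as np

def evaluate(model, test_user_contexts, k=10):
    """For every user context in test.csv, rank the held-out item context
    among all candidate items, compute the metrics from its position, and
    average over all user contexts (12265 for Last.fm)."""
    hits, ndcgs = [], []
    for user_ctx, true_item in test_user_contexts:
        scores = model.score_all_items(user_ctx)                    # hypothetical scoring call
        rank = int(np.argsort(-scores).tolist().index(true_item))   # 0-based position of the test item
        hits.append(1.0 if rank < k else 0.0)                       # HR@k
        ndcgs.append(1.0 / np.log2(rank + 2) if rank < k else 0.0)  # NDCG@k
    return float(np.mean(hits)), float(np.mean(ndcgs))
```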
Hi, thank you very much for your answer! Could you explain the procedure you used for the MovieLens dataset? I ask this because you said you used a different procedure for Last.fm and MovieLens, but in the paper you said you used the same one. Thank you.
Hi, I have another important question. Did you report recall or hit ratio in Table 3? Thank you!
Hi, thanks for your question! We use the leave-one-out evaluation protocol; under this setting, recall@k is equal to hit@k.
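To spell out why: under leave-one-out there is exactly one relevant item per user context, so recall@k = (relevant items in the top-k) / (total relevant items) reduces to hit@k. A tiny illustration (hypothetical ranks, not values from the paper):

```python
def hit_at_k(rank, k):
    # 1 if the single held-out item appears in the top-k, else 0
    return 1.0 if rank < k else 0.0

def recall_at_k(relevant_ranks, k):
    # fraction of relevant items that appear in the top-k
    return sum(r < k for r in relevant_ranks) / len(relevant_ranks)

# Under leave-one-out each user context has exactly one relevant item,
# so the two metrics coincide:
assert recall_at_k([3], k=10) == hit_at_k(3, k=10) == 1.0
assert recall_at_k([15], k=10) == hit_at_k(15, k=10) == 0.0
```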
Hi, thank you for your answer, you are correct. Could I ask another question? Take a row such as "0 0 1 morning sunday weekend unknown free sunny United States 0": I'm not able to derive these fields from the features listed in the csv files. I ask this because the model I'm trying to compare with yours is really different and requires very different pre-processing. For example, it doesn't require these features.
In fact, we downloaded the processed Frappe and Last.fm datasets directly from the CFM GitHub repository, so we also don't have the datasets without preprocessing. Maybe you can ask the authors of CFM for help.
Thank you very much for the fast reply. However, I think you could provide me with the MovieLens dataset: I'm sure you have the version that is not preprocessed but already split. Could you provide me with this dataset? Thank you!
Of course! What's your email address?
Thank you very much! E-mail: [email protected]. Could you also provide me with some information regarding the pre-processing you used? You can write this information directly in the e-mail. Thank you!
Ok, we have sent the email. If you have any questions, please feel free to ask me. |
Dear researchers,
I'm Tommaso Carraro and I have been working on context-aware recommender systems for one year. I read your paper and found it very interesting. I would like to try to reproduce your experiments, and I have some questions:
In 4.1.3 you said that for Last.fm and MovieLens the latest transaction of each user is held out for testing and the remaining data is treated as the training set. However, it seems that you applied this split correctly for MovieLens but not for Last.fm. In fact, the file test.csv contains 12265 rows, while Last.fm has 1000 users, as reported in Table 2. According to the described procedure, test.csv should therefore contain only 1000 interactions.
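For clarity, this is the split I would expect from the description in 4.1.3 (a rough sketch; the file and column names are my guess at the processed csv layout, not your actual code):

```python
import pandas as pd

data = pd.read_csv("lastfm.csv")                          # hypothetical file/column names
latest_idx = data.groupby("user")["timestamp"].idxmax()   # latest transaction per user
test = data.loc[latest_idx]                               # -> 1000 rows for 1000 users
train = data.drop(latest_idx)                             # everything else
```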
I know you took the same dataset as the CFM GitHub repository. Also, it seems you described the same evaluation procedure as in the CFM paper. In the past, I tried to ask the CFM researchers the same questions, but I never received an answer.
Moreover, since MovieLens is not available in the CFM GitHub repository, you used the splitting procedure explained in the CFM paper in order to reproduce their experiments. I think you applied the right procedure. So, my question is: why is there this inconsistency between the two datasets? Why did you not use the same procedure for Last.fm too?
In 4.1.3 you said you used the leave-one-out evaluation protocol. Let us define m as the number of ratings of a user. By leave-one-out, do you mean that for each test item you feed the user's m-1 training interactions to the network and compute the metrics based on the position of the test item in the recommended list?
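To make my interpretation concrete, this is what I have in mind for a single user (the ranking call is a placeholder for whatever your model exposes, not your actual API):

```python
def evaluate_user(model, train_interactions, test_item, k=10):
    # The model sees only the user's m-1 training interactions and returns a
    # ranking over all items; the metric depends only on the position of the
    # single held-out test item in that list.
    ranking = model.rank_all_items(train_interactions)    # placeholder API
    position = ranking.index(test_item)                   # 0-based rank of the test item
    return 1.0 if position < k else 0.0                   # e.g. HR@k for this user
```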
Thank you very much,
Tommaso Carraro
P.S. I also tried to reproduce the CFM experiments using their code, but the loss went to NaN after about 15 epochs of training.