about your dataset #11
Comments
Okay, a long time has passed and I've forgotten how to debug it. I can post some of the code I used for preprocessing; I hope it's useful to you:
```python
import pickle
import jieba
import process_data_weibo  # helper module from the repo

# deal_image is the poster's own helper (not shown); it maps an
# image id to its file extension for every image in the directory.

train_id = pickle.load(open("../EANN-KDD18-master/Data/weibo/train_id.pickle", 'rb'))
val_id = pickle.load(open("../EANN-KDD18-master/Data/weibo/validate_id.pickle", 'rb'))
stop_words = process_data_weibo.stopwordslist()
pre_path = 'F:/data/EANN-KDD18-master/Data/weibo/tweets/'
file_list = [pre_path + "test_nonrumor.txt", pre_path + "test_rumor.txt",
             pre_path + "train_nonrumor.txt", pre_path + "train_rumor.txt"]
nonrumor_images = deal_image('F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/')
rumor_images = deal_image('F:/data/EANN-KDD18-master/Data/weibo/rumor_images/')

# These were initialized earlier in the poster's script (assumed):
word2ix, ix2word, wordcnt = {}, {}, 0
data, val_data = [], []
max_seq_len, max_event = 0, 0

# train
for k, f in enumerate(file_list):
    f = open(f, encoding='utf-8')
    if (k + 1) % 2 == 1:
        label = 0  # real is 0
    else:
        label = 1  # fake is 1
    lines = f.readlines()
    post_id = ""
    url = ""
    # Each post spans three lines: metadata, image URLs, then the text.
    for i, line in enumerate(lines):
        if (i + 1) % 3 == 1:
            post_id = line.split('|')[0]
        if (i + 1) % 3 == 2:
            url = line.lower()
        if (i + 1) % 3 == 0:
            line = process_data_weibo.clean_str_sst(line)
            seg_list = jieba.cut_for_search(line)  # Chinese word segmentation
            new_seg_list = []
            for word in seg_list:
                if word not in stop_words:
                    new_seg_list.append(word)
            clean_l = ' '.join(new_seg_list)
            if len(clean_l) > 10 and post_id in train_id:
                describe = []
                for x in new_seg_list:
                    if x not in word2ix:
                        word2ix[x] = wordcnt
                        ix2word[wordcnt] = x
                        wordcnt += 1
                    describe.append(word2ix[x])
                max_seq_len = max(max_seq_len, len(describe))
                event = int(train_id[post_id])
                max_event = max(max_event, event)
                for x in url.split('|'):
                    image_id = x.split('/')[-1].split(".")[0]
                    if label == 0 and image_id in nonrumor_images:
                        image_url = 'F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/' + image_id + '.' + nonrumor_images[image_id]
                        data.append([describe, image_url, label, event])
                    elif label == 1 and image_id in rumor_images:
                        image_url = 'F:/data/EANN-KDD18-master/Data/weibo/rumor_images/' + image_id + '.' + rumor_images[image_id]
                        data.append([describe, image_url, label, event])
            elif len(clean_l) > 10 and post_id in val_id:
                describe = []
                for x in new_seg_list:
                    if x not in word2ix:
                        word2ix[x] = wordcnt
                        ix2word[wordcnt] = x
                        wordcnt += 1
                    describe.append(word2ix[x])
                max_seq_len = max(max_seq_len, len(describe))
                event = int(val_id[post_id])
                max_event = max(max_event, event)
                for x in url.split('|'):
                    image_id = x.split('/')[-1].split(".")[0]
                    if label == 0 and image_id in nonrumor_images:
                        image_url = 'F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/' + image_id + '.' + nonrumor_images[image_id]
                        val_data.append([describe, image_url, label, event])
                    elif label == 1 and image_id in rumor_images:
                        image_url = 'F:/data/EANN-KDD18-master/Data/weibo/rumor_images/' + image_id + '.' + rumor_images[image_id]
                        val_data.append([describe, image_url, label, event])
```
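For what it's worth, the snippet above only makes sense if `train_id.pickle` / `validate_id.pickle` are pickled dicts mapping a post id (string) to an integer event id (it checks `post_id in train_id` and calls `int(train_id[post_id])`). A minimal sketch of writing and reading such a file — the post ids and event numbers here are made up for illustration, not from the real dataset:

```python
import pickle

# Hypothetical mapping: post id -> event id, in the shape the loop expects.
train_id = {
    "3500346704941690": 0,
    "3500348458018457": 0,
    "3501234567890123": 1,
}

with open("train_id.pickle", "wb") as fh:
    pickle.dump(train_id, fh)

# Read it back exactly the way the snippet does:
with open("train_id.pickle", "rb") as fh:
    loaded = pickle.load(fh)

event = int(loaded["3501234567890123"])  # event id of that post
```

How the real ids are grouped into events is exactly what this thread is asking about; the sketch only shows the file format the code assumes.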
Thank you very much for your reply. Could you share the code that produces the w2v.pickle file? It's very important to me; I'm a beginner, sorry to bother you.
The code on GitHub is pretty good; take a closer look and you'll understand what each part does.
```python
import pickle

# read_image, write_data, load_data, add_unknown_words and get_W are
# helpers defined in the repo's process_data_weibo.py.

def get_data(text_only):
    if text_only:
        print("Text only")
        image_list = []
    else:
        print("Text and image")
        image_list = read_image()

    train_data = write_data("train", image_list, text_only)
    validate_data = write_data("validate", image_list, text_only)
    test_data = write_data("test", image_list, text_only)

    print("loading data...")
    vocab, all_text = load_data(train_data, validate_data, test_data)
    print("number of sentences: " + str(len(all_text)))
    print("vocab size: " + str(len(vocab)))
    max_l = len(max(all_text, key=len))
    print("max sentence length: " + str(max_l))

    word_embedding_path = "../EANN-KDD18-master/Data/weibo/w2v.pickle"
    w2v = pickle.load(open(word_embedding_path, "rb"), encoding='bytes')
    print("word2vec loaded!")
    print("num words already in word2vec: " + str(len(w2v)))

    add_unknown_words(w2v, vocab)
    W, word_idx_map = get_W(w2v)
    W2 = rand_vecs = {}
    w_file = open("../EANN-KDD18-master/Data/weibo/word_embedding.pickle", "wb")
    pickle.dump([W, W2, word_idx_map, vocab, max_l], w_file)
    w_file.close()
    return train_data, validate_data, test_data
```
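As for w2v.pickle: judging from how `get_data` uses it (`len(w2v)`, then `add_unknown_words(w2v, vocab)`), it is a pickled dict mapping each word to its embedding vector. A minimal sketch of writing a file in that shape — the vocabulary, the 32-dimensional size, and the random list-valued vectors here are purely illustrative (the real file presumably holds pre-trained Chinese word vectors, likely as arrays):

```python
import pickle
import random

random.seed(0)
dim = 32  # illustrative; real pre-trained embeddings are typically larger

# Toy vocabulary; real code would collect words from the tweet files.
vocab = ["地震", "谣言", "图片"]
w2v = {w: [random.uniform(-0.25, 0.25) for _ in range(dim)] for w in vocab}

with open("w2v.pickle", "wb") as fh:
    pickle.dump(w2v, fh)

# Read it back the way get_data does:
with open("w2v.pickle", "rb") as fh:
    w2v_loaded = pickle.load(fh, encoding='bytes')

print("num words:", len(w2v_loaded))
```

The -0.25..0.25 range mirrors the common word2vec convention for randomly initializing unknown words; it is an assumption here, not taken from the repo.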
I used the Weibo files and it worked, but I don't know how to use the Twitter data. Could you tell me whether you used the Twitter files in your experiment?
Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle? Thank you!
Hello, I have the same problem. Could you please tell me what I should do?
My email address is [email protected]
I have the same problem. Can you tell me how to solve it?
Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?