WordCloud and Cluster #600
Replies: 1 comment 6 replies
-
Hi @kennynakamura, all good. Some years ago I thought about a similar feature: extracting relevant words from texts and storing them in a "relevant words" column. Text summarization is another related idea. I think you can use the item.setExtraAttribute() method, as you did in #508, to store keywords extracted from a specific document.

It is also possible to store images using that method, but they won't be shown automatically in the UI; creating a new viewer requires implementing Java code. But I think a simple new column with relevant words is already a good start.

About clustering documents: a Python task could create document vectors in process() and store them using the method above, then retrieve them in the task's finish() method to run the clustering algorithm. Actually, I think word vectors already exist in the Lucene index and could be reused instead of recomputing them and wasting resources.

Currently, the cluster number a document belongs to can only be saved as a new bookmark. After #24, we will be able to create custom columns in the finish() method or even after processing ends.
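A minimal, self-contained sketch of the process()/finish() pattern described above, in plain Python. `FakeItem` is a hypothetical stand-in for an IPED item: only the name `setExtraAttribute()` comes from this thread; its exact signature, how a real task gets an item's text, and the `relevantWords` attribute name are assumptions. Clustering here is a simple spherical k-means over term-frequency vectors, swapped in for illustration; it does not reuse the Lucene term vectors mentioned above.

```python
import math
import re
from collections import Counter


class FakeItem:
    """Hypothetical stand-in for an IPED item: an id, its text, and extra attributes."""

    def __init__(self, item_id, text):
        self.item_id = item_id
        self.text = text
        self.extra = {}

    def getId(self):
        return self.item_id

    def setExtraAttribute(self, key, value):  # assumed (key, value) signature
        self.extra[key] = value


# Tiny illustrative stopword list; a real one would be language-specific.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "on", "for", "at"}


def tokenize(text):
    """Lowercased Unicode letter runs, minus stopwords."""
    return [w for w in re.findall(r"[^\W\d_]+", text.lower()) if w not in STOPWORDS]


def normalize(counts):
    """Scale a {term: weight} mapping to unit length (sparse vector as a dict)."""
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}


def cosine(u, v):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(x * v.get(t, 0.0) for t, x in u.items())


class KeywordClusterTask:
    def __init__(self, k=2, top_n=3, iterations=10):
        self.k = k
        self.top_n = top_n
        self.iterations = iterations
        self.vectors = {}  # item id -> vector; instance state survives across process() calls

    def process(self, item):
        counts = Counter(tokenize(item.text))
        # "Relevant words": the most frequent non-stopword terms of this item.
        keywords = [w for w, _ in counts.most_common(self.top_n)]
        item.setExtraAttribute("relevantWords", keywords)
        self.vectors[item.getId()] = normalize(counts)

    def finish(self):
        """Spherical k-means over all collected vectors; returns {item id: cluster}."""
        ids = sorted(self.vectors)
        centroids = [self.vectors[i] for i in ids[: self.k]]  # deterministic init
        labels = {}
        for _ in range(self.iterations):
            # Assign every item to its most similar centroid.
            labels = {
                i: max(range(len(centroids)), key=lambda c: cosine(self.vectors[i], centroids[c]))
                for i in ids
            }
            # Recompute each centroid as the normalized sum of its members.
            new_centroids = []
            for c in range(len(centroids)):
                total = Counter()
                for i in ids:
                    if labels[i] == c:
                        total.update(self.vectors[i])
                new_centroids.append(normalize(total) if total else centroids[c])
            centroids = new_centroids
        return labels


docs = [
    FakeItem(1, "meeting report budget meeting client"),
    FakeItem(2, "party beer friends party weekend"),
    FakeItem(3, "client budget invoice meeting"),
    FakeItem(4, "friends weekend beer beach"),
]
task = KeywordClusterTask(k=2)
for doc in docs:
    task.process(doc)
labels = task.finish()
```

Keeping `self.vectors` on the task instance is what lets finish() see every item. In a real case the resulting labels would still have to be saved as bookmarks (or as custom columns after #24), which this sketch does not do.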
-
Hello! Me again, all good? :)
I had an idea: cluster chats and generate a word cloud for each cluster, to get a preview of each cluster's subjects.
For example, it could separate personal and professional conversations.
I have code to generate the clusters and word clouds, but to do this I need a variable where all texts are stored.
Can Python scripts do this? In my tests, the scripts act on only one item at a time. Also, could I generate an image, and where could I store it?
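For the word-cloud part, once each chat has a cluster number, the texts collected across process() calls can be merged into one word-frequency table per cluster. This is a sketch under assumptions: `cluster_word_frequencies`, the stopword list and the demo data are hypothetical; the rendering step, shown only as a comment, uses the third-party `wordcloud` package.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list; a real one would be language-specific.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "on", "for", "at"}


def cluster_word_frequencies(texts_by_id, labels, top_n=50):
    """Merge each cluster's texts into one {word: count} dict.

    texts_by_id: item id -> text, accumulated across process() calls
    labels:      item id -> cluster number, e.g. computed in finish()
    """
    per_cluster = defaultdict(Counter)
    for item_id, text in texts_by_id.items():
        words = [w for w in re.findall(r"[^\W\d_]+", text.lower()) if w not in STOPWORDS]
        per_cluster[labels[item_id]].update(words)
    # Keep only the top_n words of each cluster, which is what a word cloud displays.
    return {c: dict(counts.most_common(top_n)) for c, counts in per_cluster.items()}


texts = {
    1: "meeting budget client meeting",
    2: "beer party friends",
    3: "client invoice meeting",
    4: "party beach beer",
}
labels = {1: 0, 2: 1, 3: 0, 4: 1}
freqs = cluster_word_frequencies(texts, labels)

# With the third-party wordcloud package installed, each dict could be rendered:
# from wordcloud import WordCloud
# for cluster, words in freqs.items():
#     WordCloud().generate_from_frequencies(words).to_file(f"cluster_{cluster}.png")
```

The resulting PNG files would still need a home; as the answer above notes, setExtraAttribute() can hold images, but the UI won't display them.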