In this project, we use text mining techniques to mine forum data and create a search system that allows users to gain insight into forum posts (in either Dutch or English).

Problem Context

Forum posts contain a lot of valuable information, but it can be hard to find that information among the multitude of posts available. One context in which this problem becomes relevant is that of cancer patients. Patients with various forms of cancer use forums to exchange information with others (‘informational support’) as well as to discuss their emotions (‘emotional support’). For some patients, these online patient groups and discussion boards can be empowering as they cope with their illness.


Natural language processing can help people with various types of cancer (such as GIST and hemato-oncological types) navigate online fora. State-of-the-art techniques extract entities (potentially relevant nouns and verbs) from the posts. Counting how often entities co-occur yields an entity graph. This graph is combined with an Elasticsearch database and a search function to retrieve the posts related to any query. This project also uses a state-of-the-art summarization technique developed at Leiden University, which takes a user's query and uses it to select only the relevant sentences from the fora. A visualization was developed to present these functionalities.
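
As an illustration of the co-occurrence step, the following is a minimal sketch in Python, not the project's actual implementation. It assumes entities have already been extracted per post and treats two entities as co-occurring when they appear in the same post. The function names, the `min_count` parameter, and the sample entity lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(posts_entities, min_count=1):
    """Count how often each pair of entities appears in the same post.

    posts_entities: iterable of entity lists, one list per post.
    Returns a dict mapping (entity_a, entity_b) -> count, with each pair
    stored in sorted order so (a, b) and (b, a) are the same edge.
    """
    pair_counts = Counter()
    for entities in posts_entities:
        # De-duplicate within a post so repeated mentions count once.
        unique = sorted(set(entities))
        pair_counts.update(combinations(unique, 2))
    # Keep only edges seen at least min_count times.
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


def related_terms(graph, term, top_n=5):
    """Return the entities most often seen together with `term`."""
    neighbours = Counter()
    for (a, b), n in graph.items():
        if a == term:
            neighbours[b] += n
        elif b == term:
            neighbours[a] += n
    return neighbours.most_common(top_n)


# Hypothetical, hand-written entity lists standing in for extracted entities.
posts = [
    ["imatinib", "fatigue", "nausea"],
    ["imatinib", "fatigue"],
    ["surgery", "fatigue"],
]
graph = build_cooccurrence_graph(posts)
```

Calling `related_terms(graph, "fatigue")` ranks the neighbours of a term by edge weight, which is the kind of lookup a related-terms graph view could be built on. Counting at the post level is only one possible choice; a sentence-level or sliding-window count would give a denser or sparser graph.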


In this project, a website was created to help users find relevant posts. Users can search the forum posts with a search function and gain further insight through a graph of related terms and the relations between them. The website was created specifically for cancer patients and covers the English-language GIST forum and multiple Dutch fora for patients diagnosed with hemato-oncological cancer types. It has been validated by experts in the field. The website is still available, but access is password-protected to keep track of its users.


