
Massive text analysis using NLP for key Insights extraction


Company description

The Ministry is responsible for a number of policy areas that are important for the general business environment, including business regulation, intellectual property rights, competition and consumer policy, the financial sector and shipping. The Ministry comprises seven individual agencies and employs around 2000 public administrators and officials.

The Ministry works with various international organizations to improve the international framework conditions for growth. In the European Union, the Ministry participates in the work of the councils for Competitiveness and Maritime Transport.

The main mission of the Ministry of Industry, Business and Financial Affairs is to create competitive and innovative conditions for growth. Our vision is Europe’s best conditions for doing business.

Project description

When a new piece of legislation is proposed, relevant stakeholders, consisting of subject matter experts, organizations and citizens, are typically encouraged to voice their opinions to help ensure that the proposed legislation is feasible.
This public hearing is an integral part of the Danish democratic system, as it gives affected parties the opportunity to influence a new piece of legislation before it comes into effect.

The public hearing procedure produces large amounts of text (hearing answers), consisting of detailed descriptions and evaluations of law proposals.
Currently, the consolidation and analysis of these hearing answers is manual, labour-intensive work. As natural language processing (NLP) and machine learning (ML) methodologies have gained traction over the past decade, the Danish Ministry of Industry, Business and Financial Affairs is curious about how these technologies and methodologies could support the ministry’s operations with regard to hearing answers.

Specifically, we would like you to:

  • Look into the possibilities of using NLP concepts and tools, such as topic classification and sentiment analysis, to gain insights that support the ministry’s current operations with regard to hearing answers.
  • Produce an information-rich and interactive dashboard that conveys key insights about hearing answers for any given legislation proposal.
  • Reflect on the data management and governance requirements that would enable scaling and full implementation of the analytics procedure and dashboard.
The ideal outcome of the thesis would be a tested prototype that lessens the workload of the analysis and consolidation of hearing answers, by offering insights from the textual data through visualizations.
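
As a rough illustration of what topic classification and sentiment analysis on hearing answers could look like at their simplest, here is a minimal Python sketch. It is only a sketch under stated assumptions: the keyword lists, the function names and the English sample answers are all hypothetical and not taken from the ministry’s data, and a real prototype would most likely rely on trained models and Danish-language resources rather than hand-made word lists.

```python
import re
from collections import Counter

# Hypothetical, hand-made word lists for illustration only.
# A real project would learn topics and sentiment from the corpus.
TOPIC_KEYWORDS = {
    "competition": {"competition", "market", "monopoly"},
    "consumer": {"consumer", "consumers", "price", "prices"},
    "shipping": {"shipping", "vessel", "maritime"},
}
POSITIVE = {"support", "welcome", "agree", "improve"}
NEGATIVE = {"oppose", "concern", "burden", "disagree"}


def tokenize(text):
    """Lowercase the text and split it into words (Danish letters included)."""
    return re.findall(r"[a-zæøå]+", text.lower())


def tag_topics(text):
    """Return the sorted topics whose keywords appear in the text."""
    tokens = set(tokenize(text))
    return sorted(t for t, kws in TOPIC_KEYWORDS.items() if tokens & kws)


def sentiment_score(text):
    """Score in [-1, 1]: (positive - negative) / (positive + negative)."""
    counts = Counter(tokenize(text))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def summarize(answers):
    """Average sentiment per topic, the kind of figure a dashboard could plot."""
    per_topic = {}
    for text in answers:
        score = sentiment_score(text)
        for topic in tag_topics(text):
            per_topic.setdefault(topic, []).append(score)
    return {t: sum(s) / len(s) for t, s in per_topic.items()}


if __name__ == "__main__":
    sample_answers = [  # invented examples, not real hearing answers
        "We welcome the proposal but the burden on shipping companies is a concern.",
        "Consumers agree that this will improve competition.",
    ]
    print(summarize(sample_answers))
```

The per-topic averages returned by `summarize` are the sort of aggregate that could feed a dashboard view; the word lists would need to be replaced by something far more robust before any conclusions are drawn.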

Although the topic mainly concerns data analytics, the task does require a focus on the entire data pipeline, from data acquisition and cleaning to analysis and communication. Depending on the scope of the thesis, a project could therefore also touch upon general data management and governance activities.
A text corpus consisting of all hearing answers dating back to 2015 will be available to you, e.g. for training a classification model. A single hearing consists of 300-400 pages of text, and there are around 30 hearings per year.
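
To get a feel for the scale of the corpus, the figures above can be combined into a back-of-the-envelope estimate. The short Python sketch below does this; the number of years covered is an assumption (five years, 2015 through 2019), and the function name is purely illustrative.

```python
# Figures from the project description
PAGES_PER_HEARING = (300, 400)  # low and high page counts per hearing
HEARINGS_PER_YEAR = 30          # approximate number of hearings per year

# Assumption: the corpus covers five years, 2015 through 2019 inclusive
YEARS_COVERED = 5


def estimate_corpus_pages(pages_range, hearings_per_year, years):
    """Return a (low, high) estimate of the total number of pages."""
    low, high = pages_range
    hearings = hearings_per_year * years
    return low * hearings, high * hearings


if __name__ == "__main__":
    low, high = estimate_corpus_pages(
        PAGES_PER_HEARING, HEARINGS_PER_YEAR, YEARS_COVERED
    )
    print(f"Roughly {low} to {high} pages in total")
```

Under these assumptions the corpus would be in the range of 45,000 to 60,000 pages, which is large enough to make manual review costly but small enough to process on a single machine.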

The ministry is open to other project ideas or research angles within the automation, NLP and ML space. An alternative approach could be to focus on conceptualizing general process automation opportunities using NLP and prioritize them according to data quality/maturity, application simplicity, potential gains in efficiency and other important factors.

Student description

The student(s) should find it exciting to work with NLP, machine learning and big data and see the academic relevance of researching data innovation activities in the public sector. 
The student(s) should also have a rudimentary understanding of data quality, data management and data governance, as well as proficiency with machine learning and NLP methodologies and applications. Knowledge of Python or another relevant programming language is not a specific requirement but could be advantageous in the thesis work.
In addition, the student(s) should also have some knowledge of data visualization tools such as Tableau or Power BI. 

The ministry’s Disruption Taskforce will be your main internal supervisor, assisting you throughout this project, and you will also be in contact with the respective data owners from the ministry’s department and its agencies.
The Disruption Taskforce will be able to spar with you on technical matters, as well as assist with the data cleaning process. 

Massive text analysis using NLP for key Insights extraction | Match My Thesis
Dec 19, 2019





Text Analysis

Sentiment Analysis