System For Health Document Classification Using Machine Learning

Abstract

Due to the massive daily increase in medical documents (including books, journals, blogs, articles, doctors’ instructions and prescriptions, emails from patients, etc.), it is becoming very challenging to handle and categorize them manually. One of the most challenging tasks in information systems is extracting information from unstructured text, which includes medical document classification. The discovery of knowledge from medical datasets is important for making effective medical diagnoses. The primary aim of this research is to develop a classification algorithm that classifies a medical document by analyzing its content and categorizing it under predefined topics. In this project work we applied Natural Language Processing (NLP), which draws on Machine Learning, to classify health-related documents. We used the OpenNLP Application Programming Interface (API), a Java API, to train a model and classify the documents. We used Materialize, an HTML5, CSS and JavaScript framework, to build the user interface. The software is built using the Model-View-Controller (MVC) architecture. The algorithm classified the articles correctly under their actual subject headings and got the total subject headings correct. This holds promise for the global health arena as a way to index and classify medical documents expeditiously.

Chapter One

1.0 Introduction

This chapter introduces the topic of the project work, A System for Health Document Classification Using Machine Learning. It covers the background of the study, the statement of the problem, the aim and objectives, the methodology used to design the system, the scope of the study, its significance and the definition of terms, and concludes with the organization of the project work.

1.1 Background of the Study

Today, most hospitals, medical laboratories and other health facilities make use of some kind of information system, such as a hospital management system or a pharmacy management system. Among the other functions that these systems provide, they are mainly used for collecting patient records, which they store in digital format. Large volumes of patient data are recorded on a daily basis, forming a large data set popularly referred to as “Big Data”.

Every day, physicians and other health workers are required to work with this “Big Data” in order to provide solutions. Their everyday tasks include information retrieval and data mining. Retrieving information from big data can be very laborious and time consuming. This has given rise to the study of text or document classification in order to aid the process of retrieving information from big data. Today, text classification is a necessity due to the very large number of text documents that we have to deal with daily.

Document classification is the task of grouping documents into categories based upon their content. It is a significant learning problem that is at the core of many information management and retrieval tasks, and it plays an essential role in various applications that deal with organizing, classifying, searching and concisely representing a significant amount of information. Document classification is a longstanding problem in information retrieval which has been well studied (Russell, 2018).

Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. Machine learning approaches to classification suggest the automatic construction of classifiers using induction over pre-classified sample documents. In this project work we will employ machine learning in classifying health documents.
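To make the idea of induction over pre-classified sample documents concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier in plain Java. It is an illustration only and is not the OpenNLP-based classifier used in this project. The class name NaiveBayesSketch, the category labels and the sample sentences are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a tiny multinomial Naive Bayes text classifier.
// All names and sample data here are hypothetical, not taken from the project.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerCategory = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lower-case the text and split on anything that is not a letter a-z.
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z]+");
    }

    // Learn from one pre-classified sample document.
    void train(String category, String text) {
        totalDocs++;
        docsPerCategory.merge(category, 1, Integer::sum);
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the category with the highest log-probability for the text.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docsPerCategory.keySet()) {
            // Prior: fraction of training documents in this category.
            double score = Math.log(docsPerCategory.get(category) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                int count = counts.getOrDefault(token, 0);
                // Laplace (add-one) smoothing so unseen words do not zero out the score.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch classifier = new NaiveBayesSketch();
        classifier.train("cardiology", "heart attack chest pain artery");
        classifier.train("cardiology", "heart failure blood pressure");
        classifier.train("dermatology", "skin rash itching eczema");
        classifier.train("dermatology", "acne skin lesion");
        System.out.println(classifier.classify("patient with chest pain and high blood pressure"));
    }
}
```

The classifier counts how often each word appears in the labelled samples of each category and then assigns a new document to the category under which its words are most probable.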

1.2 Statement of the Problem

With the explosion of information fuelled by the growth of the World Wide Web, it is no longer feasible for a human observer to understand all the data coming in, or even to classify it into categories. In the health sector as well, numerous patient records are collected every day and used for analysis. How do we efficiently classify or categorize these health documents to support easy retrieval?

1.3 Aim and Objectives of the Study

The aim of this project is to develop a system for health document classification using machine learning.

Other objectives include:

Study the various machine learning classification algorithms.

Implement a classification algorithm in Java.

1.4 Scope of the Study

As stated earlier, machine learning, statistical pattern recognition and neural network approaches can all be used to classify documents. This project work concentrates on using a machine learning algorithm to classify health documents.

1.5 Significance of the Study

The software delivered from this project work will greatly reduce the time spent by doctors, physicians and other health workers in searching for and retrieving documents.

Other benefits of this project work include:

It will help students and other interested individuals who want to develop a similar application.

It will serve as a source of material for those interested in investigating the processes involved in developing a document classification system using machine learning.

It will serve as a source of material for students who are interested in studying machine learning.

1.6 Definition of Terms

Document Classification:

The task of grouping documents into categories based upon their content.

Health Document:

A document containing health-related information. One example is a health certificate, which is written by a doctor and displays the official results of a physical examination.

Machine Learning:

The study and construction of algorithms that can learn from and make predictions on data.

JSP:

JavaServer Pages, a Java technology for creating dynamic web pages.

HTML:

HyperText Markup Language, the markup language used for creating web pages.

MySQL:

A database management system for creating, storing and manipulating databases.

Servlet:

A small pluggable extension to a server that enhances the server’s functionality.

Bootstrap:

A sleek, intuitive and powerful mobile-first front-end framework for faster and easier web development. It uses HTML, CSS and JavaScript.

1.7 Organization of Work

Chapter one introduces the background of the project and sets out the statement of the problem, the objectives of the project, its significance, scope and constraints.

Chapter two reviews the literature on machine learning and document classification, along with other related work.

Chapter three discusses system investigation and analysis. It deals with the detailed investigation and analysis of the existing system and with problem identification. It also presents the proposal for the new system.

Chapter four covers the system design and implementation. Chapter five presents the summary and conclusion of the project.

Chapter Five

Summary and Conclusion

5.0 Introduction

This chapter summarizes and concludes the project work; it also gives recommendations and insight into future work.

5.1 Summary

In this project work we applied Natural Language Processing (NLP), which draws on Machine Learning, to classify health-related documents. We used the OpenNLP Application Programming Interface (API), a Java API, to train a model and classify the documents. We used Materialize, an HTML5, CSS and JavaScript framework, to build the user interface. The software is built using the Model-View-Controller (MVC) architecture.

5.2 Recommendation

To properly use the system we recommend the following:

The system can be hosted online on a Tomcat server, so that all users can access it from their respective locations (details of this can be found in chapter four).

Medical personnel should be trained on how to use the system.

The model should be properly trained to ensure accurate classification by the system. A poorly trained model will lead to erroneous classification.
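One simple way to check whether a model has been trained well enough is to measure its accuracy on labelled documents that were held back from training. The sketch below assumes a classifier can be treated as a function from document text to a category label; the class AccuracyCheckSketch, the stand-in keywordClassifier and the sample documents are hypothetical and are not part of the delivered system.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: accuracy of a classifier on held-out labelled documents.
class AccuracyCheckSketch {

    // Fraction of documents whose predicted label equals the expected label.
    // Keys of labelledDocs are document texts, values are the expected categories.
    static double accuracy(Function<String, String> classifier,
                           Map<String, String> labelledDocs) {
        if (labelledDocs.isEmpty()) {
            throw new IllegalArgumentException("no evaluation documents supplied");
        }
        int correct = 0;
        for (Map.Entry<String, String> entry : labelledDocs.entrySet()) {
            String predicted = classifier.apply(entry.getKey());
            if (entry.getValue().equals(predicted)) {
                correct++;
            }
        }
        return correct / (double) labelledDocs.size();
    }

    // Hypothetical stand-in for a trained model: a single keyword rule.
    static String keywordClassifier(String text) {
        return text.toLowerCase().contains("skin") ? "dermatology" : "cardiology";
    }

    public static void main(String[] args) {
        Map<String, String> heldOut = new LinkedHashMap<>();
        heldOut.put("itchy skin rash", "dermatology");
        heldOut.put("chest pain on exertion", "cardiology");
        heldOut.put("high blood pressure", "cardiology");
        heldOut.put("eczema flare on the arm", "dermatology");
        System.out.println(accuracy(AccuracyCheckSketch::keywordClassifier, heldOut));
    }
}
```

A low score on such a held-out set is a sign that the model needs more or better training data before it is relied upon.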

5.3 Future Work

Due to the limited time available for developing this project work, some key features could not be integrated. It is recommended that the following features be added in future work:

A crawler should be implemented so that the model is continually updated with new data from the internet.

A listener should be implemented that is notified whenever new data from the internet is added, and that triggers the retraining of the model.
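The listener idea can be sketched with a simple observer pattern in plain Java. This is only one possible design under stated assumptions: the names RetrainListenerSketch, TrainingCorpus, NewDataListener and RetrainingTrigger are hypothetical, and the actual retraining step is replaced by a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: notify a listener whenever new training data arrives.
class RetrainListenerSketch {

    // Callback invoked each time a document is added to the corpus.
    interface NewDataListener {
        void onNewData(List<String> allDocuments);
    }

    // Holds the training documents and notifies registered listeners on change.
    static class TrainingCorpus {
        private final List<String> documents = new ArrayList<>();
        private final List<NewDataListener> listeners = new ArrayList<>();

        void addListener(NewDataListener listener) {
            listeners.add(listener);
        }

        // In the proposed design, a crawler would call this with newly fetched text.
        void addDocument(String document) {
            documents.add(document);
            // Pass a copy so listeners cannot modify the corpus itself.
            List<String> snapshot = new ArrayList<>(documents);
            for (NewDataListener listener : listeners) {
                listener.onNewData(snapshot);
            }
        }
    }

    // Stand-in for the component that would retrain the model.
    static class RetrainingTrigger implements NewDataListener {
        int retrainCount = 0;
        int lastCorpusSize = 0;

        @Override
        public void onNewData(List<String> allDocuments) {
            // A real system would retrain the classification model here.
            retrainCount++;
            lastCorpusSize = allDocuments.size();
        }
    }

    public static void main(String[] args) {
        TrainingCorpus corpus = new TrainingCorpus();
        RetrainingTrigger trigger = new RetrainingTrigger();
        corpus.addListener(trigger);
        corpus.addDocument("new article on hypertension");
        corpus.addDocument("new article on eczema");
        System.out.println("retraining runs: " + trigger.retrainCount);
    }
}
```

In practice, retraining after every single document would likely be too costly, so a real listener might batch new documents and retrain on a schedule instead.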

5.4 Conclusion

In conclusion, we find that applying Natural Language Processing to the classification of text and text-based documents is more effective than using other machine learning techniques such as clustering, which can be regarded as overkill for this task. Natural Language Processing has a lot of potential outside document classification; its relevance has been seen in the area of sentiment analysis. It is recommended that further research be carried out in the field of Natural Language Processing.
