System For Health Document Classification Using Machine Learning

Abstract

Medical documents (books, journals, blogs, articles, doctors’ instructions and prescriptions, emails from patients, and so on) are increasing massively every day, and it is becoming very challenging to handle and categorize them manually. Extracting information from unstructured text, including medical document classification, is one of the most challenging tasks in information systems. Discovering knowledge from medical datasets is important for effective medical diagnosis. The primary aim of this research is to develop a classification algorithm that analyzes the content of a medical document and categorizes it under predefined topics. In this project we applied Natural Language Processing, a field that draws heavily on Machine Learning, to the classification of health-related documents. We used the OpenNLP Application Programming Interface, a Java API, to train a model and classify the documents. We used Materialize, an HTML5, CSS and JavaScript framework, to build the user interface. The software is built on the Model-View-Controller (MVC) architecture. The algorithm classified the articles under their correct subject headings and identified the full set of subject headings correctly. This approach is promising for indexing and classifying medical documents quickly in the global health arena.

Chapter One

1.0 Introduction

This chapter introduces the project work, A System for Health Document Classification Using Machine Learning. It covers the background of the study, the statement of the problem, the aim and objectives, the methodology used to design the system, the scope of the study, its significance, and the definition of terms, and it concludes with the organization of the project work.

1.1 Background of the Study

Today, most hospitals, medical laboratories and other health facilities use some kind of information system, such as a hospital management system or a pharmacy management system. Among other functions, these systems are mainly used to collect patient records, which they store in digital format. Large volumes of patient data are recorded daily, forming a large data set popularly referred to as “Big Data”.

Every day, physicians and other health workers are required to work with this “Big Data” in order to provide solutions. Their everyday tasks include information retrieval and data mining. Retrieving information from big data can be very laborious and time-consuming. This has given rise to the study of text or document classification in order to aid the process of retrieving information from big data. Today, text classification is a necessity because of the very large number of text documents that we have to deal with daily.

Document classification is the task of grouping documents into categories based upon their content. It is a significant learning problem at the core of many information management and retrieval tasks, and it plays an essential role in applications that deal with organizing, classifying, searching and concisely representing large amounts of information. Document classification is a longstanding problem in information retrieval which has been well studied (Russell, 2018).
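Classifying a document by its content first requires turning the text into something a program can count. The following is a minimal sketch of one common representation, a bag of words with term frequencies. The class name BagOfWords, the method termFrequencies and the sample sentence are illustrative assumptions, not part of the system described in this project.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: represents a document as a map from word to count.
final class BagOfWords {

    /** Lower-cases the text, splits it on non-letter characters, and counts each token. */
    static Map<String, Integer> termFrequencies(String text) {
        // TreeMap keeps the words sorted, which makes the output easy to read.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {chest=2, ordered=1, pain=1, patient=1, ray=1, reports=1, x=1}
        System.out.println(termFrequencies("Patient reports chest pain; chest X-ray ordered."));
    }
}
```

A classifier can then compare these word counts across categories. Note that this simple tokenizer splits "X-ray" into two tokens; a real system would likely use a more careful tokenizer.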

Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. Machine learning approaches to classification suggest the automatic construction of classifiers using induction over pre-classified sample documents. In this project work we will employ machine learning in classifying health documents.
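To illustrate what "induction over pre-classified sample documents" can look like in practice, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. Naive Bayes is used here only as an example technique; it is not necessarily the algorithm used in this project. The class name NaiveBayesSketch, the category labels and the sample sentences are all illustrative assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: learns word counts per category from labelled samples.
final class NaiveBayesSketch {
    private final Map<String, Integer> docsPerCategory = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerCategory = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z]+");
    }

    /** Records one pre-classified sample document. */
    void train(String category, String text) {
        totalDocs++;
        docsPerCategory.merge(category, 1, Integer::sum);
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            wordsPerCategory.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns the category with the highest log-probability, or null if untrained. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docsPerCategory.keySet()) {
            // Log prior: fraction of training documents in this category.
            double score = Math.log(docsPerCategory.get(category) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = wordsPerCategory.getOrDefault(category, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                int count = counts.getOrDefault(token, 0);
                // Add-one (Laplace) smoothing so unseen words do not zero out the score.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("cardiology", "chest pain heart attack ecg");
        nb.train("cardiology", "heart rhythm ecg arrhythmia");
        nb.train("dermatology", "skin rash itching eczema");
        nb.train("dermatology", "skin lesion biopsy rash");
        System.out.println(nb.classify("patient with heart pain")); // prints "cardiology"
    }
}
```

The classifier is "constructed automatically" in the sense that no rules are written by hand: the word statistics come entirely from the labelled samples passed to train.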

1.2 Statement of the Problem

With the explosion of information fuelled by the growth of the World Wide Web, it is no longer feasible for a human observer to understand all the incoming data or even to classify it into categories. In the health sector as well, numerous patient records are collected every day and used for analysis. The problem is how to classify or categorize these health documents efficiently so that they can be retrieved easily.

1.3 Aim and Objectives of the Study

The aim of this project is to develop a system for health document classification using machine learning.

Other objectives include:

Study the various machine learning classification algorithms.

Implement a classification algorithm in Java.

1.4 Scope of the Study

As stated earlier, classifiers can be constructed using machine learning, statistical pattern recognition, or neural network approaches. This project work concentrates on using machine learning algorithms to classify documents.

1.5 Significance of the Study

The software delivered by this project work will greatly reduce the time that doctors and other health workers spend searching for and retrieving documents.

Other benefits of this project work include:

It will help students and other interested individuals who want to develop a similar application.

It will serve as a source of material for those interested in investigating the processes involved in developing a document classification system using machine learning.

It will serve as a source of material for students who are interested in studying machine learning.

1.6 Definition of Terms

Document Classification:

The task of grouping documents into categories based upon their content.

Health Document:

A document produced or used in health care. One example is a health certificate, which is written by a doctor and displays the official results of a physical examination.

Machine Learning:

The study and construction of algorithms that can learn from and make predictions on data.

JSP:

JavaServer Pages is a Java technology for creating dynamic web pages.

HTML:

HyperText Markup Language, the markup language used for creating web pages.

MySQL:

A database management system for creating, storing and manipulating databases.

Servlet:

A small pluggable extension to a server that enhances the server’s functionality.

Bootstrap:

A mobile-first front-end framework for faster and easier web development. It uses HTML, CSS and JavaScript.

1.7 Organization of Work

Chapter one introduces the background of the project and sets out the statement of the problem, the objectives of the project, its significance, scope, and constraints.

Chapter two reviews the literature on machine learning and document classification, along with related work.

Chapter three discusses system investigation and analysis. It covers a detailed investigation and analysis of the existing system, identifies its problems, and proposes the new system.

Chapter four covers the system design and implementation. Chapter five presents the summary and conclusion of the project.

Chapter Two: Literature Review

In this chapter, the system for health document classification using machine learning is critically examined through a review of relevant literature that helps explain the research problem and acknowledges the contributions of scholars who have previously worked on similar research. The chapter intends to deepen the understanding of the study and close the perceived gaps …
