Natural language processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret, and manipulate human language. In particular, it is concerned with how to program computers to process and analyze large amounts of natural language data.
In my previous articles, I have addressed some specific NLP topics, such as Text Classification and Natural Language Search. Here I want to give a quick introduction to a few key technical capabilities of Natural Language Processing. With recent advances in artificial intelligence technologies, computers have become very adept at reading, understanding, and interpreting human language. Let’s look at a few key capabilities of NLP. This is by no means a comprehensive list of all NLP capabilities.
Named Entity Recognition (NER):
NER is one of the first steps toward extracting information from large volumes of unstructured data. NER seeks to locate the named entities in a text and classify them into predefined categories such as persons, countries, and organizations. This helps answer questions such as:
– How many times is an organization mentioned in this article?
– Were there any specific products mentioned in a customer review?
This technology enables organizations to extract individual entities from documents, social media, knowledge bases, etc. The better defined and trained the ontologies are, the more accurate the results will be.
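To make the idea concrete, here is a minimal Python sketch that answers the first question above. It assumes a hand-built gazetteer (a lookup table of known names and their categories) rather than the trained statistical models that real NER systems use; the entity names, the `GAZETTEER` table, and both helper functions are hypothetical.

```python
import re
from collections import Counter

# Hypothetical gazetteer mapping known names to predefined categories.
# Real NER systems use trained statistical models instead of a fixed list.
GAZETTEER = {
    "Acme Corp": "ORGANIZATION",
    "United Nations": "ORGANIZATION",
    "Canada": "COUNTRY",
    "France": "COUNTRY",
    "Jane Smith": "PERSON",
}


def extract_entities(text, gazetteer=GAZETTEER):
    """Return (name, category, start_offset) for each gazetteer name in text."""
    # Longest names first, so a longer name wins when alternatives overlap.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.group(0), gazetteer[m.group(0)], m.start()) for m in pattern.finditer(text)]


def count_by_category(entities, category):
    """Count mentions of each entity in one category, e.g. ORGANIZATION."""
    return Counter(name for name, cat, _ in entities if cat == category)


text = ("Jane Smith said Acme Corp will expand to Canada. "
        "Acme Corp already works with the United Nations in France.")
entities = extract_entities(text)
print(count_by_category(entities, "ORGANIZATION"))
# Counter({'Acme Corp': 2, 'United Nations': 1})
```

A fixed lookup table misses every name it has not seen before, which is exactly why trained models and well-defined ontologies matter in practice.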
Topic Modeling:
Topic Modeling is a type of statistical modeling for discovering abstract topics in a large set of documents. It is frequently used to discover hidden semantic structures in a body of text. It differs from traditional classification in that it is an unsupervised method for extracting the main topics. This technique is used in the initial exploratory phase to find the common topics in the data. Once you discover the topics, you can use the vocabulary of those topics to create categories. One of the popular methods used for Topic Modeling is Latent Dirichlet Allocation (LDA). LDA builds a topics-per-document model and a words-per-topic model, each drawn from a Dirichlet prior. You can read more about LDA here: http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
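The following is a minimal sketch of LDA in plain Python. Note that the linked paper fits LDA with variational inference; this sketch swaps in collapsed Gibbs sampling, a different and widely used inference method, because it fits in a few dozen lines. The parameters `alpha` and `beta` are the symmetric Dirichlet priors on the topics-per-document and words-per-topic distributions; their values, the toy corpus, and the function names are illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    alpha and beta are symmetric Dirichlet priors on the topics-per-document
    and words-per-topic distributions. Returns (topic_word, doc_topic) counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # P(topic t | everything else) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Most frequent words per topic, skipping words with zero count."""
    return [
        [w for w, c in sorted(tw.items(), key=lambda item: -item[1]) if c > 0][:n]
        for tw in topic_word
    ]


# Toy corpus (hypothetical): two pet documents and two finance documents.
docs = [
    "cat dog pet cat dog".split(),
    "dog pet cat pet".split(),
    "stock market trade stock".split(),
    "market trade stock price".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word))  # top words per topic
print(doc_topic)              # topic counts per document
```

On a corpus this small the topics found can vary with the random seed; on real data you would run more iterations and inspect the top words of each topic before turning them into categories.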
Text Classification:
Text classification (a.k.a. text categorization or text tagging) is the task of assigning one or more predefined categories to free text. Unlike Topic Modeling above, it is a supervised learning method: the model is trained on examples that have already been labeled. I have written about text classification in more detail in a separate article.
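As a minimal sketch of supervised classification, the Python below trains a multinomial Naive Bayes classifier, one simple and classic choice among many, on a handful of labeled examples. The category names, training sentences, and class name are hypothetical, and tokenization is just lowercase whitespace splitting.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per label
        self.word_counts = defaultdict(Counter)    # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum over words of log P(word | label)
            score = math.log(n_docs / total_docs)
            for w in text.lower().split():
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled training data for two support-ticket categories.
train_texts = [
    "my invoice shows a double charge",
    "refund the charge on my card",
    "the app crashes when I log in",
    "error message after the update",
]
train_labels = ["billing", "billing", "technical", "technical"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("why is there a charge on my invoice"))  # billing
print(clf.predict("the app shows an error"))               # technical
```

The labels are what make this supervised: the model only learns categories that a person has defined and annotated in advance, whereas Topic Modeling discovers groupings without them.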
Information Extraction:
Information extraction (IE) is used to automatically find meaningful information in unstructured text. IE distills structured data or knowledge from unstructured text by identifying references to named entities as well as the relationships stated between them. IE systems can be used to extract abstract knowledge directly from a text corpus, or to extract concrete data from a set of documents that can then be analyzed further with traditional data-mining techniques to discover more general patterns.
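A very simple way to pull structured facts out of text is to match hand-written lexical patterns. The sketch below assumes entity names are runs of capitalized words and uses three hypothetical relation patterns to turn sentences into (subject, relation, object) triples; production IE systems typically combine NER with trained relation classifiers instead.

```python
import re

# Assumption: an entity is a run of capitalized words, e.g. "Acme Corp".
ENTITY = r"([A-Z][\w&-]*(?:\s+[A-Z][\w&-]*)*)"

# Hypothetical lexical patterns, each paired with a relation name.
PATTERNS = [
    (re.compile(ENTITY + r"\s+is headquartered in\s+" + ENTITY), "headquartered_in"),
    (re.compile(ENTITY + r"\s+acquired\s+" + ENTITY), "acquired"),
    (re.compile(ENTITY + r",\s+(?:the\s+)?CEO of\s+" + ENTITY), "ceo_of"),
]


def extract_relations(text):
    """Return (subject, relation, object) triples found in text."""
    triples = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for pattern, relation in PATTERNS:
            for m in pattern.finditer(sentence):
                triples.append((m.group(1), relation, m.group(2)))
    return triples


text = ("Acme Corp is headquartered in Toronto. "
        "Acme Corp acquired Beta Labs in 2020. "
        "Jane Smith, CEO of Acme Corp, announced the deal.")
for triple in extract_relations(text):
    print(triple)
# ('Acme Corp', 'headquartered_in', 'Toronto')
# ('Acme Corp', 'acquired', 'Beta Labs')
# ('Jane Smith', 'ceo_of', 'Acme Corp')
```

The resulting triples are ordinary structured records, so they can be loaded into a table and analyzed with traditional data-mining tools. The capitalized-word heuristic is fragile, though: a sentence beginning "Yesterday Acme Corp acquired" would wrongly include "Yesterday" in the subject.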
Sentiment Analysis:
Sentiment analysis is the automated process of determining the opinion expressed about a given subject in written or spoken language, typically whether it is positive, negative, or neutral. This allows organizations to analyze and interpret comments on social media platforms, documents, news articles, websites, and other venues for public comment.
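One of the simplest approaches to sentiment analysis is lexicon-based scoring: add up the polarity of known opinion words and flip the sign after a negation. The sketch below uses a tiny, hypothetical word list and scoring scheme purely for illustration; practical systems rely on large curated lexicons or trained classifiers.

```python
import re

# Tiny hypothetical lexicon of word polarities (positive > 0, negative < 0).
LEXICON = {
    "good": 1, "great": 2, "excellent": 3, "helpful": 1,
    "bad": -1, "slow": -1, "poor": -2, "rude": -2, "terrible": -3,
}
NEGATIONS = {"not", "no", "never", "isn't", "wasn't", "don't"}


def sentiment(text):
    """Return (score, label) using lexicon lookup with simple negation handling."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        # Flip the polarity when the previous token is a negation, e.g. "not helpful".
        if value and i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label


print(sentiment("The staff were helpful and the service was great!"))  # (3, 'positive')
print(sentiment("Delivery was slow and support was not helpful."))     # (-2, 'negative')
print(sentiment("The package arrived on Tuesday."))                    # (0, 'neutral')
```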
Within government agencies and organizations, there is a deluge of unstructured data in both analog and digital form. NLP can provide the tools needed to gain better visibility into, and knowledge from, that unstructured data. NLP can be used in many ways, for example to analyze public data such as social media posts, reviews, and comments; gain visibility into an organization's knowledge base; provide predictive capabilities; and enhance citizen services. There is much to be learned from the potential of AI and, in particular, its ability to analyze masses of unstructured data. It is now time for agencies and organizations to take action and harness the power of NLP to stay ahead.