Python Across Disciplines
by John Gordon © 2023


Named Entity Recognition (NER)

Overview

Entity extraction, also known as Named Entity Recognition (NER), is a set of techniques we can use in Python to locate named entities in unstructured text (phrases, sentences, paragraphs, articles, documents, etc.) and classify them into predefined categories. These categories can include the names of persons, organizations, and locations, as well as time expressions, quantities, monetary values, and percentages. Entity extraction is an important processing step because it helps a program understand text and pull out the relevant information.
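
To make "locate and classify" concrete, here is a minimal sketch that uses only regular expressions from the Python standard library. It is a toy stand-in and not how spaCy works (spaCy relies on trained models rather than hand-written rules). The patterns, the label names, and the function find_entities are all assumptions made for illustration.

```python
import re

# Hypothetical, hand-written patterns; real NER libraries learn from data instead.
PATTERNS = {
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
    "YEAR": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
}

def find_entities(text):
    """Return (start, end, label, matched_text) tuples sorted by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), label, match.group()))
    return sorted(found)

sample = "In 1775 the loan of $1,200.50 carried 5% interest."
for start, end, label, value in find_entities(sample):
    print(f"{label:8} {value!r} at characters {start}-{end}")
```

Each result records where the entity sits in the text and which category it belongs to, which is the same kind of information a real NER library reports.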

Concept: Natural Language Processing (NLP)

Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics that aims to enable computers to understand, interpret, and generate human language in a meaningful way. In Python, NLP involves using libraries such as spaCy to process and analyze large amounts of text data. Typical tasks include sentiment analysis, language translation, named entity recognition, and chatbot development. For beginners in Python, exploring NLP means learning how to use these libraries to extract insights and patterns from text, automate tasks that involve natural language data, and build applications that interact with users in more natural and intuitive ways. Through NLP, Python programmers can bridge the gap between human communication and digital data processing, which opens up possibilities in data analysis, web development, and artificial intelligence applications.


Concept: Named Entity Recognition (NER)

Named Entity Recognition (NER) is a key component of Natural Language Processing (NLP) that identifies key pieces of information (entities) in text and classifies them into predefined categories such as the names of people, organizations, locations, and dates. For beginners in Python, learning NER means exploring how to automatically scan entire articles or documents and highlight important information, which simplifies data extraction for analysis and helps automate data entry. Python libraries such as spaCy provide easy-to-use tools for implementing NER, so you can quickly start experimenting with text analysis. With NER, you can build applications that process and make sense of large volumes of text. Example projects include automated content tagging, better search, and personalized content recommendations based on extracted entities.
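
As a rough sketch of what this looks like with spaCy, the first function below runs a pre-trained model over some text and returns each entity with its label, using spaCy's doc.ents, ent.text, and ent.label_. The second function is a plain-Python helper that groups the results by label, which is one simple way to approach content tagging. The function names and the default model name are assumptions for illustration, and the entities you get back depend on the model you load.

```python
from collections import defaultdict

def extract_entities(text, model_name="en_core_web_sm"):
    """Return a list of (entity_text, label) pairs found by spaCy."""
    import spacy  # imported here so the helper below also works without spaCy
    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]

def group_by_label(pairs):
    """Group (entity_text, label) pairs by label, dropping repeated entities."""
    grouped = defaultdict(list)
    for entity_text, label in pairs:
        if entity_text not in grouped[label]:
            grouped[label].append(entity_text)
    return dict(grouped)

# Example usage (requires spaCy and the language model to be installed):
# pairs = extract_entities("Charles Dickens set part of his novel in Paris in 1775.")
# print(group_by_label(pairs))
```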


Concept: Named Entities

Named entities are specific pieces of information in a text that can be recognized and assigned to predefined categories such as names of people, places, organizations, dates, and monetary values. In Natural Language Processing (NLP) with Python, libraries such as spaCy identify and classify these pieces of information automatically. This process, known as Named Entity Recognition (NER), is a fundamental step in extracting meaning from natural language data, and it makes applications like content classification, information retrieval, and data analysis more efficient and insightful. For beginners in Python-based NLP, named entity extraction is a valuable skill that opens up many possibilities for analyzing and interpreting large amounts of textual data.
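
One way to picture a named entity is as a span of characters plus a label. The sketch below marks up a sentence from a list of (start, end, label) tuples. The spans here were written by hand for illustration; with spaCy you could build the same tuples from each entity's start_char, end_char, and label_ attributes. The function name annotate and the [text|LABEL] marker style are assumptions.

```python
def annotate(text, entities):
    """Wrap each entity span in [text|LABEL] markers.

    `entities` is a list of non-overlapping (start_char, end_char, label) tuples.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        pieces.append(text[cursor:start])            # text before the entity
        pieces.append(f"[{text[start:end]}|{label}]")  # the marked-up entity
        cursor = end
    pieces.append(text[cursor:])                     # whatever follows the last entity
    return "".join(pieces)

sentence = "Lucie Manette travelled from Paris to London in 1775."
# Hand-labelled spans (an assumption, not model output).
spans = [(0, 13, "PERSON"), (29, 34, "GPE"), (38, 44, "GPE"), (48, 52, "DATE")]
print(annotate(sentence, spans))
```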

The spaCy Library

spaCy is one of the most popular Python libraries for natural language processing (NLP). It is designed for production use, offers fast performance, and is well suited to large-scale information extraction. spaCy provides pre-trained models for multiple languages and supports tasks like tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and text classification. It emphasizes efficiency and accuracy, and its API is streamlined and intuitive, which makes it accessible to users who are new to NLP while remaining powerful for advanced users.
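
To give a feel for several of these tasks at once, the sketch below asks spaCy for each token's text, part of speech (pos_), dependency relation (dep_), and head word, then prints them as a plain-text table. The function names, the table layout, and the default model name are assumptions for illustration, and the tags you see depend on the model you load.

```python
def describe_tokens(text, model_name="en_core_web_sm"):
    """Return (token, part_of_speech, dependency, head) tuples for `text`."""
    import spacy  # imported here so format_rows below also works without spaCy
    nlp = spacy.load(model_name)
    return [(t.text, t.pos_, t.dep_, t.head.text) for t in nlp(text)]

def format_rows(rows, headers=("TOKEN", "POS", "DEP", "HEAD")):
    """Format tuples as a left-aligned plain-text table."""
    table = [headers, *rows]
    widths = [max(len(str(row[i])) for row in table) for i in range(len(headers))]
    lines = []
    for row in table:
        cells = [str(cell).ljust(width) for cell, width in zip(row, widths)]
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)

# Example usage (requires spaCy and the language model to be installed):
# print(format_rows(describe_tokens("It was the best of times.")))
```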

Google CoLab

If you are using Google CoLab, the spaCy library is already installed and available for use in any Notebook, so you can go straight to the code examples below.

IDEs like Visual Studio Code, PyCharm, or Others

If you are using an IDE like Visual Studio Code, PyCharm, or others, you'll need to install the spaCy library before you can use it. The common approach is to use the pip package manager in the terminal. Open a terminal, install the library with the first command below, and then download one language model using either of the two download commands:

# First use pip to install the library
pip install spacy
# Then install a language model. You can choose either the small model or large model, like this:
# Use the following if you want the small language model ...
python -m spacy download en_core_web_sm
# ... or the following if you want the large language model ...
python -m spacy download en_core_web_lg
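
After running the commands above, a quick check like the following can confirm what is available in your environment. It is a minimal sketch that relies on the fact that spaCy language models are installed as regular Python packages; the helper name is_installed is an assumption.

```python
import importlib.util

def is_installed(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

# Report on the library and the two models mentioned above.
for name in ("spacy", "en_core_web_sm", "en_core_web_lg"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name:16} {status}")
```

You only need one of the two models, so it is fine if one of them is reported as missing.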

Official Documentation

For detailed documentation on spaCy, see the spaCy usage page.

Once you have spaCy and a language model installed, you can start using spaCy in your code. See the following section for some examples.

spaCy Examples

Code

import spacy
from spacy import displacy

# Load the language model once and reuse it in every function.
nlp = spacy.load("en_core_web_sm")

def load_doc(file_path):
    """Read a text file and return it as a processed spaCy Doc."""
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
    return nlp(text)

def perform_ner_and_visualize(file_path):
    """Highlight the named entities found in the file."""
    doc = load_doc(file_path)
    # In a notebook, displaCy displays the result; elsewhere it returns HTML.
    return displacy.render(doc, style='ent', page=True, minify=True)

def visualize_sentence_dependencies(file_path):
    """Draw the dependency parse of the first sentence only."""
    doc = load_doc(file_path)
    first_sentence = next(iter(doc.sents))
    displacy.render(first_sentence, style='dep', jupyter=True, options={'distance': 100})

def visualize_all_sentence_dependencies(file_path):
    """Draw the dependency parse of every sentence, with a divider after each."""
    doc = load_doc(file_path)
    for sentence in doc.sents:
        displacy.render(sentence, style='dep', jupyter=True, options={'distance': 100})
        print("\n" + "-"*80 + "\n")


if __name__ == "__main__":
    file_path = "twocities.txt"
    perform_ner_and_visualize(file_path)
    # visualize_sentence_dependencies(file_path)
    visualize_all_sentence_dependencies(file_path)

Output



Figure 1: Result of Named Entity Recognition (NER) on the twocities.txt file




Figure 2: Result of Visualizing Sentence Dependency on the Title of the Book




Figure 3: Result of Visualizing Sentence Dependency on the First Sentence




Figure 4: Result of Visualizing Sentence Dependency on the Second Sentence




Figure 5: Result of Visualizing Sentence Dependency on the Third Sentence
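
The figures above come from running the code in a notebook. If you run spaCy from a regular script instead (for example in Visual Studio Code or PyCharm), displacy.render returns the markup as a string rather than displaying it, so one option is to save that string to an HTML file and open it in a browser. The sketch below shows this idea; the function names and the output file name are assumptions for illustration.

```python
from pathlib import Path

def render_entities_html(text, model_name="en_core_web_sm"):
    """Return displaCy's entity markup for `text` as an HTML string."""
    import spacy  # imported here so save_html below also works without spaCy
    from spacy import displacy
    nlp = spacy.load(model_name)
    doc = nlp(text)
    # jupyter=False asks displaCy to return the HTML instead of displaying it.
    return displacy.render(doc, style="ent", page=True, jupyter=False)

def save_html(html, output_path):
    """Write an HTML string to a file and return the file's path."""
    path = Path(output_path)
    path.write_text(html, encoding="utf-8")
    return path

# Example usage (requires spaCy and the language model to be installed):
# text = Path("twocities.txt").read_text(encoding="utf-8")
# save_html(render_entities_html(text), "twocities_entities.html")
```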

© 2023 John Gordon
Cascade Street Publishing, LLC