Named Entity Recognizer - ML.NET 3.0

Chris posted this 17 January 2024 - Last edited 17 January 2024

My Friends,

A little off topic today, sorry, but this may still be useful for some.

Natural Language Processing (NLP), a branch of Artificial Intelligence, has a subcategory called Named Entity Recognition (NER). This is a very useful tool, and it has many implementations on many different platforms.

ML.NET 3.0 has implemented a trainer for NER, but the code is incomplete, and many have had a lot of trouble implementing it. I had a bit of a play with this and got it working. There is a good GitHub issue thread that gives a bit of an idea of how to proceed.

To make this work, you need to install the following packages:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="libtorch-cpu-win-x64" version="1.13.0.1" targetFramework="net461" />
  <package id="Microsoft.ML" version="3.0.0-preview.23511.1" targetFramework="net461" />
  <package id="TorchSharp" version="0.99.5" targetFramework="net461" />

...
</packages>

We need some helper classes to hold the data. First, the training input:

private class InputTrainingData
{
    public string Sentence;
    public string[] Label;
}

We need a Label class:

public class Label
{
    // The Key: Person, Org...
    public string Key { get; set; }
}

We need two classes to run inference on a sentence:

private class Input
{
    public string Sentence;
    public string[] Label;
}

private class Output
{
    public string[] Predictions;
}

Here is the working class itself:

    #region Using Statements:

    using System;
    using System.Collections.Generic;

    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.TorchSharp;

    #endregion

    public class Program
    {
        // Main method
        public static void Main(string[] args)
        {

            try
            {
                var context = new MLContext()
                {
                    FallbackToCpu = true,
                    GpuDeviceId = 0
                };

                var labels = context.Data.LoadFromEnumerable(
                    new[] {

                            // SpaCy Supported Types:
                            // See: https://www.kaggle.com/code/curiousprogrammer/entity-extraction-and-classification-using-spacy/notebook
                            new Label { Key = "PERSON" },       // People, including fictional.
                            new Label { Key = "NORP" },         // Nationalities or religious or political groups.
                            new Label { Key = "FAC" },          // Buildings, airports, highways, bridges, etc.
                            new Label { Key = "ORG" },          // Companies, agencies, institutions, etc.
                            new Label { Key = "GPE" },          // Countries, cities, states.
                            new Label { Key = "LOC" },          // Non-GPE locations, mountain ranges, bodies of water.
                            new Label { Key = "PRODUCT" },      // Objects, vehicles, foods, etc. (Not services.)
                            new Label { Key = "EVENT" },        // Named hurricanes, battles, wars, sports events, etc.
                            new Label { Key = "WORK_OF_ART" },  // Titles of books, songs, etc.
                            new Label { Key = "LAW" },          // Named documents made into laws.
                            new Label { Key = "LANGUAGE" },     // Any named language.
                            new Label { Key = "DATE" },         // Absolute or relative dates or periods.
                            new Label { Key = "TIME" },         // Times smaller than a day.
                            new Label { Key = "PERCENT" },      // Percentage, including "%".
                            new Label { Key = "MONEY" },        // Monetary values, including unit.
                            new Label { Key = "QUANTITY" },     // Measurements, as of weight or distance.
                            new Label { Key = "ORDINAL" },      // "first", "second", etc.
                            new Label { Key = "CARDINAL" },     // Numerals that do not fall under another type.

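                            // Added Types by Me:
                            new Label { Key = "OBJECT" },       // An Object, Entity might be a Spoon, or a Soccer Ball. Needs Sub Categories.

                            // Used by the training data below. MapValueToKey maps any value that is
                            // missing from this list to the missing key, so it could never be predicted.
                            new Label { Key = "COUNTRY" },      // Countries.
                });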

                var dataView = context.Data.LoadFromEnumerable(
                    new List<InputTrainingData>(new InputTrainingData[] {
                    new InputTrainingData()
                    {
                        // One label per word. "0" is not in the label list, so it maps to the missing key (no entity).
                        Sentence = "Alice and Bob live in the USA",
                        Label = new string[]{"PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY"}
                    },
                    new InputTrainingData()
                    {
                        Sentence = "Alice and Bob live in the USA",
                        Label = new string[]{"PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY"}
                    },
                    }));

                var chain = new EstimatorChain<ITransformer>();

                var estimator = chain.Append(context.Transforms.Conversion.MapValueToKey("Label", keyData: labels))
                   .Append(context.MulticlassClassification.Trainers.NameEntityRecognition(outputColumnName: "Predictions"))
                   .Append(context.Transforms.Conversion.MapKeyToValue("Predictions"));

                var transformer = estimator.Fit(dataView);

                var transformerSchema = transformer.GetOutputSchema(dataView.Schema);

                string sentence = "Alice and Bob live in the USA";
                var Encoded = Tokenizer.Tokenize(sentence);

                // var trainedModel = context.Model.Load(GetOutputFilePath(), out DataViewSchema _);
                var engine = context.Model.CreatePredictionEngine<Input, Output>(transformer);
                Output predictions = engine.Predict(new Input { Sentence = sentence });

                transformer.Dispose();

                Console.WriteLine("Success!");
                Console.ReadLine();
            }
            catch (Exception ex)
            {

                Console.WriteLine($"Error: {ex.Message}");
                Console.ReadLine();
            }
        }
    }

We also need a Tokenizer helper class:

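    #region Using Statements:

    using Microsoft.ML.Tokenizers;

    #endregion

    public class Tokenizer
    {
        // The vocabulary files are loaded once, from the Data folder in the output directory.
        private static readonly EnglishRoberta Roberta = CreateModel();

        // The tokenizer is built once and reused for every call to Tokenize.
        private static readonly Microsoft.ML.Tokenizers.Tokenizer _instance =
            new Microsoft.ML.Tokenizers.Tokenizer(Roberta, new RobertaPreTokenizer());

        private static EnglishRoberta CreateModel()
        {
            var model = new EnglishRoberta("Data/encoder.json", "Data/vocab.bpe", "Data/dict.txt");
            model.AddMaskSymbol();
            return model;
        }

        /// <summary>
        /// Tokenizes the input sentence with the English RoBERTa model.
        /// </summary>
        public static TokenizerResult Tokenize(string input)
        {
            return _instance.Encode(input);
        }
    }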

You can download the files "encoder.json", "vocab.bpe" and "dict.txt" via the links provided and save them in a Data folder. Don't forget to set them to copy to the output directory.
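If you prefer to set this in the project file rather than in the file properties, something like the fragment below should do it. This is only a sketch: it assumes a classic (non-SDK) .csproj, which the packages.config above suggests, and that the three files sit in a Data folder next to the project file.

```xml
<!-- Sketch for a classic (non-SDK) .csproj. In an SDK-style project, <None Update="..."> is the usual form. -->
<ItemGroup>
  <Content Include="Data\encoder.json">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
  <Content Include="Data\vocab.bpe">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
  <Content Include="Data\dict.txt">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
</ItemGroup>
```

PreserveNewest only copies a file when it is newer than the copy already in the output folder, which keeps rebuilds quick.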

The prediction is fairly accurate with only two training examples. Here is the prediction I got:

We should be getting:

new InputTrainingData()
{
   Sentence = "Alice and Bob live in the USA",
   Label = new string[]{"PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY"}
},

At position [6] we should be getting "COUNTRY". With some more training examples, this should improve a lot!
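To compare the predicted labels with the expected ones position by position, a small helper like the one below may be handy. It is only a sketch: PredictionCheck and CountMismatches are names I made up, they are not part of ML.NET, and it assumes the model returns one prediction per space-separated word, the same way the training labels are laid out.

```csharp
using System;

public static class PredictionCheck
{
    // Hypothetical helper, not part of ML.NET.
    // Prints each word with its expected and predicted label and
    // returns the number of positions where the two differ.
    public static int CountMismatches(string sentence, string[] expected, string[] predicted)
    {
        string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Assumption: one label per word, in both arrays.
        if (words.Length != expected.Length || words.Length != predicted.Length)
            throw new ArgumentException("Expected exactly one label per word.");

        int mismatches = 0;
        for (int i = 0; i < words.Length; i++)
        {
            // string.Equals copes with null entries in either array.
            bool match = string.Equals(expected[i], predicted[i]);
            if (!match)
                mismatches++;

            Console.WriteLine($"[{i}] {words[i],-8} expected={expected[i],-8} predicted={predicted[i],-8}{(match ? "" : " <-- mismatch")}");
        }

        return mismatches;
    }
}
```

With the objects from the Program class above, it could be called as PredictionCheck.CountMismatches(sentence, new string[]{"PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY"}, predictions.Predictions).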

The EnglishRoberta class encodes, or tokenizes, words like so:

NER is a very useful tool, used in many areas in IT and Data Acquisition! It is useful for automatically extracting information from large texts!

Best Wishes,

Chris
