“A natural language utterance is rich in structures of different kinds: sounds form recurring patterns and morphemes, words form phrases, grammatical functions emerge from morphological and phrasal structure, and patterns of phrases evoke a complex meaning.”
Mary Dalrymple, John Lamping, Fernando Pereira, and Vijay Saraswat (quoted at Thoughtco.com)
The field of Natural Language Processing, or NLP, grapples with the fact that vocabularies are diverse and that pragmatics (i.e. how context contributes to meaning) and syntax (the arrangement of words) must both be accounted for before we can arrive at a clear understanding of a given proposition.
In NLP, the goal is to start with some written text (which, paradoxically, the field considers unstructured) and arrive at something structured that a computer can understand. One might expect the computer itself to be tasked with the conundrum of turning unstructured text into something structured! But that’s not the case. Language is so vast and complex that human intervention is needed to structure its contents.
I believe that we can apply this science and discipline to our own written work. When we take care to form grammatically correct sentences and are aware of linguistic structures such as coordination, our readers will truly appreciate the result. To this end, we should aim to structure our “unstructured” text in such a way that it communicates exactly what we intended, and that the result is trenchant and has logical rigour. Therein lies our premise: applying the principles of NLP can help writers produce better work.
Martin Keen, Master Inventor at IBM, says that he has “utilized NLP in a good number of my invention disclosures.” He goes on to say, “The thing with NLP is [that] it’s not like one algorithm; it’s actually more like a bag of tools, and you can apply these bag[s] of tools to be able to resolve some of these use cases [i.e. machine translation, virtual assistants and chatbots, sentiment analysis, and spam detection].”
Similar to Keen’s analogy, our editing model comprises containers of tools. We aim to apply what we have placed in these containers to resolve any number of issues. Visit our web page “The case for building our editing model” or download our handout.
For example, in the Copy Editing and Manuscript Construction core areas of our pedagogy, we apply the principles of NLP to break down text strings (sentence elements) into smaller, workable units called tokens, typically individual words and punctuation marks. That’s called Tokenization in the field.
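To make this concrete, here is a minimal sketch of tokenization using the open-source NLTK library (our choice purely for illustration; any comparable toolkit would do):

```python
# A minimal tokenization sketch using the open-source NLTK library.
# Assumes: pip install nltk, plus the one-time model download below.
import nltk

nltk.download("punkt")  # tokenizer models; newer NLTK releases may ask for "punkt_tab"

text = "We break down text strings into tokens, the smallest useful units."
tokens = nltk.word_tokenize(text)
print(tokens)
# ['We', 'break', 'down', 'text', 'strings', 'into', 'tokens', ',',
#  'the', 'smallest', 'useful', 'units', '.']
```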
Then we make sure that we understand the root of words to enable exact and correct usage, particularly when it comes to verbs and adjectives. For example, it is important to note that the stem of each of the following words is run: running, runs, and ran. That’s called Stemming in NLP.
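Again as an illustrative sketch, NLTK’s Porter stemmer (one of several rule-based stemmers; our choice here is an assumption) shows both the power and the limits of stemming. Regular forms reduce cleanly, while an irregular form such as ran slips through:

```python
# A sketch of stemming with NLTK's Porter stemmer, chosen purely for illustration.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["running", "runs", "ran"]:
    print(word, "->", stemmer.stem(word))
# running -> run
# runs -> run
# ran -> ran   (irregular forms escape rule-based stemming; see Lemmatization below)
```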
We have mentioned these esoteric terms to illustrate the topography of our editing model and what we are emulating. Yet our focus is on our human ability to learn and educate ourselves. The field of NLP applies this principle (in generative AI, for example) by reducing a word to its dictionary form, or lemma (that’s Lemmatization), and by identifying how a word functions within the context of a sentence (that’s Part of Speech Tagging). However, our model encourages users to perform the same steps with the intent of their own learning, a hallmark of problem-based learning.
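As a hedged sketch of those two steps (again assuming NLTK, whose exact download names can vary between releases):

```python
# A sketch of lemmatization and part-of-speech tagging with NLTK
# (assumed here for illustration; other toolkits would serve equally well).
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")                     # dictionary data for the lemmatizer
nltk.download("averaged_perceptron_tagger")  # POS model; newer NLTK may name it
                                             # "averaged_perceptron_tagger_eng"

# Lemmatization: reduce a word to its dictionary form (lemma).
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("ran", pos="v"))  # -> 'run', the form the stemmer missed

# Part-of-speech tagging: label each word's role in its sentence context.
tokens = "Our readers will truly appreciate the result .".split()
print(nltk.pos_tag(tokens))
# [('Our', 'PRP$'), ('readers', 'NNS'), ('will', 'MD'), ('truly', 'RB'),
#  ('appreciate', 'VB'), ('the', 'DT'), ('result', 'NN'), ('.', '.')]
```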
Martin Keen highlighted Tokenization, Stemming, Lemmatization, and Part of Speech Tagging as “some of the tools that we can apply from this big bag of tools from NLP in order to get from unstructured human speech through to something structured that a computer can understand. And once we’ve done that, we can apply that structured data to all sorts of AI applications.” The idea that we can “get from unstructured human speech through to something structured that a computer can understand” is, I believe, inspiration enough for writers to apply the same effort, courageously and resiliently, to deconstructing language in order to perfect their own work.
I believe that if you approach the tools we’ve created from a problem-based learning angle, wherein you are encouraged to “[t]ake a little bit of creativity, add a dash of innovation, and sprinkle in some critical thinking,” you can get from a draft to a polished version of your content that is not only trenchant but also logically rigorous.
We use The SAP to showcase a process of grammatical division and identification of syntactic functions. The SAP focuses on the analysis of sentences and clauses drawn from a variety of sources, which we use to highlight grammatical rules as well as some linguistic theories and phenomena. This analysis is conducted by parsing (dividing) sentences and determining (identifying) their syntactic functions. The goal is to understand the exact meaning of a word, a phrase, or a sentence/clause.
What do we mean by “parsing”?
Parsing is the –ing verbal of the infinitive (the untensed, or uninflected, form of a verb) to parse. This process of grammatical division and identification of syntactic function highlights both grammatical form and grammatical function.
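For illustration, a dependency parser makes this division and identification visible. The sketch below assumes the open-source spaCy library and its small English model, en_core_web_sm (our choice; other parsers would show the same idea):

```python
# A dependency-parsing sketch using the open-source spaCy library.
# Assumes: pip install spacy, then: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Patterns of phrases evoke a complex meaning.")

# Each token is reported with its syntactic function (dependency label)
# and the head word it attaches to: parsing made visible.
for token in doc:
    print(f"{token.text:<10} {token.dep_:<8} head: {token.head.text}")
# Patterns   nsubj    head: evoke
# of         prep     head: Patterns
# phrases    pobj     head: of
# evoke      ROOT     head: evoke
# ...
```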
The framework of SSORE addresses some of the recurring patterns and different kinds of structures in manuscripts.
The following describes how SSORE is incorporated into our copy-editing process and overall model:
Our pedagogy is reflected in a suite of tools that we developed as part of our process-method. The tools are: