
Understanding Natural Language Processing with Me!

Hi, I’m Shiyi (about me). Welcome to my technical blog. I will be documenting my journey of learning Natural Language Processing here, presenting everything I have learned so far, including important concepts, necessary code snippets, and more. I am by no means an expert in this subject, but I have gone through enough study and training in the related fields and subfields to have a good grasp of what’s important.

Areas that I have dabbled in:

→ General Linguistics
→ Symbolic Computational Linguistics
→ Statistical Natural Language Processing
→ State-of-the-Art Large Language Modeling

The Subject Matter

What do we mean by Natural Language Processing? A little googling and researching makes it clear that natural language processing covers a set of techniques for solving various tasks involving natural human language. The most common ones are:

→ Sentiment analysis (a small code sketch follows this list)
→ Machine translation 
→ Word-sense disambiguation 
→ Named-entity recognition 
→ Topic modeling 
→ Text classification 
→ Document classification
→ Question answering
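
To make the first task on this list a little more concrete, here is a minimal sketch of lexicon-based sentiment analysis. The word lists and scoring rule are made up purely for illustration; real systems learn their lexicons and weights from labeled data.

```python
# A toy lexicon-based sentiment classifier (illustrative only; the word
# lists below are invented, not taken from any real sentiment lexicon).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(text: str) -> str:
    # Lowercase, split on whitespace, and strip simple punctuation.
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great movie"))      # positive
print(sentiment("What a terrible, awful day"))   # negative
```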

A Little Bit of History

The history of Computational Linguistics dates back to the 1940s and 1950s. In other words, the field that eventually produced ChatGPT and the many forms of AI now woven into every aspect of our lives had its very first ancestral ideas not so long ago. It is still a fairly young field, with endless possibilities up for exploration.

Before diving in, we first have to ask ourselves: what exactly is artificial intelligence (AI)?

According to a widely cited definition from John McCarthy's 2004 paper, as quoted on IBM's website,

🤖️ "It is the science and engineering of making intelligent machines,
 especially intelligent computer programs. It is related to the similar 
 task of using computers to understand human intelligence, but AI does 
 not have to confine itself to methods that are biologically observable."

So if the goal is to understand human intelligence, we need to know how humans gain information and how human intelligence, or the brain, really works, both physiologically and psychologically:

💡 Two Important Sources of Knowledge: Rationalism and Empiricism. 

  The first acquires knowledge through reasoning and logic, while the 
  second through experience and experimentation.

Below are some important notes on the historical timeline of the development of Computational Linguistics and NLP, and on how it all started from one of these two principles and gradually transitioned to the other (from rationalism / computationalism to empiricism / connectionism; note that computationalism is not always purely symbolic, since it also incorporates empirical evidence):

Noisy Channel Model

For the transmission of language through media like communication channels and speech acoustics, Shannon used the metaphors of the noisy channel and decoding.

Shannon also produced the first measurement of the entropy of English using probabilistic approaches, borrowing the notion of entropy from thermodynamics to quantify the information content of a language or the information capacity of a channel.
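
To get a feel for what "the entropy of English" means, here is a small sketch that estimates the per-character entropy of a string from its character frequencies. This is only the simplest unigram version of the idea; Shannon's actual estimates relied on far more careful modeling of context.

```python
import math
from collections import Counter

def entropy_per_char(text: str) -> float:
    """Estimate H = -sum_c p(c) * log2(p(c)) over the characters of `text`."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A uniform distribution over 4 symbols gives exactly 2 bits per character.
print(entropy_per_char("abcd" * 100))   # 2.0
# English letter frequencies are skewed, so the estimate falls below the
# log2(27) ≈ 4.75 bits of a uniform letters-plus-space alphabet; models that
# exploit longer context push the estimate far lower still.
print(entropy_per_char("the quick brown fox jumps over the lazy dog"))  # ≈ 4.4
```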

Instrumental phonetics and the development of the sound spectrograph established the foundation for later work in speech recognition (Koenig et al., 1946). The first automatic speech recognition systems were created in the early 1950s.

Foundational Insights: 1940s–1950s

Drawing inspiration from Shannon's work, Chomsky (1956-7) first considered finite-state machines as a way to characterize a grammar, and then defined a finite-state language as a language generated by a finite-state grammar.

These pioneering theories paved the way for formal language theory, which uses algebra and set theory to define formal languages as sequences of symbols. Meanwhile, in theoretical linguistics, Chomsky (1957) proposed transformational and context-free grammars, which helped establish paradigms for describing natural language syntax.

The work of Turing, the founding father of computer science, first led to the development of a simple computing element that could be described in terms of propositional logic. After that, Stephen Kleene (1951, 1956), drawing on the work of Emil Post, created regular expressions and finite-state automata.
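
To make the notion of a finite-state language concrete, here is a toy example of my own (not taken from any of the papers above): a deterministic finite-state automaton that accepts the regular language described by the regular expression ab*a, checked against Python's re module.

```python
import re

# A tiny deterministic finite-state automaton for the regular language a b* a:
# strings that start with 'a', contain any number of 'b's, and end with 'a'.
TRANSITIONS = {
    ("start", "a"): "middle",
    ("middle", "b"): "middle",
    ("middle", "a"): "accept",
}
ACCEPTING = {"accept"}

def accepts(string: str) -> bool:
    state = "start"
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:        # no transition defined for this symbol: reject
            return False
    return state in ACCEPTING

# The automaton and the regular expression define exactly the same language.
for s in ["aa", "aba", "abbba", "ab", "ba", "abaa"]:
    assert accepts(s) == bool(re.fullmatch(r"ab*a", s))
    print(s, accepts(s))
```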

Probabilistic models, such as Markov processes, have also played a significant role in formalizing and automating natural language processing. They provide a probabilistic framework for understanding language structure and have been widely used in a variety of language processing tasks.
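
As a deliberately tiny illustration of the Markov idea, the sketch below estimates a bigram model from a made-up three-sentence corpus and assigns a probability to a new sentence under the assumption that each word depends only on the previous word.

```python
from collections import Counter

# A made-up mini-corpus; a real model would be estimated from millions of words.
corpus = [
    "<s> i like green eggs </s>",
    "<s> i like ham </s>",
    "<s> sam likes green ham </s>",
]

bigram_counts = Counter()
context_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    context_counts.update(words[:-1])            # every word that can precede another
    bigram_counts.update(zip(words, words[1:]))

def sentence_probability(sentence: str) -> float:
    """P(w1..wn) ≈ product of P(w_i | w_{i-1}): the Markov assumption."""
    words = sentence.split()
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        if context_counts[prev] == 0:
            return 0.0
        prob *= bigram_counts[(prev, cur)] / context_counts[prev]
    return prob

print(sentence_probability("<s> i like ham </s>"))      # ≈ 0.333
print(sentence_probability("<s> sam likes ham </s>"))   # 0.0: (likes, ham) never observed
```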

The Merging of Two Cultures

Language Theory: The work of Chomsky and others on formal language theory and generative syntax throughout the late 1950s and early-to-mid 1960s, along with the work of many linguists and computer scientists on parsing algorithms, initially top-down and bottom-up, and later via dynamic programming.

The Transformations and Discourse Analysis Project (TDAP), developed by Zellig Harris and deployed at the University of Pennsylvania between 1958 and 1959, was one of the first complete parsing systems (Harris, 1962).

Artificial Intelligence: In the summer of 1956, researchers were brought together for a two-month workshop on what John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester came to call artificial intelligence (AI).

The early systems of this period were simple ones that worked in single domains, mainly through a combination of pattern matching and keyword search with simple heuristics for reasoning and question answering. By the late 1960s, more formal logical systems were developed.

Paradigms Develop

The Bayesian approach was starting to be used to tackle the problem of optical character recognition.

Bledsoe and Browning (1959) created a Bayesian text-recognition system that used a large dictionary and computed the likelihood of each observed letter sequence given each word in the dictionary by multiplying the likelihoods for each letter. Mosteller and Wallace (1964) used Bayesian techniques to address the problem of authorship attribution for the Federalist papers.
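
A toy reconstruction of that idea (the dictionary, letter-confusion probabilities, and scores below are all invented for illustration and are not taken from the original paper) might look like this: given a noisy sequence of observed letters, score each dictionary word by multiplying the per-letter likelihoods and pick the highest-scoring word.

```python
# A toy Bayesian word-recognition sketch in the spirit of Bledsoe and Browning
# (1959). All words and probabilities are made up for illustration.
DICTIONARY = ["cat", "car", "cap"]

# P(observed letter | true letter) for a few hypothetical confusions.
CONFUSIONS = {
    ("n", "r"): 0.25,   # an 'r' is fairly often misread as 'n'
    ("l", "t"): 0.20,   # a 't' is sometimes misread as 'l'
}

def letter_likelihood(observed: str, true: str) -> float:
    if observed == true:
        return 0.7                                      # read correctly
    return CONFUSIONS.get((observed, true), 0.02)       # rare, unmodeled confusion

def word_likelihood(observed: str, candidate: str) -> float:
    """Likelihood of the observed letter sequence given a candidate word."""
    if len(observed) != len(candidate):
        return 0.0
    likelihood = 1.0
    for o, t in zip(observed, candidate):
        likelihood *= letter_likelihood(o, t)           # multiply per-letter likelihoods
    return likelihood

observed = "can"   # a noisy reading of some three-letter word
for word in DICTIONARY:
    print(word, word_likelihood(observed, word))
print("best guess:", max(DICTIONARY, key=lambda w: word_likelihood(observed, w)))
# 'car' wins here, because an observed 'n' is more plausibly a misread 'r'
# than a misread 't' or 'p' under this confusion model.
```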

The first online corpus, the Brown Corpus, was created in 1963–1964 at Brown University; it contains one million words sampled from 500 written texts in a variety of genres (newspapers, novels, non-fiction, academic texts, etc.).

Independently, researchers at IBM and James Baker at Carnegie Mellon University developed speech recognition algorithms using techniques like the Hidden Markov Model and the analogies of the noisy channel and decoding.
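
To give a flavor of what HMM decoding involves, here is a compact sketch of the Viterbi algorithm on a tiny hand-built HMM. I use a part-of-speech-style example because it fits in a few lines, but the same decoding idea underlies the noisy-channel view of speech recognition; every state, word, and probability below is invented for illustration.

```python
# Viterbi decoding over a tiny hand-built HMM (all numbers are invented).
STATES = ["Noun", "Verb"]
START = {"Noun": 0.6, "Verb": 0.4}                        # P(first state)
TRANS = {"Noun": {"Noun": 0.3, "Verb": 0.7},              # P(next state | state)
         "Verb": {"Noun": 0.8, "Verb": 0.2}}
EMIT = {"Noun": {"dogs": 0.5, "bark": 0.1, "cats": 0.4},  # P(word | state)
        "Verb": {"dogs": 0.1, "bark": 0.8, "cats": 0.1}}

def viterbi(words):
    """Return the most probable hidden state sequence for the observed words."""
    # best[t][s] = (probability of the best path ending in state s at time t, backpointer)
    best = [{s: (START[s] * EMIT[s].get(words[0], 0.0), None) for s in STATES}]
    for t in range(1, len(words)):
        column = {}
        for s in STATES:
            prob, prev = max(
                (best[t - 1][p][0] * TRANS[p][s] * EMIT[s].get(words[t], 0.0), p)
                for p in STATES
            )
            column[s] = (prob, prev)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return list(reversed(path))

print(viterbi(["dogs", "bark"]))   # ['Noun', 'Verb']
```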

Empiricism Redux

After work on finite-state phonology and morphology by Kaplan and Kay (1981) and finite-state models of syntax by Church (1980), finite-state models started to garner attention once more.

Empiricism made a comeback, most notably with the rise of probabilistic models in speech and language processing, which was greatly influenced by research at the IBM Thomas J. Watson Research Center on probabilistic models for speech recognition.

These probabilistic techniques are further examples of data-driven strategies, applied to part-of-speech tagging, attachment ambiguities, and parsing, alongside connectionist approaches ranging from speech recognition to semantics. They were the dominant approaches before the sophisticated neural language models in use today.