- the ideas have existed at least since the 1700s, but there was not enough compute and no computer science
- Current computers do almost what was predicted, but how?
- How to instruct computer to perform tasks?
- How to represent knowledge in computers?
- How to generate the answers?
"by his contrivance, the most ignorant person, at a reasonable charge, and with a little bodily labour, might write books in philosophy, poetry, politics, laws, mathematics, and theology, without the least assistance from genius or study. ... to read the several lines softly, as they appeared upon the frame" (Gulliver's Travels, Jonathan Swift, 1726, making fun of Ramon Llull, b. 1232)
Prompt as an Interface
- 2001: A Space Odyssey HAL 9000
- input textual instructions e.g. explain a riddle
- based on its knowledge, the computer generates the answer text
How To Represent Knowledge
- library ~> textual documents in a database
- search by list of words (query) ~1970s, find topics ~1980
- counting word occurrences at the document level into sparse matrices
- methods: TF*IDF, Latent semantic analysis
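The word-counting idea above can be sketched in a few lines. This is a minimal TF-IDF from scratch on a made-up toy corpus, not a production retrieval system:

```python
import math

# Toy corpus: each document is treated as a bag of words at the
# document level (the corpus and words are made up for illustration).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "apples and oranges are fruit".split(),
]

def tf_idf(term, doc, docs):
    # Term frequency: how often the term occurs in this document.
    tf = doc.count(term) / len(doc)
    # Inverse document frequency: terms found in few documents
    # get a higher weight than terms found everywhere.
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df)
    return tf * idf

# "cat" appears in 2 of 3 documents, "fruit" in only 1, so "fruit"
# is a stronger signal for the document it occurs in.
print(tf_idf("cat", docs[0], docs))
print(tf_idf("fruit", docs[2], docs))
```

Stacking these weights per (word, document) pair gives exactly the sparse matrix the bullet above mentions; latent semantic analysis then factorizes that matrix to find topics.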
Non-Contextual Words Vectors
- document -> sentence or a small running window of ~10 words
- vector is point in a multidimensional space - an array of numbers
- each of 10k words gets one general vector in 300 dimensional space
- each vector is compressed into only 300 dimensions - much less than the 10k words
- global (non-contextual) word vectors - no disambiguation: fruit (flowering plant) vs fruit (food)
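A word vector being "a point in a multidimensional space" means similarity is just geometry. A minimal sketch with made-up 4-dimensional vectors (real models learn ~300 dimensions from text):

```python
import math

# Hypothetical toy "word vectors" - the numbers are invented for
# illustration, not taken from any trained model.
vec = {
    "apple":  [0.9, 0.1, 0.0, 0.2],
    "orange": [0.8, 0.2, 0.1, 0.1],
    "car":    [0.0, 0.9, 0.8, 0.1],
}

def cosine(a, b):
    # Cosine similarity: how small the angle is between two points
    # in the vector space (1.0 = same direction, 0.0 = unrelated).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Words used in similar contexts end up close together:
print(cosine(vec["apple"], vec["orange"]))  # high
print(cosine(vec["apple"], vec["car"]))     # low
```

Note that there is exactly one entry per word string, so both senses of "fruit" would share a single point - this is the missing disambiguation the bullet above points out.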
Word2vec: Word To a Global Vector
- GloVe (Pennington 2014): counts co-occurrences in a 10-word window
- word2vec (Mikolov 2013): the sum of the 10 surrounding word vectors should be close to the middle word's vector
- words appearing in similar context are close in the 300 dimensional space
- no disambiguation - a word string should be just a name, not an id!
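How word2vec's CBOW variant turns raw text into training examples - the surrounding window predicts the middle word - can be sketched like this (a minimal sketch of the data preparation only, not the real training code):

```python
def cbow_pairs(tokens, window=2):
    # For every position, pair the surrounding context words with
    # the middle (target) word; word2vec then trains the summed
    # context vectors to point at the target's vector.
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        pairs.append((context, target))
    return pairs

sentence = "words in similar contexts get similar vectors".split()
for context, target in cbow_pairs(sentence, window=2):
    print(context, "->", target)
```

Because two words that occur in interchangeable contexts produce near-identical training pairs, their vectors converge to nearby points - which is exactly why they end up close in the 300-dimensional space.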
Knowledge Graph’s Nodes Are Disambiguated
- knowledge graph e.g. Wikidata: each node is a specific sense: fruit (flowering plant) vs fruit (food)
- an imperfect tradeoff between a structured database and training data samples
- Wikipedia and the internet sit between a knowledge graph and a plain set of documents
- random walk ~ valid “sentences”, link prediction ~ generating text
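The "random walk ~ valid sentences" analogy can be made concrete on a tiny hypothetical graph (the nodes below are invented; real graphs like Wikidata have millions of disambiguated entities):

```python
import random

# A tiny knowledge graph as an adjacency list. Note the nodes are
# disambiguated senses, not bare word strings.
graph = {
    "fruit (food)": ["apple", "vitamin C"],
    "apple": ["fruit (food)", "apple tree"],
    "apple tree": ["apple", "fruit (flowering plant)"],
    "fruit (flowering plant)": ["apple tree"],
    "vitamin C": ["fruit (food)"],
}

def random_walk(graph, start, length, seed=0):
    # Following edges at random yields a chain of connected facts -
    # something like a valid "sentence" over the graph.
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

print(" -> ".join(random_walk(graph, "apple", 4)))
```

Link prediction is the reverse direction: guessing a plausible next node in such a walk, which is the graph analogue of generating the next word of text.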
Transformer: Contextual Word Vectors
- word meaning based on a context of 100s of words
- recurrent neural networks (LSTM, GRU) - sequential with memory
- transformer architecture (Vaswani 2017)
- computes over the entire input sequence at once, not word by word
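The core mechanism that lets a transformer see the whole sequence at once is scaled dot-product self-attention (Vaswani 2017). A minimal single-head sketch in NumPy, with random toy weights standing in for learned parameters:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Every position attends to every other position: queries are
    # matched against keys, and the resulting weights mix the values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax per row
    return weights @ V  # each output mixes values from all positions

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))           # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one contextual vector per input token
```

Unlike an LSTM or GRU, nothing here is sequential: the whole 5x5 attention matrix is computed in one shot, which is what makes the output vectors contextual rather than global.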
Big Transformer Models
- generate text by predicting the continuation of the input text
- in 2022: ~$10M transformers trained on large amounts of text from the internet
- can solve a wide variety of problems, like explaining jokes, sometimes with human-level performance
- examples: PaLM (2022), RETRO (2021), GPT-3, …
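Generation by continuation is just repeatedly predicting the next token and appending it. A toy sketch where a hand-written lookup table stands in for the billion-parameter model (the table and its entries are purely illustrative):

```python
# Hypothetical "model": maps the last token to ranked next-token
# candidates. A real transformer produces a probability over the
# whole vocabulary instead.
model = {
    "explain": ["the"],
    "the": ["joke"],
    "joke": ["step"],
    "step": ["by"],
    "by": ["step"],
}

def generate(prompt, steps):
    tokens = prompt.split()
    for _ in range(steps):
        candidates = model.get(tokens[-1])
        if not candidates:
            break
        tokens.append(candidates[0])  # greedy: take the top prediction
    return " ".join(tokens)

print(generate("explain the", 4))  # -> "explain the joke step by step"
```

Models like GPT-3 or PaLM run this same loop, only with the next token predicted from the entire preceding context via attention rather than from a single-word lookup.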