ERIC Number: EJ1215560
Record Type: Journal
Publication Date: 2019-May
Pages: 30
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: EISSN-1551-6709
Available Date: N/A
The Role of Negative Information in Distributional Semantic Learning
Johns, Brendan T.; Mewhort, Douglas J. K.; Jones, Michael N.
Cognitive Science, v43 n5 e12730 May 2019
Distributional models of semantics learn word meanings from contextual co-occurrence patterns across a large sample of natural language. Early models, such as LSA and HAL (Landauer & Dumais, 1997; Lund & Burgess, 1996), counted co-occurrence events; later models, such as BEAGLE (Jones & Mewhort, 2007), replaced counting co-occurrences with vector accumulation. All of these models learned from positive information only: Words that occur together within a context become related to each other. A recent class of distributional models, referred to as neural embedding models, is instead based on a prediction process embedded in the functioning of a neural network: Such models predict the words that should surround a target word in a given context (e.g., "word2vec"; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). An error signal derived from the prediction is used to update each word's representation via backpropagation. A second key difference, however, is that predictive models use negative information in addition to positive information to develop a semantic representation: They also predict words that should not surround a target word in a given context, and the resulting error signal likewise prompts an update of the word's representation, a procedure referred to as negative sampling. Standard applications of "word2vec" recommend sampling at least as many negative as positive examples. The use of negative information in developing a semantic representation is often thought to be intimately tied to "word2vec"'s prediction process. We assess the role of negative information in developing a semantic representation and show that its power does not reflect the use of a prediction mechanism. Finally, we show how negative information can be efficiently integrated into classic count-based semantic models using parameter-free analytical transformations.
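The negative-sampling procedure the abstract describes can be made concrete with a minimal sketch of one skip-gram update step. This is an illustration of the general technique, not code from the paper: the dimensions are toy values, negatives are drawn uniformly for simplicity (word2vec itself draws them from a smoothed unigram distribution), and all names (target_vecs, context_vecs, sgns_update) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 1000, 50                                    # vocabulary size, embedding dim
target_vecs = rng.normal(scale=0.1, size=(V, D))   # "input" embeddings
context_vecs = rng.normal(scale=0.1, size=(V, D))  # "output" embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(target, context, k=5, lr=0.025):
    """One gradient step: pull the observed (positive) context word toward
    the target; push k sampled (negative) words away from it."""
    negatives = rng.integers(0, V, size=k)
    t = target_vecs[target].copy()
    grad_t = np.zeros(D)
    # Label 1 for the observed pair, 0 for each negative sample.
    for word, label in [(context, 1.0)] + [(int(n), 0.0) for n in negatives]:
        err = sigmoid(t @ context_vecs[word]) - label   # prediction error signal
        grad_t += err * context_vecs[word]
        context_vecs[word] -= lr * err * t
    target_vecs[target] -= lr * grad_t

sgns_update(target=3, context=17)   # one observed (target, context) pair
```

The abstract's closing claim, that negative information can be folded into classic count-based models via parameter-free analytical transformations, can likewise be illustrated with shifted positive PMI, a standard transformation of this kind (Levy & Goldberg, 2014, showed that skip-gram with k negative samples implicitly factorizes a matrix of PMI - log k). This is offered as an example of the family, not as the authors' exact transformation.

```python
import numpy as np

def shifted_ppmi(counts, k=1):
    """Map a word-by-context co-occurrence count matrix to max(PMI - log k, 0).
    Assumes no all-zero rows or columns in `counts`."""
    p_wc = counts / counts.sum()
    p_w = p_wc.sum(axis=1, keepdims=True)   # marginal word probabilities
    p_c = p_wc.sum(axis=0, keepdims=True)   # marginal context probabilities
    with np.errstate(divide="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))    # zero counts map to -inf, then to 0
    return np.maximum(pmi - np.log(k), 0.0)
```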
Descriptors: Semantics, Learning Processes, Models, Prediction, Language Processing, Psycholinguistics, Error Patterns, Cues, Computational Linguistics
Wiley-Blackwell. 350 Main Street, Malden, MA 02148. Tel: 800-835-6770; Tel: 781-388-8598; Fax: 781-388-8232; e-mail: cs-journals@wiley.com; Web site: http://www.wiley.com/WileyCDA
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A