Incipient diffusion of lexical innovations

General information

Funding: German Research Foundation (DFG), grant number SCHM 1232/5-1

Project start: 1 January 2016

Principal investigator: Prof Dr Hans-Jörg Schmid


Student assistants: N.N.

Project partners: 

Project summary

Modern linguistics has not yet answered the question of which factors determine the extent to which lexical innovations (neologisms) are adopted by the members of a speech community, begin to spread, and become established in the lexicon of a language. The present project addresses this question with reference to English. Existing research on the diffusion of lexical innovations has claimed that a wide range of language-internal and language-external factors affect the diffusion of neologisms: the regularity of their formation; the transparency of their structure; competition with existing synonyms; their utility for naming new objects and states of affairs; the prestige of the coiner of a word and of its early users; the salience of the word in mass media communication; and the density of the social networks from which words emerge. At present, these factors have the status of hypotheses that have not been properly tested. There is as yet no systematic research on the number and nature of the factors that can play a role, or on the extent to which they can foster or hinder the diffusion of lexical innovations, let alone on how they interact and influence each other. The main reason for this is that the methodological tools needed for such research have not been available until now.

The proposed project builds on substantial methodological and theoretical advances made by the participating investigators. It aims to fill the existing gap by collecting large amounts of data on the use and spread of very recent neologisms on the Internet. A tailor-made web crawler, the NeoCrawler, has been developed for this purpose: it semi-automatically identifies neologisms on the Internet and stores them in a database. At monthly intervals, it then identifies and stores all new web pages that contain these neologisms. All attestations of the neologisms on these pages are extracted together with their immediate context and prepared for further linguistic analysis. The data are stored in the database, coded for the factors mentioned above, and analyzed with statistical methods that estimate the effect sizes of the factors and their interactions. In addition, questionnaire studies will be carried out in the UK and the USA in the first and third years of the project in order to complement the Internet data with an independent dataset on speakers' familiarity with the neologisms and their assessment of the utility of the new words. These data will also be fed into the statistical models.
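The extraction step (attestations stored together with their immediate context) can be illustrated with a minimal keyword-in-context (KWIC) sketch in Python. This is not the NeoCrawler's actual code: the `Attestation` record, the `extract_attestations` function and its 40-character `window` default are hypothetical, and the sketch assumes the page text has already been stripped of HTML markup.

```python
import re
from typing import List, NamedTuple


class Attestation(NamedTuple):
    """One occurrence of a neologism together with its immediate context."""
    left: str
    match: str
    right: str


def extract_attestations(text: str, neologism: str, window: int = 40) -> List[Attestation]:
    """Return all case-insensitive, whole-word occurrences of `neologism` in `text`.

    Hypothetical helper: each hit comes with up to `window` characters of
    left and right context (a KWIC view). Only the exact word form is
    matched, so inflected forms such as plurals are not found.
    """
    # Collapse runs of whitespace (e.g. left over from stripping HTML).
    flat = " ".join(text.split())
    # re.escape protects hyphens and other special characters in the word;
    # \b restricts matches to whole words.
    pattern = re.compile(r"\b" + re.escape(neologism) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - window):m.start()]
        right = flat[m.end():m.end() + window]
        hits.append(Attestation(left, m.group(0), right))
    return hits
```

For example, `extract_attestations(page_text, "frugalista", window=10)` would return one record per whole-word hit, keeping the original capitalization of each match. A character-based window keeps the sketch short; a window of n words on either side would be an equally reasonable choice.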

The overarching aim of the project is to create empirically sound models that produce realistic predictions of the effect sizes of the factors determining the success or failure of neologisms. In addition, the project aims to contribute to the development of empirical methods in linguistics and to theory-building in the study of linguistic diffusion, variation and change.

The NeoCrawler


The NeoCrawler has been developed by the project group to discover and observe English neologisms on the World Wide Web (Kerremans et al. 2012). Its Discoverer module searches online sources for potential neologisms, and its Observer module then performs weekly searches using the Google Custom Search API to investigate their diffusion on the web. The NeoCrawler is being updated and extended in the course of this project.
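How the Discoverer identifies potential neologisms is not spelled out here. One common approach is to flag word forms that are missing from a large reference word list; the following Python sketch assumes that approach. The tokenizer pattern, the `find_candidates` function and the `min_count` threshold are hypothetical, and the output is only a list of candidates for manual checking, in keeping with a semi-automatic workflow.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Hypothetical tokenizer: runs of ASCII letters, optionally joined by
# internal hyphens or apostrophes (e.g. "co-sleeping", "don't").
TOKEN_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def find_candidates(texts: Iterable[str], known_words: Set[str], min_count: int = 1) -> Counter:
    """Count word forms that do not occur in the reference list `known_words`.

    Tokens are lower-cased before lookup, so `known_words` should contain
    lower-case entries. Typos, proper names and rare words will also end up
    in the result, which is why the candidates need manual checking.
    """
    counts = Counter()
    for text in texts:
        for token in TOKEN_RE.findall(text):
            word = token.lower()
            if word not in known_words:
                counts[word] += 1
    # Keep only candidates attested at least `min_count` times.
    return Counter({w: c for w, c in counts.items() if c >= min_count})
```

Raising `min_count` trades recall for precision: it filters out one-off typos but may also discard genuinely new words that have only just appeared.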

Kerremans, D., Stegmayr, S. and Schmid, H.-J. (2012). The NeoCrawler: Identifying and retrieving neologisms from the internet and monitoring ongoing change. In K. Allan & J. Robinson (Eds.), Current methods in historical semantics (pp. 59–96). Berlin: Mouton de Gruyter.

Kerremans, D. (2015). A web of new words: A corpus-based study of the conventionalization process of English neologisms (English Corpus Linguistics, Vol. 15). Frankfurt am Main: Peter Lang.

Project workshop

Title: 'The Dynamics of Lexical Innovation. Data, Methods, Models'
Location: Seidlvilla, Nikolaiplatz 1B, 80802 Munich
Date: 28–30 June 2017

Conference contributions

Prokic, J., Kerremans, D., Würschinger, Q. and Schmid, H.-J. (2017, June). The NeoCrawler: When frugalista meets bankster. Workshop 'The Dynamics of Lexical Innovation. Data, Methods, Models'. LMU Munich, Munich.

Würschinger, Q., Kerremans, D., Prokic, J. and Schmid, H.-J. (2017, March). NeoCrawler: Erkennen und Beobachten lexikalischer Innovationen im Web [NeoCrawler: Detecting and observing lexical innovations on the web]. Conference 'Wortschätze: Dynamik, Muster, Komplexität' [Vocabularies: Dynamics, patterns, complexity], 53rd Annual Meeting of the Institut für Deutsche Sprache (IDS). Institut für Deutsche Sprache (IDS), Mannheim.

Kerremans, D. and Würschinger, Q. (2016, November). Modelling and researching the dynamic lexicon. A unified sociocognitive perspective and a web-based methodology. Workshop 'Expanding the lexicon – Linguistic Innovation, Morphological Productivity, and the Role of Discourse-Related Factors'. Trier University, Trier.

Würschinger, Q., Elahi, F., Kerremans, D., and Schmid, H.-J. (2016, October). Do lexical innovations spread as predicted by the S-curve? 7th International Conference of the German Cognitive Linguistics Association (DGKL/GCLA). University of Duisburg-Essen, Essen.

Elahi, F., Würschinger, Q., Kerremans, D., Zhekova, D., and Schmid, H.-J. (2016, September). Recent advances in the web-based investigation of lexical innovations. 4th Conference of the International Society for the Linguistics of English (ISLE). Adam Mickiewicz University, Poznań.

Würschinger, Q., Elahi, F., Zhekova, D. and Schmid, H.-J. (2016, August). Using the web and social media as corpora for monitoring the spread of neologisms. The case of ‘rapefugee’, ‘rapeugee’, and ‘rapugee’. 10th Web as Corpus Workshop (WAC-X), Annual Meeting of the Association for Computational Linguistics (ACL). Humboldt University, Berlin.

Responsible for content: Hans-Jörg Schmid