Linguistic fundamentals for natural language processing [electronic resource] : 100 essentials from morphology and syntax / Emily M. Bender.

Bender, Emily M., 1973-
San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2013.
1 electronic text (xvii, 166 p.) : ill., digital file
Synthesis digital library of engineering and computer science.
Synthesis lectures on human language technologies, 1947-4059 ; # 20

Natural language processing (Computer science).
Computational linguistics.
Grammar, Comparative and general -- Morphology.
System Details:
Mode of access: World Wide Web.
Many NLP tasks have at their core a subtask of extracting the dependencies--who did what to whom--from natural language sentences. This task can be understood as the inverse of the problem solved in different ways by diverse human languages, namely, how to indicate the relationship between different parts of a sentence. Understanding how languages solve the problem can be extremely useful in both feature design and error analysis in the application of machine learning to NLP. Likewise, understanding cross-linguistic variation can be important for the design of MT systems and other multilingual applications. The purpose of this book is to present in a succinct and accessible fashion information about the morphological and syntactic structure of human languages that can be useful in creating more linguistically sophisticated, more language-independent, and thus more successful NLP systems.
1. Introduction/motivation
#0 Knowing about linguistic structure is important for feature design and error analysis in NLP
#1 Morphosyntax is the difference between a sentence and a bag of words
#2 The morphosyntax of a language is the constraints that it places on how words can be combined both in form and in the resulting meaning
#3 Languages use morphology and syntax to indicate who did what to whom, and make use of a range of strategies to do so
#4 Languages can be classified 'genetically', areally, or typologically
#5 There are approximately 7,000 known living languages distributed across language families
#6 Incorporating information about linguistic structure and variation can make for more cross-linguistically portable NLP systems

2. Morphology: introduction
#7 Morphemes are the smallest meaningful units of language, usually consisting of a sequence of phones paired with concrete meaning
#8 The phones making up a morpheme don't have to be contiguous
#9 The form of a morpheme doesn't have to consist of phones
#10 The form of a morpheme can be null
#11 Root morphemes convey core lexical meaning
#12 Derivational affixes can change lexical meaning
#13 Root+derivational affix combinations can have idiosyncratic meanings
#14 Inflectional affixes add syntactically or semantically relevant features
#15 Morphemes can be ambiguous and/or underspecified in their meaning
#16 The notion 'word' can be contentious in many languages
#17 Constraints on order operate differently between words than they do between morphemes
#18 The distinction between words and morphemes is blurred by processes of language change
#19 A clitic is a linguistic element which is syntactically independent but phonologically dependent
#20 Languages vary in how many morphemes they have per word (on average and maximally)
#21 Languages vary in whether they are primarily prefixing or suffixing in their morphology
#22 Languages vary in how easy it is to find the boundaries between morphemes within a word

3. Morphophonology
#23 The morphophonology of a language describes the way in which surface forms are related to underlying, abstract sequences of morphemes
#24 The form of a morpheme (root or affix) can be sensitive to its phonological context
#25 The form of a morpheme (root or affix) can be sensitive to its morphological context
#26 Suppletive forms replace a stem+affix combination with a wholly different word
#27 Alphabetic and syllabic writing systems tend to reflect some but not all phonological processes

4. Morphosyntax
#28 The morphosyntax of a language describes how the morphemes in a word affect its combinatoric potential
#29 Morphological features associated with verbs and adjectives (and sometimes nouns) can include information about tense, aspect and mood
#30 Morphological features associated with nouns can contribute information about person, number and gender
#31 Morphological features associated with nouns can contribute information about case
#32 Negation can be marked morphologically
#33 Evidentiality can be marked morphologically
#34 Definiteness can be marked morphologically
#35 Honorifics can be marked morphologically
#36 Possessives can be marked morphologically
#37 Yet more grammatical notions can be marked morphologically
#38 When an inflectional category is marked on multiple elements of a sentence or phrase, it is usually considered to belong to one element and to express agreement on the others
#39 Verbs commonly agree in person/number/gender with one or more arguments
#40 Determiners and adjectives commonly agree with nouns in number, gender and case
#41 Agreement can be with a feature that is not overtly marked on the controller
#42 Languages vary in which kinds of information they mark morphologically
#43 Languages vary in how many distinctions they draw within each morphologically marked category

5. Syntax: introduction
#44 Syntax places constraints on possible sentences
#45 Syntax provides scaffolding for semantic composition
#46 Constraints ruling out some strings as ungrammatical usually also constrain the range of possible semantic interpretations of other strings

6. Parts of speech
#47 Parts of speech can be defined distributionally (in terms of morphology and syntax)
#48 Parts of speech can also be defined functionally (but not metaphysically)
#49 There is no one universal set of parts of speech, even among the major categories
#50 Part of speech extends to phrasal constituents

7. Heads, arguments and adjuncts
#51 Words within sentences form intermediate groupings called constituents
#52 A syntactic head determines the internal structure and external distribution of the constituent it projects
#53 Syntactic dependents can be classified as arguments and adjuncts
#54 The number of semantic arguments provided for by a head is a fundamental lexical property
#55 In many (perhaps all) languages, (some) arguments can be left unexpressed
#56 Words from different parts of speech can serve as heads selecting arguments
#57 Adjuncts are not required by heads and generally can iterate
#58 Adjuncts are syntactically dependents but semantically introduce predicates which take the syntactic head as an argument
#59 Obligatoriness can be used as a test to distinguish arguments from adjuncts
#60 Entailment can be used as a test to distinguish arguments from adjuncts
#61 Adjuncts can be single words, phrases, or clauses
#62 Adjuncts can modify nominal constituents
#63 Adjuncts can modify verbal constituents
#64 Adjuncts can modify other types of constituents
#65 Adjuncts express a wide range of meanings
#66 The potential to be a modifier is inherent to the syntax of a constituent
#67 Just about anything can be an argument, for some head

8. Argument types and grammatical functions
#68 There is no agreed upon universal set of semantic roles, even for one language; nonetheless, arguments can be roughly categorized semantically
#69 Arguments can also be categorized syntactically, though again there may not be universal syntactic argument types
#70 A subject is the distinguished argument of a predicate and may be the only one to display certain grammatical properties
#71 Arguments can generally be arranged in order of obliqueness
#72 Clauses, finite or non-finite, open or closed, can also be arguments
#73 Syntactic and semantic arguments aren't the same, though they often stand in regular relations to each other
#74 For many applications, it is not the surface (syntactic) relations, but the deep (semantic) dependencies that matter
#75 Lexical items map semantic roles to grammatical functions
#76 Syntactic phenomena are sensitive to grammatical functions
#77 Identifying the grammatical function of a constituent can help us understand its semantic role with respect to the head
#78 Some languages identify grammatical functions primarily through word order
#79 Some languages identify grammatical functions through agreement
#80 Some languages identify grammatical functions through case marking
#81 Marking of dependencies on heads is more common cross-linguistically than marking on dependents
#82 Some morphosyntactic phenomena rearrange the lexical mapping

9. Mismatches between syntactic position and semantic roles
#83 There are a variety of syntactic phenomena which obscure the relationship between syntactic and semantic arguments
#84 Passive is a grammatical process which demotes the subject to oblique status, making room for the next most prominent argument to appear as the subject
#85 Related constructions include anti-passives, impersonal passives, and middles
#86 English dative shift also affects the mapping between syntactic and semantic arguments
#87 Morphological causatives add an argument and change the expression of at least one other
#88 Many (all?) languages have semantically empty words which serve as syntactic glue
#89 Expletives are constituents that can fill syntactic argument positions that don't have any associated semantic role
#90 Raising verbs provide a syntactic argument position with no (local) semantic role, and relate it to a syntactic argument position of another predicate
#91 Control verbs provide a syntactic and semantic argument which is related to a syntactic argument position of another predicate
#92 In complex predicate constructions the arguments of a clause are licensed by multiple predicates working together
#93 Coordinated structures can lead to one-to-many and many-to-one dependency relations
#94 Long-distance dependencies separate arguments/adjuncts from their associated heads
#95 Some languages allow adnominal adjuncts to be separated from their head nouns
#96 Many (all?) languages can drop arguments, but permissible argument drop varies by word class and by language
#97 The referent of a dropped argument can be definite or indefinite, depending on the lexical item or construction licensing the argument drop

10. Resources
#98 Morphological analyzers map surface strings (words in standard orthography) to regularized strings of morphemes or morphological features
#99 'Deep' syntactic parsers map surface strings (sentences) to semantic structures, including semantic dependencies
#100 Typological databases summarize properties of languages at a high level

A. Grams used in IGT
Author's biography
General index
Index of languages.
Title from PDF t.p. (viewed on July 19, 2013).
Includes bibliographical references (p. 131-151) and index.
Other format:
Print version:
ISBN:
9781627050128 (electronic bk.)
9781627050111 (pbk.)
Publisher Number:
10.2200/S00493ED1V01Y201303HLT020 doi
