Morphotactics
The most important goal in structuring the Novvocu lexicon is that it should always be obvious whether a word is a compound or a root. To this end, all free morphemes begin and end with consonants, and no syllable may begin with a consonant cluster. Root words have the approximate form C[V(V)C]V(V)C(C), where the bracketed part may occur zero, one or two times; in addition, no root word may contain more than one sequence of two vowels (e.g., CVCVVC is permitted but CVVCVVC is not). The following patterns are permitted:
CVC
CVCC
CVVC
CVVCC
CVCVC
CVCVCC
CVVCVC
CVCVVC
CVVCVCC
CVCVVCC
CVCVCVC
CVCVCVCC
CVVCVCVC
CVVCVCVCC
CVCVVCVC
CVCVVCVCC
CVCVCVVC
CVCVCVVCC
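The rules above can be checked mechanically. The following is a minimal sketch, assuming that the vowel letters are a, e, i, o and u and that every other letter is a consonant (this section does not list the alphabet). The helper names `permitted_patterns`, `is_valid_skeleton`, `cv_skeleton` and `is_valid_root` are hypothetical. `permitted_patterns` regenerates the list of skeletons from the rule "one to three syllables, V or VV in each, at most one VV, optional final cluster", and `is_valid_skeleton` tests a skeleton directly against the same constraints.

```python
from itertools import groupby, product

# Assumption: the vowel letters are a, e, i, o, u; every other letter counts as a consonant.
VOWELS = set("aeiou")


def permitted_patterns() -> list[str]:
    """Generate the permitted C/V skeletons: 1-3 syllables, at most one VV, final C or CC."""
    patterns = []
    for syllables in (1, 2, 3):
        for nuclei in product(("V", "VV"), repeat=syllables):
            if nuclei.count("VV") > 1:
                continue  # no more than one two-vowel sequence per root
            body = "C" + "C".join(nuclei) + "C"
            patterns.extend([body, body + "C"])  # optional final cluster
    return patterns


def is_valid_skeleton(skeleton: str) -> bool:
    """Check a C/V skeleton such as 'CVCVVC' against the root rules."""
    runs = [(kind, len(list(group))) for kind, group in groupby(skeleton)]
    if not runs or runs[0][0] != "C" or runs[-1][0] != "C":
        return False  # roots begin and end with a consonant
    consonant_runs = [n for kind, n in runs if kind == "C"]
    vowel_runs = [n for kind, n in runs if kind == "V"]
    return (
        1 <= len(vowel_runs) <= 3                     # one to three syllables
        and all(n in (1, 2) for n in vowel_runs)      # V or VV in each syllable
        and vowel_runs.count(2) <= 1                  # at most one VV
        and all(n == 1 for n in consonant_runs[:-1])  # no clusters before the last vowel
        and consonant_runs[-1] in (1, 2)              # final C or CC
    )


def cv_skeleton(word: str) -> str:
    """Map a spelled root to its skeleton, e.g. 'cafaz' -> 'CVCVC'."""
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower())


def is_valid_root(word: str) -> bool:
    return is_valid_skeleton(cv_skeleton(word))
```

With these definitions, `is_valid_root("palt")` and `is_valid_root("boct")` return `True`, while `is_valid_root("plant")` (initial cluster), `is_valid_root("palnt")` (three-consonant final cluster) and `is_valid_root("campus")` (medial cluster) return `False`. Keeping the skeleton check separate from the spelling makes it easy to swap in the real vowel inventory.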
A root like *plant is not permitted, since initial consonant clusters are not allowed, and the word 'campus' /kahm-POOS/ could not be a root, since no pattern allows a medial consonant cluster; it would instead be a compound of the roots 'cam' + 'pus' ("shirt" + "usage"). The word patterns make it easy to tell where one root ends and the next begins: because no root may start with a cluster, a medial cluster always breaks before its last consonant, so 'duvsir' is clearly 'duv' + 'sir' and 'mentvoc' is clearly 'ment' + 'voc'. A word-final vowel is always a morpheme of its own, marking the part of speech.
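The segmentation rule can be sketched in a few lines. This is a minimal illustration, assuming again that the vowel letters are a, e, i, o and u; the function name `segment` is hypothetical, and the ending 'o' used in the usage note below is a made-up example, not a documented Novvocu part-of-speech vowel. Since roots always end in consonants, any trailing vowels are peeled off first as the part-of-speech ending; the remaining stem is then split before the last consonant of every medial consonant cluster.

```python
# Assumption: vowel letters are a, e, i, o, u; every other letter is a consonant.
VOWELS = "aeiou"


def segment(word: str) -> tuple[list[str], str]:
    """Split a word into its roots plus any word-final vowel (the part-of-speech morpheme).

    Because no root may begin with a consonant cluster, a medial cluster
    always breaks before its last consonant: 'duvsir' -> 'duv' + 'sir'.
    This sketch does not check that each resulting root is well formed.
    """
    word = word.lower()
    stem = word.rstrip(VOWELS)  # roots end in consonants, so trailing vowels are the ending
    ending = word[len(stem):]
    roots, start, i = [], 0, 0
    while i < len(stem):
        if stem[i] in VOWELS:
            i += 1
            continue
        j = i
        while j < len(stem) and stem[j] not in VOWELS:
            j += 1  # stem[i:j] is a maximal run of consonants
        # Only a medial run of two or more consonants marks a root boundary.
        if 0 < i and j < len(stem) and j - i >= 2:
            roots.append(stem[start:j - 1])  # the run's last consonant starts the next root
            start = j - 1
        i = j
    roots.append(stem[start:])
    return roots, ending
```

For example, `segment("mentvoc")` returns `(['ment', 'voc'], '')`, and the hypothetical `segment("duvsiro")` returns `(['duv', 'sir'], 'o')`. The split is deterministic precisely because of the ban on initial clusters: a single consonant between vowels always belongs to the same root, and any longer run must end a root.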
The set of consonants was chosen specifically so that every consonant can occupy any position within a morpheme: initial, medial or final.
While the word patterns ("morphotactics") of Novvocu may seem artificial, many languages restrict possible word forms far more than Novvocu does. For instance, Polynesian languages typically allow only V and CV syllables, and Mandarin Chinese syllables are typically CV or CVN. As a result, words borrowed into such languages undergo considerable change, as when English 'pocket monster' becomes Japanese 'pokemon'. In Novvocu, English 'plant' (from Latin) appears as 'palt', since both *plant and *palnt are invalid roots.
This brings us to the primary design tension of Novvocu: on the one hand, root words must fit strict syllable patterns; on the other, they should be as recognizable as possible to speakers of any language.
Recognizable Forms
Since it was not feasible to analyze thousands of languages for common forms, Novvocu focused on words from six of the most widely spoken languages in the world: Arabic, Chinese, English, Hindi, Russian and Spanish (called the six cardinal languages). The primary source of these natural-language words was the Lojban etymological dictionary, which gives phonetic information for about 1,200 words. Where possible, forms in other languages were also considered, especially German, Dutch, Italian, Esperanto and Novial, as found in the Universal Language Dictionary. Occasionally Proto-Indo-European forms were considered as well, since their descendants survive today in widely spoken languages.
How recognizable are Novvocu words? It is rare to find a Novvocu word like 'motor' /moh-TOHRR/, "motor", which -- as a technical term derived from Latin -- has found its way into all the cardinal languages (though in Mandarin Chinese it takes the form /mada/). More typical is something like 'cafaz', "jump", from Arabic /kafaz/, a form that won out because it fit the word structure of Novvocu best and because its initial /k-/ was reinforced by Hindi /kud/. Matching the initial sound was considered quite important, as it has been demonstrated to be a strong mnemonic, and a high correspondence between the initial sounds of Novvocu words and those of the speaker's native tongue makes Novvocu sound "more natural".
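Word selection was done ad hoc rather than by algorithm, so the following is not the author's procedure. It is only a minimal sketch of the initial-sound criterion on its own: given a candidate Novvocu form and a few source-language transcriptions, it reports which languages share the candidate's initial sound. The function names are hypothetical, and the spelling map assumes that Novvocu 'c' spells /k/ (as 'cafaz' from Arabic /kafaz/ and 'campus' /kahm-POOS/ suggest), with every other letter standing for the sound of the same symbol. The only source forms used are those quoted in this section.

```python
# Assumption: Novvocu 'c' spells /k/; other letters stand for the sound of the same symbol.
NOVVOCU_SPELLING = {"c": "k"}


def novvocu_initial(word: str) -> str:
    """Initial sound of a Novvocu spelling, e.g. 'cafaz' -> 'k'."""
    first = word[0].lower()
    return NOVVOCU_SPELLING.get(first, first)


def source_initial(transcription: str) -> str:
    """Initial sound of a rough transcription such as '/kafaz/' or 'octo-'."""
    return transcription.strip("/")[0].lower()


def initial_support(candidate: str, sources: dict[str, str]) -> list[str]:
    """Languages whose form begins with the same sound as the Novvocu candidate."""
    target = novvocu_initial(candidate)
    return [lang for lang, form in sources.items() if source_initial(form) == target]
```

For instance, `initial_support("cafaz", {"Arabic": "/kafaz/", "Hindi": "/kud/"})` returns `['Arabic', 'Hindi']`. Comparing only first letters of transcriptions is deliberately crude; a real comparison would need proper phonemic transcriptions for each language.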
While early attempts were made to systematize word formation, these methods were rejected, and words were formed on an ad hoc basis. The priority was to take a form more or less as is if it was present in at least two of the cardinal languages. If it was a particularly high-frequency form, it might be truncated to one syllable, such as 'per' from Latin 'persona', extant in Romance (Spanish, Italian, etc.), Germanic (English, German, etc.) and Russian, and reinforced by Hindi /puruc/. If no full forms matched but some shared an initial letter, one of those forms was chosen. Sometimes, when the most common form begins with a vowel, the initial consonant of a word in another language is used to start the word (for instance, 'boct', "eight", from Chinese /ba/ and Romance 'octo-').
In some cases, conflicts with other words changed the available form: 'cat' in Novvocu means "cut", because that meaning of the form is supported by more cardinal languages than the meaning "cat". A form longer than /kat/ was therefore needed for "feline", and the choice was 'catoh', taking the /-osh/ from Russian /koshk/, with the -ato- reinforced by the Romance form 'gat(t)o'.
While word forms could have been generated randomly by computer, looking to natural languages for inspiration provided some needed realism to the language -- and makes remembering the vocabulary a little easier, especially for English speakers.