A friend of mine, who was creating a language for a fantasy novel he was writing, mentioned to me that the most difficult and tedious part was coming up with the vocabulary. While some words are derivations or combinations of other words, much of the vocabulary simply has to be made up.
He asked me whether it was possible to write some code that would generate random words based on a set of rules for what a word can be. I thought it was an interesting idea. Apart from its uses in creative projects, it could also be useful for generating secure yet memorable passwords, so I thought I'd give it a go.
First, I needed the code to decide how long a word to make.
Since I wanted to base the fictional language on English, I created two word-length weight charts: one for the frequency of word lengths in the dictionary (which I took from a Reddit post) and one for the frequency of word lengths as they are used in the English corpus (source).
const dictionaryDistribution: [number, number][] = [
  [1, 52],
  [2, 488],
  [3, 1385],
  [4, 3688],
  [5, 6717],
  [6, 10268], // etc…
];
const corpusDistribution: [number, number][] = [
  [1, 0.03],
  [2, 0.17],
  [3, 0.21],
  [4, 0.16],
  [5, 0.11],
  [6, 0.08], // etc…
];
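The chosen distribution is then converted into cumulative values between 0 and 1 by the generateDistribution function, so each entry holds the running total of the probabilities up to and including that word length.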
console.log(generateDistribution('corpus'));
/* Output
[
  [ 1, 0.03 ],
  [ 2, 0.2 ],
  [ 3, 0.41000000000000003 ],
  [ 4, 0.5700000000000001 ],
  [ 5, 0.68 ],
  [ 6, 0.76 ],
  etc…
]
*/
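The body of generateDistribution isn't shown here, so the following is only a minimal sketch of the cumulative step. The name toCumulative is hypothetical, and dividing by the total of the supplied weights is an assumption about how the raw dictionary counts get scaled to the 0 to 1 range.

```typescript
// Hypothetical sketch: not the actual generateDistribution implementation.
type Distribution = [number, number][];

// Turn raw weights into a running total scaled to the range 0..1.
// Assumption: weights are normalized by the sum of the entries provided.
function toCumulative(weights: Distribution): Distribution {
  const total = weights.reduce((sum, [, weight]) => sum + weight, 0);
  let running = 0;
  return weights.map(([length, weight]): [number, number] => {
    running += weight / total;
    return [length, running];
  });
}

// First three entries of the dictionary chart above (the rest is elided there).
const sample: Distribution = [
  [1, 52],
  [2, 488],
  [3, 1385],
];

console.log(toCumulative(sample));
```

Because each entry stores a running total, the values only ever increase, and the last one comes out at (approximately) 1 when the weights are normalized this way.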
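The word length is then chosen at random: a number between 0 and 1 is drawn, and the first length whose cumulative value exceeds it is returned.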
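/**
 * Sample a word length from the distribution.
 * @returns A random word length.
 */
export function getRandomWordLength(distribution?: [number, number][]) {
  // Fall back to the dictionary-based distribution when none is supplied.
  distribution = distribution || generateDistribution('dictionary');
  const rand = Math.random();
  // Return the first length whose cumulative probability exceeds the random draw.
  for (const [length, prob] of distribution) {
    if (rand < prob) return length;
  }
  // The draw fell beyond the listed lengths, so sample from the long tail instead.
  return sampleLongTail();
}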
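To see how the lookup behaves for specific draws, here is a small sketch that takes the random number as a parameter instead of calling Math.random() internally. The name pickLength and the fixed fallback of 7 are assumptions for illustration only; the fallback merely stands in for sampleLongTail(), whose behavior isn't shown here.

```typescript
// Hypothetical variant of getRandomWordLength with the random draw passed in,
// so the lookup can be checked against fixed inputs.
function pickLength(
  cumulative: [number, number][],
  rand: number = Math.random(),
  fallback: number = 7, // assumed stand-in for sampleLongTail()
): number {
  for (const [length, threshold] of cumulative) {
    if (rand < threshold) return length;
  }
  return fallback;
}

// Cumulative corpus values as printed above (first six entries, rounded).
const corpusCumulative: [number, number][] = [
  [1, 0.03],
  [2, 0.2],
  [3, 0.41],
  [4, 0.57],
  [5, 0.68],
  [6, 0.76],
];

console.log(pickLength(corpusCumulative, 0.01)); // 1
console.log(pickLength(corpusCumulative, 0.5)); // 4
console.log(pickLength(corpusCumulative, 0.9)); // 7 (fallback)
```

A draw of 0.5 skips the thresholds 0.03, 0.2 and 0.41 and stops at 0.57, which is why it lands on a four-letter word.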
Once the word length is determined, the script can start building the word. The main function constructs it letter by letter or cluster by cluster, alternating between vowels and consonants, until it reaches the chosen length, and then returns the word.
More specifically, it looks at the word so far and determines which of the following arrays are viable options for what comes next (the 'marked' letters are to be used sparingly):
const prefixes = ["str", "pre", "dia", "gh", "wh", "psy"];
const suffixes = ["tion", "ing", "ies", "ed", "er", "ght", "gh", "ck", "ff", "que", "nd"];
const vowels = ["a", "e", "i", "o", "u", "y"];
const consonants = [
  ...["b", "c", "d", "f", "g", "h", "j", "k", "l", "m"],
  ...["n", "p", "q", "r", "s", "t", "v", "x", "z", "w", "y"],
];
const marked = ["z", "x", "j"];
const consonantCluster = [
  ...["tr", "sc", "th", "sh", "ch", "br", "bl", "cl", "cr"],
  ...["ff", "que", "qu", "dr", "sw"],
];
const dipthong = ["ee", "ea", "io", "oo", "ou", "eau"];
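The alternating idea can be sketched in a much simplified form. This is not the real makeWord, which isn't shown here: buildWord, pick, and the two shortened unit lists are hypothetical, and the sketch ignores prefixes, suffixes, and the sparing use of marked letters.

```typescript
// Hypothetical, simplified builder: alternates vowel and consonant units
// until the word reaches exactly the requested length.
const vowelUnits = ["a", "e", "i", "o", "u", "ee", "ea", "oo", "ou"];
const consonantUnits = ["b", "d", "k", "l", "m", "n", "r", "s", "t", "tr", "sh", "ch", "br"];

// Pick a uniformly random element; rng is expected to return a value in [0, 1).
function pick<T>(items: T[], rng: () => number): T {
  return items[Math.floor(rng() * items.length)];
}

function buildWord(targetLength: number, rng: () => number = Math.random): string {
  let word = "";
  let useVowel = rng() < 0.5; // assumption: a word may start with either kind
  while (word.length < targetLength) {
    const remaining = targetLength - word.length;
    // Only consider units that still fit, so the word never overshoots.
    const pool = (useVowel ? vowelUnits : consonantUnits).filter(
      (unit) => unit.length <= remaining,
    );
    word += pick(pool, rng);
    useVowel = !useVowel;
  }
  return word;
}

console.log(buildWord(6));
```

Filtering the pool by the remaining length is one simple way to hit the target exactly; since both lists contain single letters, there is always at least one unit that fits.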
After the initial fun of creating the words, I thought the script could be useful as a tool for creating lorem ipsum-like text. So I expanded it to include one function that creates sentences and another that creates paragraphs.
/**
 * @description Generates a sentence with a given length or a length sampled from the sentence-length distribution.
 * @param options - Either the given length of the sentence in words or an object containing full options.
 * @returns The generated sentence.
 */
export const makeSentence = (options?: number | IpsumOptions): string => {
  let length = typeof options === "number" ? options : options?.length;
  const allOptions = typeof options === "object" ? options : {};
  const distribution = allOptions.wordDistribution || generateDistribution("corpus");
  const sentenceDistribution =
    allOptions.sentenceDistribution || generateDistribution("sentence");
  length = Math.max(1, length || getRandomLengthFromDistribution(sentenceDistribution));
  // The first word is capitalized; the remaining words are appended with single spaces.
  let sentence = `${capitalizeFirstLetter(makeWord({ distribution }))} `;
  for (let i = 1; i < length; i++) {
    sentence += `${makeWord({ distribution })} `;
  }
  return `${sentence.trim()}.`;
};
/**
 * @description Generates a paragraph with a given length or a normally distributed number of sentences.
 * @param options - Either the given length of the paragraph in sentences or an object containing full options.
 * @returns The generated paragraph.
 */
export const makeParagraph = (options?: number | IpsumOptions): string => {
  let length = typeof options === "number" ? options : options?.length;
  const allOptions = typeof options === "object" ? options : {};
  const wordDistribution = allOptions.wordDistribution || generateDistribution("corpus");
  const sentenceDistribution =
    allOptions.sentenceDistribution || generateDistribution("sentence");
  // Default to roughly five sentences (mean 5, standard deviation 1.2), never fewer than one.
  length = Math.max(1, length || Math.round(gaussianRandom(5, 1.2)));
  let paragraph = "";
  for (let i = 0; i < length; i++) {
    paragraph += `${makeSentence({ wordDistribution, sentenceDistribution })} `;
  }
  return paragraph.trim();
};
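The gaussianRandom helper isn't shown above. One common way to draw a normally distributed number is the Box-Muller transform, so here is a sketch along those lines. The names gaussianSample and paragraphLength are hypothetical, and using Box-Muller is an assumption rather than a description of the actual helper.

```typescript
// Hypothetical stand-in for gaussianRandom, using the Box-Muller transform.
function gaussianSample(
  mean: number,
  stdDev: number,
  rng: () => number = Math.random,
): number {
  const u1 = 1 - rng(); // shift [0, 1) to (0, 1] so Math.log never receives 0
  const u2 = rng();
  const standardNormal = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + stdDev * standardNormal;
}

// Same clamping as makeParagraph: round to a whole number, never below 1.
function paragraphLength(rng: () => number = Math.random): number {
  return Math.max(1, Math.round(gaussianSample(5, 1.2, rng)));
}

console.log(paragraphLength());
```

With a mean of 5 and a standard deviation of 1.2, most paragraphs end up between three and seven sentences, and the Math.max(1, …) guard covers the rare draw that would otherwise round to zero or below.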
You can create words and give them definitions at thedukeofnorfolk.com. Eventually, I intend to build a crowdsourced dictionary of fictional words.