A friend of mine, who was creating a language for a fantasy novel he was writing, mentioned that the most difficult and tedious part was coming up with the vocabulary. While some words are derivations or combinations of other words, much of the vocabulary simply has to be made up.
He asked me whether it was possible to write code that would generate random words from a set of rules about what a word can be. I thought it was an interesting idea. Apart from its uses in creative projects, it might also be useful for generating secure yet memorable passwords, so I decided to give it a go.
First, I needed the code to decide how long a word to make.
Given that I wanted to base the fictional language on English, I chose to create two word-length weight charts: one for the frequency of word lengths in the dictionary (which I took from a Reddit post) and one for the frequency of word lengths as they are used in the English corpus (source).
const dictionaryDistribution: [number, number][] = [
  [1, 52],
  [2, 488],
  [3, 1385],
  [4, 3688],
  [5, 6717],
  [6, 10268], // etc…
];
const corpusDistribution: [number, number][] = [
  [1, 0.03],
  [2, 0.17],
  [3, 0.21],
  [4, 0.16],
  [5, 0.11],
  [6, 0.08], // etc…
];
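The chosen distribution is then converted into cumulative values between 0 and 1 by the generateDistribution function, so each entry holds the probability of drawing a word of that length or shorter.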
console.log(generateDistribution('corpus'));
/* Output
[
  [ 1, 0.03 ],
  [ 2, 0.2 ],
  [ 3, 0.41000000000000003 ],
  [ 4, 0.5700000000000001 ],
  [ 5, 0.68 ],
  [ 6, 0.76 ],
  etc…
]
*/
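The body of generateDistribution isn't shown here, but the running totals in the output suggest something like the following. This is only a minimal sketch: the helper name toCumulative and the WeightTable type are hypothetical, and dividing by the total is an assumption made so that raw dictionary counts and corpus fractions can be handled the same way.

```typescript
type WeightTable = [number, number][];

// Hypothetical helper (not the post's generateDistribution): turns
// [length, weight] pairs into [length, cumulative probability] pairs.
function toCumulative(weights: WeightTable): WeightTable {
  // Assumption: normalise by the total so the last entry ends at about 1.
  const total = weights.reduce((sum, [, weight]) => sum + weight, 0);
  let running = 0;
  return weights.map(([len, weight]): [number, number] => {
    running += weight / total;
    return [len, running];
  });
}

// Only the six dictionary rows quoted above; the real table continues.
const sampleWeights: WeightTable = [
  [1, 52], [2, 488], [3, 1385], [4, 3688], [5, 6717], [6, 10268],
];
console.log(toCumulative(sampleWeights));
```

Because the sketch only sees the first six rows, its numbers will not match the full dictionary table; the point is just the shape of the transformation.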
A word length is then chosen at random by returning the first length whose cumulative value exceeds a random number:
/**
 * Sample a word length from the distribution.
 * @returns A random word length.
 */
export function getRandomWordLength(distribution?: [number, number][]) {
  distribution = distribution || generateDistribution('dictionary');
  const rand = Math.random();
  // Each prob is cumulative, so the first threshold above rand selects the length.
  for (const [length, prob] of distribution) {
    if (rand < prob) return length;
  }
  // rand exceeded every listed threshold, so fall back to the long tail.
  return sampleLongTail();
}
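To make the lookup easier to trace, here is a small sketch that takes the random number as an argument instead of calling Math.random. The names pickLength and corpusHead are hypothetical, the table uses the rounded corpus values from the output above, and the fallback parameter merely stands in for sampleLongTail, whose behaviour isn't shown.

```typescript
// Hypothetical variant of the sampler with the random number passed in.
function pickLength(
  cumulative: [number, number][],
  rand: number,
  fallback: number,
): number {
  for (const [len, cumulativeProb] of cumulative) {
    if (rand < cumulativeProb) return len;
  }
  // Assumption: a fixed fallback stands in for the post's sampleLongTail().
  return fallback;
}

// Rounded cumulative corpus values for lengths 1 to 6 only.
const corpusHead: [number, number][] = [
  [1, 0.03], [2, 0.2], [3, 0.41], [4, 0.57], [5, 0.68], [6, 0.76],
];

console.log(pickLength(corpusHead, 0.5, 7)); // 4, because 0.41 <= 0.5 < 0.57
console.log(pickLength(corpusHead, 0.9, 7)); // 7, the fallback: 0.9 is above every listed threshold
```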
Once the word length is determined, the script can start building words. The main function constructs a word one letter or cluster at a time, alternating between vowels and consonants until it reaches the chosen length, then returns the result.
More specifically, it looks at the word so far and determines which of the following arrays are viable options for the next piece (the 'marked' letters are used sparingly):
const prefixes = ["str", "pre", "dia", "gh", "wh", "psy"];
const suffixes = ["tion", "ing", "ies", "ed", "er", "ght", "gh", "ck", "ff", "que", "nd"];
const vowels = ["a", "e", "i", "o", "u", "y"];
const consonants = [
  ...["b", "c", "d", "f", "g", "h", "j", "k", "l", "m"],
  ...["n", "p", "q", "r", "s", "t", "v", "x", "z", "w", "y"],
];
const marked = ["z", "x", "j"];
const consonantCluster = [
  ...["tr", "sc", "th", "sh", "ch", "br", "bl", "cl", "cr"],
  ...["ff", "que", "qu", "dr", "sw"],
];
const dipthong = ["ee", "ea", "io", "oo", "ou", "eau"];
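The alternation described above can be sketched roughly as follows. This is a simplified illustration rather than the actual makeWord: all names here (sketchWord, pickFrom, and the demo arrays) are hypothetical, the lists are shortened, and prefixes, suffixes, and the sparing use of marked letters are left out entirely.

```typescript
// Shortened, hypothetical lists for illustration only.
const demoVowels = ["a", "e", "i", "o", "u"];
const demoConsonants = ["b", "d", "k", "l", "m", "n", "r", "s", "t"];
const demoClusters = ["tr", "sh", "ch", "br"];
const demoDiphthongs = ["ee", "ea", "oo", "ou"];

// Pick a random element; rng can be swapped out for a deterministic source.
function pickFrom(options: string[], rng: () => number): string {
  return options[Math.floor(rng() * options.length)];
}

function sketchWord(targetLength: number, rng: () => number = Math.random): string {
  let word = "";
  // Assumption: a coin flip decides whether the word opens with a vowel.
  let wantVowel = rng() < 0.5;
  while (word.length < targetLength) {
    const remaining = targetLength - word.length;
    const pool = wantVowel
      ? [...demoVowels, ...demoDiphthongs]
      : [...demoConsonants, ...demoClusters];
    // Keep only pieces that still fit, so the word never overshoots.
    const fitting = pool.filter((piece) => piece.length <= remaining);
    word += pickFrom(fitting, rng);
    wantVowel = !wantVowel;
  }
  return word;
}

console.log(sketchWord(6)); // a random six-letter word
```

Filtering by remaining length is one simple way to hit the target exactly; the real function may handle overshoot differently.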
After the initial fun of creating the words, I thought that the script could be useful as a tool for creating lorem-ipsum-like text. So I expanded it to include a function that creates sentences and another that creates paragraphs.
/**
 * @description Generates a sentence with a given length, or a length sampled from the sentence-length distribution.
 * @param options - Either the given length of the sentence in words or an object containing full options.
 * @returns The generated sentence.
 */
export const makeSentence = (options?: number | IpsumOptions): string => {
  let length = typeof options === "number" ? options : options?.length;
  const allOptions = typeof options === "object" ? options : {};
  const distribution = allOptions.wordDistribution || generateDistribution("corpus");
  const sentenceDistribution =
    allOptions.sentenceDistribution || generateDistribution("sentence");
  length = Math.max(1, length || getRandomLengthFromDistribution(sentenceDistribution));
  // The first word is capitalized; the rest are appended in lower case.
  let sentence = `${capitalizeFirstLetter(makeWord({ distribution }))} `;
  for (let i = 1; i < length; i++) {
    sentence += `${makeWord({ distribution })} `;
  }
  return `${sentence.trim()}.`;
};
/**
 * @description Generates a paragraph with a given length or a normally distributed number.
 * @param options - Either the given length of the paragraph in sentences or an object containing full options.
 * @returns The generated paragraph.
 */
export const makeParagraph = (options?: number | IpsumOptions): string => {
  let length = typeof options === "number" ? options : options?.length;
  const allOptions = typeof options === "object" ? options : {};
  const wordDistribution = allOptions.wordDistribution || generateDistribution("corpus");
  const sentenceDistribution =
    allOptions.sentenceDistribution || generateDistribution("sentence");
  length = Math.max(1, length || Math.round(gaussianRandom(5, 1.2)));
  let paragraph = "";
  for (let i = 0; i < length; i++) {
    paragraph += `${makeSentence({ wordDistribution, sentenceDistribution })} `;
  }
  return paragraph.trim();
};
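The gaussianRandom helper used for the paragraph length isn't shown. One common way to get a normally distributed number is the Box-Muller transform, sketched below. Treat this as an assumption about how such a helper could work, not as the post's implementation; the name gaussianSketch and the injectable rng parameter are hypothetical.

```typescript
// Hypothetical stand-in for gaussianRandom(mean, stdDev) using Box-Muller.
function gaussianSketch(
  mean: number,
  stdDev: number,
  rng: () => number = Math.random,
): number {
  const u1 = 1 - rng(); // shift [0, 1) to (0, 1] so Math.log never sees 0
  const u2 = rng();
  // Standard normal sample from two uniform samples.
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + stdDev * z;
}

// Same shape as the paragraph-length line above: round, then clamp to at least 1.
const sentenceCount = Math.max(1, Math.round(gaussianSketch(5, 1.2)));
console.log(sentenceCount); // usually somewhere between 3 and 7
```

With a mean of 5 and a standard deviation of 1.2, most paragraphs end up between three and seven sentences, and the Math.max clamp guards against the rare draw that rounds to zero or below.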
You can create words and give them definitions at thedukeofnorfolk.com. I intend to eventually create a crowdsourced dictionary of fictional words.