Towards a Language Consisting Entirely of Acronyms

This is a proposal for an entirely unique type of language.  There are two possible reasons for considering such a language.

  • It could be used for automatic summarization, a long-term goal of Artificial Intelligence.  It would be a genuine language, into which machine translation techniques could translate a natural language like English, and its built-in capacity for summarization and expansion would be useful for many purposes.
  • It could be an international auxiliary language.

Human languages as spoken and written contain a great deal of redundancy.  In other communications media, redundant information can be thrown away, only to be approximately recovered later.  The best example of this is image compression.


Compare the two images above.  The one on the left is an uncompressed image of the Banff Springs Hotel in Alberta, Canada.  The one on the right is the result of a lossy compression using the JPEG algorithm, followed by its inverse expansion (decompression).  The original image was compressed by a factor of almost 20, from 115,366 bytes to 5,937 bytes.  Almost 95% of the original data was thrown away, yet the reconstructed image is quite recognizable.
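The figures quoted for the JPEG example can be checked directly from the two file sizes:

```latex
\text{compression factor} = \frac{115{,}366}{5{,}937} \approx 19.43,
\qquad
\text{fraction discarded} = 1 - \frac{5{,}937}{115{,}366} \approx 1 - 0.0515 = 0.9485 \approx 94.9\%
```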

This example shows how good a data compression algorithm can be.  Linguistic data can also be compressed with a lossy compression algorithm, then reconstructed very well.  Does the compression have to be done by a sophisticated piece of software, like the JPEG algorithm used for images?

What about a much, much simpler algorithm?

Imagine a language in which fronting is used uniformly, so that the first word in a phrase carries the most meaning.  Similarly, the first morpheme in a word may carry the most meaning.  If each morpheme is a single phoneme, written as a single letter, then words in such a language could be abbreviated to their first letter without losing too much of their meaning.  A phrase could then be abbreviated to a sequence of letters, forming a single-word summary of it.
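A minimal sketch of this abbreviation step in Python.  It assumes words are separated by spaces and uses English words as stand-ins for words of the hypothetical acronymic language.  The function names `abbreviate_word` and `abbreviate_phrase` are illustrative only and do not belong to any existing system.

```python
def abbreviate_word(word: str) -> str:
    """Keep only the first letter, assumed to be the most meaningful morpheme."""
    return word[0].lower()


def abbreviate_phrase(phrase: str) -> str:
    """Collapse a phrase into one word made of the initials of its words."""
    return "".join(abbreviate_word(w) for w in phrase.split())


# English words stand in for words of the hypothetical acronymic language.
print(abbreviate_phrase("big red dog"))  # brd
```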

A language designed for data compression in this way can be called an acronymic language.  A difficult problem faced by people working in natural language processing is text summarization.  Various crude methods have been used to automatically condense large sections of text into short paragraphs or abstracts.  In an acronymic language, this would be trivial.
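Applying the same rule phrase by phrase gives a summarizer.  The sketch below assumes the text has already been split into sentences and phrases.  In a real acronymic language the phrase boundaries would have to be marked somehow, and that is not specified here.  `summarize` is a hypothetical name, and English again stands in for the acronymic language.

```python
def abbreviate_phrase(phrase: str) -> str:
    """Collapse a phrase into one word made of the initials of its words."""
    return "".join(word[0].lower() for word in phrase.split())


def summarize(sentences: list[list[str]]) -> list[str]:
    """Replace every phrase of every sentence by its acronym word.

    Assumes the text is already segmented: each sentence is a list of
    phrase strings.
    """
    return [" ".join(abbreviate_phrase(p) for p in sentence) for sentence in sentences]


text = [
    ["the big dog", "ran over", "the green hill"],
    ["a small cat", "slept"],
]
summary = summarize(text)
original_chars = sum(len(" ".join(sentence)) for sentence in text)
summary_chars = sum(len(s) for s in summary)
print(summary)                         # ['tbd ro tgh', 'asc s']
print(original_chars / summary_chars)  # compression ratio in characters
```

On this toy input the summary is about 3.5 times shorter in characters (15 instead of 52).  How much meaning survives is the real question, and it depends on the expansion.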

But how good would it be?

To answer this question, one should look at the inverse operation, or more precisely a pseudoinverse.

If we are compressing words into letters, then each letter in a word would have to expand into a word, and thus a whole word into a phrase or common sequence of words.  Let us suppose for the moment that this is to be done with software, which has access to a huge text database.
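Here is one way such software could perform the expansion, sketched under assumptions that are not part of the proposal.  A four-sentence toy corpus replaces the huge text database.  Context is modeled with an add-one-smoothed bigram model, and the best expansion is found with a Viterbi search.  `train` and `expand` are hypothetical names.

```python
import math
from collections import Counter


def train(corpus: list[str]) -> tuple[Counter, Counter]:
    """Count unigrams and bigrams; a toy stand-in for a huge text database."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def expand(acronym: str, unigrams: Counter, bigrams: Counter) -> list[str]:
    """Return the most probable word sequence whose initials spell `acronym`.

    Viterbi search over an add-one-smoothed bigram model.
    """
    vocab = [w for w in unigrams if w != "<s>"]
    v = len(vocab)

    def log_prob(prev: str, word: str) -> float:
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + v))

    # For each possible last word: (best log score, best word sequence ending in it).
    best = {"<s>": (0.0, [])}
    for letter in acronym.lower():
        candidates = [w for w in vocab if w.startswith(letter)]
        if not candidates:
            raise ValueError(f"no word in the corpus starts with {letter!r}")
        best = {
            word: max((score + log_prob(prev, word), seq + [word])
                      for prev, (score, seq) in best.items())
            for word in candidates
        }
    return max(best.values())[1]


corpus = [
    "the big dog ran home",
    "the big cat sat down",
    "a small dog ran away",
    "the dog ran",
]
unigrams, bigrams = train(corpus)
words = expand("tbdr", unigrams, bigrams)
print(words)  # ['the', 'big', 'dog', 'ran']
```

Compressing the expansion always gives back the original letters.  Expanding a compressed phrase, however, need not return the original phrase, which is why the expansion is only a pseudoinverse.  How often it guesses right depends on the corpus and on how meanings are assigned to letters.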

Expanding a word into a sequence by expanding its letters into words could be quite difficult if words (meanings) were assigned at random to different initial letters.  It could be much easier if similar words were assigned to the same letters.  It could, on the other hand, be almost impossible if words with similar meanings were distributed among different letters by some shuffling algorithm.

An example of assigning similar words to the same or similar letters would be the assignment of the common words often called “stop words” to the vowels.  The vowels are common, as are those words, and their role in word construction is somewhat similar to the role of those words in constructing phrases.
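The following is a sketch of a hypothetical assignment policy along these lines.  Meanings are tagged by hand as function (stop) words or content words.  Function meanings are dealt in turn to the vowels, and content meanings to the consonants.  The labels and the round-robin rule are assumptions made for illustration.  Grouping only at this coarse level does not by itself put similar content words on the same consonant.

```python
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"


def assign_letters(meanings: list[tuple[str, bool]]) -> dict[str, str]:
    """Give each meaning an initial letter.

    `meanings` is a list of (meaning, is_function_word) pairs.  Function-word
    meanings are dealt in turn to the vowels and content meanings to the
    consonants, so a vowel in an acronym always signals a function word.
    """
    assignment = {}
    n_function = n_content = 0
    for meaning, is_function in meanings:
        if is_function:
            assignment[meaning] = VOWELS[n_function % len(VOWELS)]
            n_function += 1
        else:
            assignment[meaning] = CONSONANTS[n_content % len(CONSONANTS)]
            n_content += 1
    return assignment


meanings = [("THE", True), ("DOG", False), ("OF", True), ("RUN", False), ("AND", True)]
print(assign_letters(meanings))
# {'THE': 'a', 'DOG': 'b', 'OF': 'e', 'RUN': 'c', 'AND': 'i'}
```

An expander working with such an assignment only needs to search the small closed class of function words whenever it meets a vowel.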

There are, of course, more than astronomically many ways of assigning words to letters.  In some of those assignments, decoding or inverting the compression using context would be almost impossible.  In others, it would be easy enough that people could learn to do it themselves, and indeed learn it as their native language.
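To put a number on "more than astronomical": if each of N meanings may be given any of the 26 letters of the English alphabet, there are 26^N possible assignments.  Taking N = 1000 purely as an illustrative vocabulary size (not a figure from this proposal):

```latex
26^{N}\Big|_{N=1000} = 10^{1000\,\log_{10} 26} \approx 10^{1000 \times 1.415} \approx 10^{1415}
```

For comparison, the number of atoms in the observable universe is usually estimated at around 10^80.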
