Thoughts on computers and languages.
It’s all about languages and communication.
All data is expressed in some language: binary, assembly, C, JVM bytecodes, relational data structures, LISP s-expressions, JSON, XML, API calls, object models, user interfaces (icon languages, screen layouts, etc.), or even human language (so-called “human-entered text” or “raw text”).
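To make the point concrete, here is a small Python sketch that expresses one piece of data in four of these languages: JSON, XML, CSV, and an s-expression. The record and its field names are made up for illustration, and the s-expression printer is a hypothetical helper, not a real LISP writer.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# One hypothetical record, held in memory as a Python dict (itself a language).
record = {"id": "42", "name": "Ada", "status": "active"}

def to_json(data: dict) -> str:
    return json.dumps(data)

def to_xml(data: dict) -> str:
    root = ET.Element("record")
    for key, value in data.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def to_csv(data: dict) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(data), lineterminator="\n")
    writer.writeheader()
    writer.writerow(data)
    return buf.getvalue()

def to_sexpr(data: dict) -> str:
    # Hypothetical s-expression layout; every value is written as a quoted string.
    pairs = " ".join(f'({key} "{value}")' for key, value in data.items())
    return f"(record {pairs})"

# Same data, four different languages.
for translate in (to_json, to_xml, to_csv, to_sexpr):
    print(translate(record))
```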
Languages have dialects.
Each use of a language (in a particular program, by a particular user) is a dialect of the main language. Dialects can themselves be refined into narrower dialects.
Every use of language falls into a universal pattern. One actor wants to send a message, so it takes the data that is in its mind and translates it into some language that will hopefully be understood by somebody other than itself. Then it writes this message into some substrate that is accessible to other actor(s).
At some point later, another actor (maybe the same one) reads the data from the substrate and translates it into some data that is comprehensible and meaningful to the reading actor. There can be multiple receiving actors (multicasting).
That’s it. That’s pretty much everything computers ever do.
Note that actors could be: people, computers, processes, networks, threads, function or method activations, or even the same actor at various points in time.
The substrate can be: computer memory, human memory, hard disks, network signals on a wire, phosphor on a computer screen, light or sound travelling through the air.
The data is generally information of some sort.
The language could be: encoding in human neurons, HTML, a programming language, data structures or bits in memory, records and bytes on disk, a particular set of icons or visual cues in a user interface, colors, sounds, etc.
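A minimal sketch of the pattern, assuming JSON as the language and a temporary file on disk as the substrate. The functions write_message and read_message are hypothetical names for the sending and receiving actors; each performs one translation step and one substrate step.

```python
import json
import tempfile
from pathlib import Path

def write_message(thought: dict, substrate: Path) -> None:
    """Sending actor: translate in-memory data into a language, then write it to a substrate."""
    message = json.dumps(thought)                      # translate: dict -> JSON text
    substrate.write_text(message, encoding="utf-8")    # write: JSON text -> bytes on disk

def read_message(substrate: Path) -> dict:
    """Receiving actor: read from the substrate, then translate into its own terms."""
    message = substrate.read_text(encoding="utf-8")    # read: bytes on disk -> JSON text
    return json.loads(message)                         # translate: JSON text -> dict

with tempfile.TemporaryDirectory() as tmp:
    disk = Path(tmp) / "message.json"
    write_message({"greeting": "hello", "count": 3}, disk)
    received = read_message(disk)

print(received)
```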
There are two broad uses of this general pattern:
1. writing: data at rest. Here the data is stored, and the actors tend to be separated substantially through time.
Stored data creates backwards compatibility problems. This is the same problem written language has: a text is written under certain assumptions and cultural ideas, those shift around it, and eventually it is no longer “readable”. In the worst case, the language is lost and nobody can read it at all.
This can be rephrased as: translation is hard because time passes, and the world may change in such a way that the old language is lost, misinterpreted, or forgotten.
Examples: databases, books, in-memory data structures, generated reports, the current state of a user interface, etc.
Canonical example: data structures
2. spoken/conversational: data in motion. Here two parties are communicating. The actors are usually very close to each other in time (often in real time). Also, the messaging tends to be bidirectional: a message is sent, and then the receiver replies to the sender.
This is inherently a translation problem. In the best case, the two communicators know each other intimately and there is very little (but not zero) loss of information. In the worst case, the two communicators share very little common understanding, and only very limited information can pass between them.
Examples: API calls, queries + results, messaging, RPC, user actions in a user interface
Canonical example: API call (which has a two-way messaging component: the passing of information TO the function activation, and the passing of the result BACK from it; so maybe this isn’t a great canonical example).
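A sketch of one common way to cope with the backwards compatibility problem in stored data: tag each record with a version and translate old dialects forward when reading. The record layouts, version numbers, and the read_record and upgrade_v1 helpers are all hypothetical. Note that the upgrade is itself a translation, and it can be lossy.

```python
import json

# Hypothetical: version 1 of a stored record kept a single "name" field;
# version 2 split it into "given" and "family". Old records are still on disk.
OLD_RECORD = '{"version": 1, "name": "Ada Lovelace"}'
NEW_RECORD = '{"version": 2, "given": "Grace", "family": "Hopper"}'

def upgrade_v1(record: dict) -> dict:
    # A guess baked into the translation: the last word is the family name.
    # This is exactly the kind of assumption that stops holding as the world changes.
    given, _, family = record["name"].rpartition(" ")
    return {"version": 2, "given": given, "family": family}

# Maps an old version number to the function that moves it one step forward.
UPGRADERS = {1: upgrade_v1}

def read_record(text: str) -> dict:
    record = json.loads(text)
    version = record.get("version")
    while version in UPGRADERS:
        record = UPGRADERS[version](record)
        version = record["version"]
    if version != 2:
        # A language this reader has never learned: refuse rather than guess.
        raise ValueError(f"cannot read record in unknown version {version!r}")
    return record

print(read_record(OLD_RECORD))
print(read_record(NEW_RECORD))
```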
These two uses are identical. Each of them is communication between two actors at some point in time. One actor translates its intent into some new language and writes it into a substrate that the receiver has access to. Then the receiver reads the data in this language and translates it into something meaningful for itself. Note that there are two levels of translation, and a storage aspect, to all communication. What we usually call “data at rest” is just a mechanism by which we can communicate with actors (perhaps even ourselves) who live in the future.
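A sketch of the claim that the two uses are the same thing: one encode/decode pair serves both a file read back later (an actor in the future) and a request/reply exchanged between two threads through in-memory queues. JSON, the queues, and the toy sum server are assumptions chosen for illustration.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path

def encode(data: dict) -> str:
    return json.dumps(data, sort_keys=True)

def decode(text: str) -> dict:
    return json.loads(text)

# Data at rest: the reader is a future actor (here, later in the same program).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "note.json"
    path.write_text(encode({"todo": "renew certificate"}), encoding="utf-8")
    # ... time passes ...
    stored = decode(path.read_text(encoding="utf-8"))

# Data in motion: a request and a reply through queues shared by two threads.
request_q: "queue.Queue[str]" = queue.Queue()
reply_q: "queue.Queue[str]" = queue.Queue()

def server() -> None:
    request = decode(request_q.get())
    reply_q.put(encode({"sum": request["a"] + request["b"]}))

worker = threading.Thread(target=server)
worker.start()
request_q.put(encode({"a": 2, "b": 3}))
answer = decode(reply_q.get(timeout=5))
worker.join()

print(stored, answer)
```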
Translation is pretty much always lossy. There are lossless translations (lossless compression, pig latin), but for the most part you are translating meaning and not just structure (Tao Te Ching to English, Cerner to Epic).
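A small illustration of the difference, assuming zlib for the lossless case and a made-up status vocabulary for the lossy case. The two code sets below are hypothetical and are not taken from Cerner, Epic, or any real system.

```python
import zlib

# Lossless: compression changes the structure but not the content,
# so the round trip returns exactly what went in.
text = b"the same words, the same words, the same words"
assert zlib.decompress(zlib.compress(text)) == text

# Lossy: a hypothetical source system distinguishes four appointment
# statuses, but a hypothetical target system only has two.
SOURCE_TO_TARGET = {
    "no_show": "cancelled",
    "patient_cancelled": "cancelled",
    "clinic_cancelled": "cancelled",
    "completed": "done",
}

# Build the reverse mapping: each target value may stand for several source values.
TARGET_TO_SOURCE: dict[str, list[str]] = {}
for source, target in SOURCE_TO_TARGET.items():
    TARGET_TO_SOURCE.setdefault(target, []).append(source)

def round_trip(status: str) -> list[str]:
    """Translate to the target vocabulary and back; return every possible original."""
    return TARGET_TO_SOURCE[SOURCE_TO_TARGET[status]]

print(round_trip("completed"))  # a single candidate: nothing was lost
print(round_trip("no_show"))    # three candidates: the distinction is gone
```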
This is made worse by the fact that the people using a particular system are actually creating their own dialect through use: “misusing” fields, stylized work habits, and so on. This leads to the “broken” data that needs periodic “cleanup”. It’s not dirty data: it’s just in a dialect that you don’t understand, with its own idioms.
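A sketch of reading such a field as a dialect rather than as dirt. The field, its values, and the idiom table are hypothetical; in a real system the idioms would have to be learned from the people who actually write the data.

```python
import re

# Hypothetical values found in a "phone" field after years of real use.
raw_values = ["555-0100", "(555) 0101", "n/a", "see notes", "", "NONE"]

# Hypothetical idioms of this particular installation's dialect.
IDIOMS = {
    "n/a": "no phone",
    "none": "no phone",
    "see notes": "phone is recorded in the notes field",
    "": "never asked",
}

def interpret_phone(value: str) -> tuple[str, str]:
    """Return (kind, meaning) for one field value, read in this site's dialect."""
    key = value.strip().lower()
    if key in IDIOMS:
        return ("idiom", IDIOMS[key])
    digits = re.sub(r"\D", "", value)  # keep only the digits of an actual number
    if digits:
        return ("number", digits)
    return ("unknown", value)          # an idiom nobody has explained yet

for value in raw_values:
    print(repr(value), "->", interpret_phone(value))
```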
Standards are just language definitions. They don’t solve the problem.
Most languages are domain specific.
Computers, processes, threads, and groups of processes are people (or groups/committees of people). Communicating between two processes or two computers is analogous to communicating between a computer and a person.
Computers are people (single autonomous processing units). Processes are like alters in the dissociative identity disorder (DID) sense: they are more or less separate people.
Algorithms are weird. They basically tell the computer-people what to do: how to interpret data, how to transform it, and so on. It’s like the neurons in our heads. There are still languages, data, and translations, but here the language describes LOGIC, which is basically what the computer-people should do. Algorithms are generally logic for manipulating stored data and sending messages in various languages.
What about AJAX and other “rich” UIs? They are a more complex beast.
What about static typing? This is an odd one. Basically, static typing (and, really, dynamic typing that fails at runtime when you violate constraints) tells the computer what the rules of the language are. Hence XSDs, relational schemas, Java static types, pre- and postconditions, etc., are all forms of typing. They prevent violations of the language when writing algorithms, which makes writing algorithms easier.
The alternative: comprehensive unit tests.
This kind of typing (and strong contract definitions in general) is a good thing, in balance. When many people, or people from different organizations, are going to write significant amounts of programs or data in a certain language, it is nice to be able to say: this was “good” or this was “bad”.
On the other hand, static typing can bloat the bejeezus out of any program that has to do anything with dynamic data (read: all of the big ones), including reading comma-delimited files, relational data, or XML, building user interfaces, etc.
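A sketch of both sides of the tradeoff in Python. The Order type states the rules of a hypothetical record language: its type annotations are only enforced by an external static checker such as mypy, while __post_init__ adds a runtime constraint in the spirit of a schema check. Parsing CSV into it shows the extra translation code that dynamic data demands. The record, field names, and constraint are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    # The "rules of the language" for one hypothetical record type.
    order_id: int
    customer: str
    total_cents: int

    def __post_init__(self) -> None:
        # Runtime constraint, similar in spirit to a schema check or postcondition.
        if self.total_cents < 0:
            raise ValueError("total_cents must be non-negative")

def parse_orders(text: str) -> list[Order]:
    # CSV arrives untyped: every field is a string. The typed model forces an
    # explicit conversion for each column, which is where the bloat comes from.
    rows = csv.DictReader(io.StringIO(text))
    return [
        Order(
            order_id=int(row["order_id"]),
            customer=row["customer"],
            total_cents=int(row["total_cents"]),
        )
        for row in rows
    ]

GOOD = "order_id,customer,total_cents\n1,Ada,1250\n2,Grace,300\n"
BAD = "order_id,customer,total_cents\n3,Linus,-5\n"

print(parse_orders(GOOD))
try:
    parse_orders(BAD)
except ValueError as err:
    print("rejected:", err)
```

A comprehensive unit test suite would reach the same "good" or "bad" verdict by running parse_orders on examples like GOOD and BAD, instead of by declaring the rules up front.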
Template languages? They’re all about language translation. Language translation generally consists of two parts: reading the data of relevance into some intermediate language that is easy for a program to deal with (say, loading relational data into an in-memory data structure that represents meaningful business objects), and then generating some target language (say HTML or some such). Templates are a nice way for humans to express things in output languages: the program itself (the template) has the same general “form” and “shape” as the generated output. Some compiler backends use templates, as do UI frameworks and obviously report writing and HTML generation.
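A minimal sketch of the two parts, assuming an in-memory SQLite table as the relational source, a small dataclass as the intermediate business object, and Python's string.Template for the output language. The table, class, and template names are hypothetical.

```python
import html
import sqlite3
from dataclasses import dataclass
from string import Template

@dataclass
class Customer:
    # Hypothetical intermediate business object.
    name: str
    open_orders: int

# Part 1: read relational data into the intermediate model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, open_orders INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ada", 2), ("Smith & Sons", 0)],
)
customers = [
    Customer(name, count)
    for name, count in conn.execute("SELECT name, open_orders FROM customers ORDER BY name")
]
conn.close()

# Part 2: generate the target language. Each template has the shape of its output.
PAGE = Template("<ul>\n$items</ul>")
ITEM = Template("  <li>$name: $count open orders</li>\n")

def render(people: list[Customer]) -> str:
    items = "".join(
        # html.escape applies one of the target language's own rules.
        ITEM.substitute(name=html.escape(c.name), count=c.open_orders)
        for c in people
    )
    return PAGE.substitute(items=items)

print(render(customers))
```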