Insidious Optimizations I: Machine Architecture

This is my first ``Insidious Optimizations'' article, and so I'll explain the basic concepts herein. The idea of insidious optimizations is important to me, being a good part of why I began writing articles at all. An insidious optimization is an optimization which ceases to be viewed as one. An insidious optimization is insidious in how it limits thought, especially for those who begin thinking after it has been introduced. An insidious optimization is an optimization which is neither necessary nor was necessarily always present with regards to its topic. It limits thought by contributing unnecessary constraints and assumptions; with an insidious optimization in place, people will simply not see alternatives, or will even have difficulty understanding that alternatives could exist. It stunts a mind.

There exist many insidious optimizations in that young field of automatic computing, and this article details those concerning machine design. The prime insidious optimization I see is that of the register; registers cause issues for compilers involving optimal allocation, implicit usage, the redundant instructions they make necessary, and the limited nature of their storage. The first two issues work in tandem to make writing compilers more complex than necessary; the third issue is the least egregious of the set; and that final issue causes unnecessarily complicated memory hierarchies, by having small and very limited registers act as the fastest memory available.

I've found great interest in the memory-to-memory model of machine architecture; such a model is the least explored by others and, I believe, the most inherent, and so relatively lacking in insidious optimizations. There is little need to optimally allocate data, as access is generally equal; implicit usage can be enforced by the particular machine, and in some cases should be, but is less of a burden; a memory-to-memory machine needs but a lone instruction for moving any data, as all of it sits at an equal level; and, perhaps most importantly, the fastest memory in such machines need no longer be poorly segmented. To emphasize that last point: it's not generally possible to store an important array across the registers of a machine, but a memory-to-memory machine faces no such issue.

Continuing from that third point, other instructions become unnecessary, including shifts and special control flow, and removing register sizes as the common units exposes an interface better suited to arbitrary-length operations. Lacking registers, such a machine would do well to simply expose all of its state through the memory, which also makes recovery and resumption easier; a machine with the program counter exposed as a memory location needs no special jump instruction, as a jump is merely a move; I prefer such a counter be placed at the zeroth memory location. A small sketch of such a machine follows this section.

I consider the issue of fast memory rather uninteresting, but a memory-to-memory machine can resolve the issue nicely by giving the programmer some control; a cache memory not under good control of the programmer is inherently poor, I think. The programmer could merely have the machine replace a range of memory with a suitably large amount of fast memory, transparently, so that the only difference is the speed with which that range is accessed; this nicely addresses the point that registers serve as hints to the machine about what's valuable. This mechanism, however, works for data of every shape, be it code, small data, or large structures; a second sketch below models it.
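To make the model concrete, here's a minimal sketch of such a machine in C; the two-word instruction encoding, the memory size, and every name within are assumptions of mine for illustration, not a fixed design. The counter sits at the zeroth location, so the jump at the end of the little program is nothing but a move:

    /* A minimal sketch of a memory-to-memory machine, assuming a
       word-addressed memory and a lone MOVE instruction; the encoding
       here is hypothetical, invented only for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    #define MEMORY_WORDS 256
    #define PC 0                /* the zeroth location holds the counter */

    static uint32_t memory[MEMORY_WORDS];

    /* Each instruction is two words, a source address followed by a
       destination address; the counter advances by the instruction
       length unless the move itself rewrote it. */
    static void step(void)
    {
        uint32_t at = memory[PC];
        uint32_t source = memory[at];
        uint32_t destination = memory[at + 1];
        memory[PC] = at + 2;    /* advance first, so a move to PC wins */
        memory[destination] = memory[source];
    }

    int main(void)
    {
        /* A tiny program: copy location 100 to location 101, then jump
           back to the beginning by moving location 102, which holds 4,
           into the counter; no special jump instruction exists. */
        memory[PC] = 4;
        memory[4] = 100; memory[5] = 101;   /* move 100 -> 101 */
        memory[6] = 102; memory[7] = PC;    /* move 102 -> 0, a jump */
        memory[100] = 42;
        memory[102] = 4;                    /* the jump target */
        for (int i = 0; i < 6; i++)
            step();
        printf("location 101 holds %u\n", memory[101]);
        return 0;
    }

Note that recovery and resumption fall out for free: the whole state of the machine is the memory array, so saving and restoring it is a plain copy.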
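The fast memory mechanism can be sketched in the same hypothetical fashion; the request I've named make_fast below and the invented cycle costs merely model the idea that the programmer directs the machine, while access stays transparent apart from its cost:

    /* A sketch of programmer-directed fast memory; everything here is
       hypothetical, modelling a machine which backs one range of
       addresses with faster storage at the program's request. */
    #include <stdint.h>
    #include <stdio.h>

    #define MEMORY_WORDS 4096

    static uint32_t memory[MEMORY_WORDS];
    static uint32_t fast_base, fast_length;    /* the one fast range */
    static unsigned long cycles;

    /* The hypothetical request: replace a range with fast storage. */
    static void make_fast(uint32_t base, uint32_t length)
    {
        fast_base = base;
        fast_length = length;
    }

    /* All access goes through one routine; only the cost differs.
       Unsigned wraparound makes the range check a lone comparison. */
    static uint32_t *reference(uint32_t address)
    {
        int fast = address - fast_base < fast_length;
        cycles += fast ? 1 : 10;    /* invented costs, for illustration */
        return &memory[address];
    }

    int main(void)
    {
        make_fast(512, 128);        /* an important array lives here */
        for (uint32_t i = 0; i < 128; i++)
            *reference(512 + i) = i;    /* cheap: within the fast range */
        *reference(0) = 3;              /* dear: ordinary memory */
        printf("%lu cycles\n", cycles);
        return 0;
    }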
Another insidious optimization is the collecting of memory into fixed multiples of bits, commonly the octet. This decreases an address size by a mere three bits, at the cost of flexibility, and I'm led to believe this loss of flexibility results in a net loss of memory, by making the optimal compacting arduous in some cases. An obvious example is needing only a single bit of storage, which requires either wasting the other seven bits or writing code to collect only the relevant bit. A less obvious example is the alignment of structures; were memory used bitwise, it would be trivial to use the minimum number of bits, but in many cases this is so arduous that it's not done, wasting whatever space is spent to make access convenient; the extra code otherwise required would likely dwarf the savings.

A memory-to-memory machine with bitwise memory trivially eliminates shifting instructions, through a simple move, or merely by changing an address, given the relevant data has some space set aside around it. The issue of using conventional memory hardware is irrelevant, as the required translations are trivial: a bit address is merely an octet address carrying three further bits. A prime question in the design of a memory-to-memory machine with bitwise memory is which unit is good to use as a base; my solution is to use the address length as that unit, as it's rather inherent to operation anyway and does well wherever other fixed-size units are needed. A sketch of such addressing closes this article.

A prime disadvantage of the memory-to-memory design compared to others is the inherently larger size of its instructions, but some of my such designs have combated this to nice results; the issue of large instructions can be mitigated by making each instruction more capable, which may be aided by the space opened for more varied instructions. I find the design both fascinating and elegant; some early automatic computers could be said to have used a variant. The simplification from eliminating instructions, the ease of optimizing memory usage, and the ease with which optimizations can be added transparently all suggest the design has merit.

It's a different insidious optimization to leave machine architecture dealing almost solely in terms of numbers, not spanning to other abstract domains, but this will be detailed elsewhere. I don't mean to imply other architectures are strictly inferior; each design has quirks I find fascinating. Abandoned fields are fertile ground for novel research, and those machine architectures which deviate are rather certainly abandoned now; this article gives context for some of my efforts.
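To close, here's a small sketch of bitwise addressing atop conventional octet hardware, assuming nothing beyond standard C; every name is hypothetical. The translation is the triviality claimed above, a split at the low three bits of the address, and the bit_move routine shows a shift as nothing but a move with a displaced address:

    /* A sketch of bitwise memory over octet storage; a bit address is
       an octet address carrying three further bits. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t memory[1024];

    /* Read one bit at a bit address. */
    static unsigned bit_read(uint32_t bit)
    {
        return (memory[bit >> 3] >> (bit & 7)) & 1;
    }

    /* Write one bit at a bit address. */
    static void bit_write(uint32_t bit, unsigned value)
    {
        uint8_t mask = 1 << (bit & 7);
        if (value) memory[bit >> 3] |= mask;
        else       memory[bit >> 3] &= ~mask;
    }

    /* The lone move, over any length of bits; copying backwards when
       the ranges overlap keeps the move correct, as with memmove. */
    static void bit_move(uint32_t to, uint32_t from, uint32_t length)
    {
        if (to <= from)
            for (uint32_t i = 0; i < length; i++)
                bit_write(to + i, bit_read(from + i));
        else
            for (uint32_t i = length; i-- > 0;)
                bit_write(to + i, bit_read(from + i));
    }

    /* Helpers to view a run of bits as a number, low bit first. */
    static uint32_t field_read(uint32_t bit, uint32_t length)
    {
        uint32_t value = 0;
        for (uint32_t i = 0; i < length; i++)
            value |= (uint32_t)bit_read(bit + i) << i;
        return value;
    }

    static void field_write(uint32_t bit, uint32_t length, uint32_t value)
    {
        for (uint32_t i = 0; i < length; i++)
            bit_write(bit + i, (value >> i) & 1);
    }

    int main(void)
    {
        field_write(3, 5, 21);              /* a five-bit field, nothing wasted */
        printf("%u\n", field_read(3, 5));   /* 21 */
        bit_move(5, 3, 5);                  /* a left shift by two is a mere move */
        printf("%u\n", field_read(5, 5));   /* 21 again, displaced */
        return 0;
    }

The five-bit field begins at bit three with no padding whatsoever, and the ``shift'' never touches a shifting instruction; it's the same move as any other, with the source and destination two bits apart.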