[HN Gopher] MiSTer, an open-source FPGA gaming project
       ___________________________________________________________________
        
       MiSTer, an open-source FPGA gaming project
        
       Author : tediousdemise
       Score  : 189 points
       Date   : 2021-04-11 17:59 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | fooblat wrote:
       | I love my MiSTer build!
       | 
       | The usb controllers made for the recent mini systems (NES, SNES,
       | Genesis, etc) make great accessories for the mister. Add a couple
       | of usb arcade sticks and you can really play almost any classic
       | retro game as it was meant to be played.
       | 
       | And then there are all the classic computer cores even including
       | the PDP-1!
        
         | vardump wrote:
         | Yup, same! Can wholeheartedly recommend it for those who want
         | something between emulation and real hardware.
         | 
         | The Amiga core is fun, AGA, 2 MB chip, 384 MB fast. It supports
         | hard disk images, so you can do a hard disk based Workbench
         | installation and load games and demos practically instantly
         | (and safely exit to Workbench) using WHDLoad.
         | 
         | Arcade cores are fun as well. Just like in childhood, but less
         | hungry for quarters. :) Recently played with arcade Gauntlet
         | core a bit for example.
        
       | xigency wrote:
       | Interesting.
       | 
       | I have a couple of FPGA boards on their way to me in the mail
       | which I intend to use for some homebrew video game projects.
       | Besides getting the development environment working, it can be
       | tricky outputting video from an FPGA because of the precise
       | timing involved. I will have to look through these resources to
       | see if there are any good tricks to use here.
        
       | phendrenad2 wrote:
       | MiSTer is an amazing phenomenon. The MiSTer itself is just an
       | Intel FPGA devkit, which many believe to be sold at a loss
       | (because it's a training tool and not Intel's main source of FPGA
       | revenue). The amazing thing is the aftermarket for addons. There
       | are many possible combinations of addon boards that add RAM with
       | deterministic latency, USB hubs, cooling fans, cases, retro
       | controller ports, etc. All custom-made for this ecosystem.
        
         | tinybear1 wrote:
         | It is definitely being sold at a loss, the Cyclone V SOC being
         | used costs more than the entire development board.[0] I wonder
         | if Intel will ever take notice due to MiSTer's growing
         | popularity and quit subsidizing the board.
         | 
         | [0]
         | https://www.digikey.com/en/products/detail/intel/5CSEBA6U23I...
         | 
         | Edit: it was erroneous of me to state the board was being sold
         | at a loss; rather, I meant that the board was definitely being
         | subsidized by companies such as Intel and their partners
         | such as Panasonic. My mistake. I also didn't mean to convey
         | that consumer Digikey pricing is the same as what large-volume
         | manufacturers such as Terasic pay. Rather, I meant to
         | demonstrate and agree with the OP on the astounding situation
         | that MiSTer currently exists in, owing to the lack of economic
         | viability for someone to produce a low-volume commercial FPGA
         | emulation machine for a niche audience without any
         | subsidization.
        
           | tverbeure wrote:
           | There is absolutely no way they're sold at a loss. Your
           | DigiKey price of $245 proves this, because a factor of 10 is
           | a good starting point as a ratio between volume and one-off
           | DigiKey pricing of any type of complex silicon.
           | 
           | A better way to approach this is as follows: what's the die
           | size of an FPGA like this? What's the production cost of the
           | die? Then check the historic gross margin percentage of FPGA
           | companies. Xilinx is around 68%, and that includes high-end
           | products which carry the highest markups, unlike this cookie
           | cutter thing.
           | 
           | That should give you a good ballpark number.
           | 
           | DigiKey charges what they do because nobody else is willing
           | to sell these things in low volume, and they have very high
           | inventory costs.
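
The back-of-the-envelope reasoning in this comment can be written out as a
small calculation. This is only a sketch: the factor of 10, the $245 DigiKey
price and the 68% gross margin come from the comment above, the $135 board
price is the DE10-Nano price quoted elsewhere in the thread, and the die cost
is a made-up placeholder, not a real figure.

```python
def volume_price_estimate(one_off_price, ratio=10.0):
    """Rule of thumb from the comment: volume price ~ one-off price / ratio."""
    return one_off_price / ratio


def price_from_cost(production_cost, gross_margin):
    """Selling price implied by a production cost and a gross margin."""
    return production_cost / (1.0 - gross_margin)


digikey_price = 245.0   # one-off DigiKey price cited in the thread
board_price = 135.0     # DE10-Nano price cited in the thread

chip_estimate = volume_price_estimate(digikey_price)
print(f"estimated volume chip price: ${chip_estimate:.2f}")
print(f"share of board price: {chip_estimate / board_price:.0%}")

# Second approach: start from a die cost and apply a gross margin.
hypothetical_die_cost = 8.0  # placeholder value, not a real number
print(f"implied chip price: ${price_from_cost(hypothetical_die_cost, 0.68):.2f}")
```

Under these assumptions the chip accounts for only a fraction of the board
price, which is the point the comment is making.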
        
           | andrewcchen wrote:
           | Digikey pricing is not indicative of actual volume pricing,
           | especially for FPGAs, which are often priced many times
           | higher when bought from distributors. I doubt the board
           | is sold at a loss; it's probably sold at a small profit, not
           | that it's really significant for a low-volume dev board.
        
         | tverbeure wrote:
         | The price of a DE10-Nano is $135 ($115 for academic use.)
         | 
         | Anyone who thinks that Terasic sells these at a loss doesn't
         | have a clue about volume pricing of FPGAs. And as a special
         | Intel partner, there's little doubt that Terasic has access to
         | this kind of pricing.
        
       | Jorge1o1 wrote:
       | I don't really know much about game emulation so I was curious
       | about what differentiates this FPGA game project vs traditional
       | CPU emulation.
       | 
       | From their github page [1]:
       | 
       | >Traditional emulators on CPUs execute code sequentially. This is
       | a tricky method of emulation because real hardware has many chips
       | and all of them work in parallel...This requires a lot of CPU
       | power to emulate even an old and slow retro computer. Sometimes
       | even a modern CPU working at 100 times the speed of the retro
       | computer is not enough, so the emulator has to use approximation,
       | skip emulation of some less important parts, or assume some
       | standard work of the emulated system without extraordinary usage.
       | 
       | > FPGA doesn't need high frequencies to emulate retro computers;
       | it works at much lower frequencies than traditional emulators
       | require. Since everything in FPGA works in parallel, it is no
       | problem to handle any possible usage of the emulated system.
       | 
       | [1] https://github.com/MiSTer-devel/Main_MiSTer/wiki/Why-FPGA
       | 
       | (Edited for formatting)
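
To make the quoted point concrete, here is a minimal sketch of how a software
emulator might step several chips in lockstep on one CPU thread. The chip
names and clock dividers are illustrative assumptions, not taken from any
particular emulator or MiSTer core.

```python
class Chip:
    """A stand-in for one emulated chip that runs at master_clock / divider."""

    def __init__(self, name, divider):
        self.name = name
        self.divider = divider
        self.cycles_run = 0

    def tick(self, master_cycle):
        # The chip only does work on its own clock edges.
        if master_cycle % self.divider == 0:
            self.cycles_run += 1


def run_lockstep(chips, master_cycles):
    """Visit every chip on every master cycle, one after another.

    Returns the number of sequential host steps needed. Real hardware (or an
    FPGA) would advance all chips at the same time instead.
    """
    host_steps = 0
    for cycle in range(master_cycles):
        for chip in chips:
            chip.tick(cycle)
            host_steps += 1
    return host_steps


cpu = Chip("cpu", divider=12)    # hypothetical divider
video = Chip("video", divider=4)  # hypothetical divider
steps = run_lockstep([cpu, video], master_cycles=24)
print(cpu.cycles_run, video.cycles_run, steps)
```

The host work grows with the number of chips times the number of master
cycles, which is why cycle-level synchronization is expensive in software.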
        
         | TillE wrote:
         | byuu wrote a good article about this, unfortunately it's no
         | longer available, but basically it should be self-evident that
         | there's nothing _inherently_ more accurate about hardware
         | emulation.
         | 
         | If you've actually decapped the original chips and duplicated
         | them exactly in an FPGA, that's pretty cool. But otherwise it's
         | just another approximation. The lower power requirements are
         | nice, of course.
        
           | zokier wrote:
           | I think the big differentiator is that it is easier to get
           | predictable latencies with FPGA where you control almost
           | everything, compared to general-purpose PC which is not
           | really that well optimized for hard real-time operation. So I
           | believe "race the beam" style things are more easily
           | accomplished with FPGAs, and also having tight audio-video
           | sync. Although the PC emulation scene has been doing some
           | fairly incredible things too.
        
           | valec wrote:
           | you can find it here https://archive.is/fWosI
        
           | emodendroket wrote:
           | That's true, but I think it's also true that you could trim a
           | bit more lag if you do it well.
        
           | tyingq wrote:
           | Quite a few of the FPGA soft cores related to 8 bit gaming
           | are reverse engineered from either schematics, decapped
           | chips, or both. Or they take pains to at least use the same
           | number of cycles for each instruction, etc.
        
           | mbalyuzi wrote:
           | Actually decapping the original chips is very much a thing.
           | See for example Chris Smith's work mapping out the innards of
           | the ZX Spectrum ULA -
           | http://www.zxdesign.info/book/insideULA.shtml .
        
             | near wrote:
             | It is, but these cores are almost exclusively not being
             | done that way. Not yet at least. I hope that they will be,
             | that would be really awesome. I paid $1200 last year for
             | the SNES PPUs to be decapped for this purpose, but it's a
             | truly enormous undertaking to map out those chips and then
             | recreate it in Verilog. You're talking thousands of hours
             | of work per chip. If anyone reading this is able to help
             | with that effort, please do let me know, we could really
             | use the help.
        
               | tediousdemise wrote:
               | By decapped, do you mean delidded?
               | 
               | Theoretically it would be possible to automate this with
               | a couple things:
               | 
               | - USB electron microscope to image the transistor
               | topology
               | 
               | - CV lib to identify connections and generate
               | corresponding Verilog code
        
               | klodolph wrote:
               | "Decapping" is a more intense version of delidding where
               | you use chemical agents or something similarly extreme
               | (laser, plasma, milling) to remove the package (ceramic,
               | plastic).
               | 
               | My understanding is that there are people who do it often
               | enough that it is automated in the way you describe, but
               | you still need someone with a lot of skill to spend
               | serious time on it. Computer vision works wonders but
               | there are errors which must be identified and fixed.
               | 
               | A lot of the chips people care about can just be done
               | optically, no electron microscope needed.
        
               | tediousdemise wrote:
               | Ah, that's a good distinction. I'd be pretty scared of
               | damaging the hardware by doing that, but I'm sure there
               | are some really experienced folks out there that would
               | appreciate the hardware donation.
        
               | FPGAhacker wrote:
               | Not that this is necessarily helpful to you in the short
               | term, but it strikes me as a good problem for machine
               | learning (going from die pictures to transistor
               | schematic.)
        
           | bcrl wrote:
           | That's exactly what's happening. There are loads of projects
           | going on right now decapping old chips and reverse
           | engineering them. From old CPUs like the 6502 to the Amiga
           | Alice chip. It's just a matter of time before most of the
           | retro systems are fully reverse engineered and documented.
        
           | tediousdemise wrote:
           | On FPGAs (depending on the hardware mapping), you get the
           | benefit of lower latency. I consider this to be timing
           | accuracy.
           | 
           | Say you have two implementations of an LED controlled by a
           | switch: one which uses an FPGA and one which uses a
           | microcontroller. The uC implementation must continuously poll
           | peripherals connected to its GPIO pins at a set frequency; it
           | must check the state of the switch, and then change the state
           | of the LED. The FPGA, on the other hand, _physically_ wires
           | the switch to the LED; there is no lag when the state of the
           | switch changes.
           | 
           | The FPGA implementation can be scaled to connect however many
           | additional lights and switches you want (limited by the size
           | of the fabric), with zero overhead lag. This is the
           | parallelization benefit of FPGAs that you may hear about. For
           | the uC implementation, you must add additional switches and
           | lights to the polling loop, which brings down performance in
           | linear time, O(n). This is the drawback of sequential
           | processing.
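
The switch/LED comparison above can be modelled with a toy simulation. This
is a sketch under simplifying assumptions: each poll takes one tick, the
"wired" case is reduced to a constant propagation delay, and all names are
made up for illustration.

```python
def polled_latency(num_switches, flip_switch, flip_time, check_cost=1):
    """Ticks between a switch flipping and its LED updating in a polling loop."""
    switches = [False] * num_switches
    leds = [False] * num_switches
    t = 0
    while True:
        for i in range(num_switches):
            if t >= flip_time:
                switches[flip_switch] = True  # the physical switch has flipped
            leds[i] = switches[i]             # poll one switch, update its LED
            t += check_cost
            if leds[flip_switch]:
                return t - flip_time


def wired_latency(num_switches, propagation_delay=0):
    """Each switch has its own path to its LED, so the count does not matter."""
    return propagation_delay


# Flip switch 0 just after it has been polled: the worst case for the loop.
for n in (4, 8, 16):
    print(n, polled_latency(n, flip_switch=0, flip_time=1), wired_latency(n))
```

In this model the worst-case polled latency grows linearly with the number of
switches, while the wired latency stays constant.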
        
             | klodolph wrote:
             | Most game consoles don't do any of this, though. The
             | gamepad is polled by software.
             | 
             | On the NES and SNES, the buttons are connected to a shift
             | register (e.g. 4021). The CPU triggers a latch and then
             | reads out the shift register one bit at a time.
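
The latch-then-shift readout described above might be modelled like this. It
is a simplified sketch: signal polarity is ignored, the value shifted in after
the eighth bit is arbitrary, and the button order is the commonly documented
NES order, which is an assumption not stated in this thread.

```python
NES_BUTTON_ORDER = ["A", "B", "Select", "Start", "Up", "Down", "Left", "Right"]


class ShiftRegisterPad:
    """Toy model of a gamepad built around an 8-bit parallel-in shift register."""

    def __init__(self):
        self.pressed = set()     # buttons currently held down
        self._shift = [0] * 8    # contents of the shift register

    def latch(self):
        """Parallel load: capture all button states at once."""
        self._shift = [1 if name in self.pressed else 0
                       for name in NES_BUTTON_ORDER]

    def clock_out(self):
        """Serial read: return the next bit and shift the register by one."""
        bit = self._shift.pop(0)
        self._shift.append(0)    # fill value is arbitrary in this sketch
        return bit


def read_pad(pad):
    """What the game software does: latch once, then read eight bits."""
    pad.latch()
    return {name: bool(pad.clock_out()) for name in NES_BUTTON_ORDER}


pad = ShiftRegisterPad()
pad.pressed = {"A", "Start"}
state = read_pad(pad)
print([name for name, down in state.items() if down])
```

Because the state is captured at latch time, a button pressed between the
latch and the last read is not seen until the next poll.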
        
               | mikepurvis wrote:
               | This would be less about a user peripheral like the
               | gamepad (which is obviously going to be read out exactly
               | once per frame anyway) and more about getting subtle
               | interactions between the CPU, memory, and specialized
               | systems for graphics/audio correct. And not just correct
               | after thousands of hours of work to smoke out the exact
               | sources of specific title bugs, but correct essentially
               | for free.
               | 
               | See for example the tale of an absolutely wild mGBA
               | investigation that was posted here a while ago:
               | 
               | "What happens if an interrupt gets raised between
               | prefetch and the data load? Will it start prefetching the
               | interrupt vector before the invalid memory access? I
               | quickly mocked this up in mGBA, turned on interrupts in
               | the test ROM, and sure enough it broke out of the loop.
               | So I tried the same test ROM on hardware and...it did not
               | break out of the loop. So there goes that theory.
               | Eventually I realized something. You saw that asterisk
               | earlier I'm sure, so yes, there is one thing that can
               | happen in between prefetch and the memory access, but
               | only if the memory bus gets queried by something other
               | than the CPU between the prefetch and invalid memory
               | access."
               | 
               | https://mgba.io/2020/01/25/infinite-loop-holy-grail/
        
               | djmips wrote:
               | This was given as an example, not for you to straw man
               | about the gamepad.
        
         | rtkwe wrote:
         | There was a good article from Ars Technica a decade ago that
         | pointed out why you need so much more power to get perfect
         | emulation. To get exact emulation takes a lot of power because
         | there are a few games which use odd tricks that are hard to
         | document and precisely reimplement in software. FPGA emulation
         | gets around that by more directly emulating the hardware.
         | 
         | https://arstechnica.com/gaming/2011/08/accuracy-takes-power-...
        
       | dang wrote:
       | Related thread from 2018:
       | 
       |  _MiSTer: Run Amiga, SNES, NES and Genesis on an FPGA_ -
       | https://news.ycombinator.com/item?id=18721594 - Dec 2018 (30
       | comments)
        
       | hyperpl wrote:
       | I'd really like to see a portable/handheld leverage this
       | technology for on-the-go gaming.
        
         | craigjb wrote:
         | I built the Gameslab around this concept, but haven't worked on
         | it much lately.
         | 
         | https://craigjb.com/2019/11/26/gameslab-overview/
        
         | tediousdemise wrote:
         | The Analogue Pocket[1] is exactly this (albeit proprietary).
         | Out of the box it recreates GB, GBC, and GBA using the Altera
         | Cyclone-V platform.
         | 
         | [1] https://www.analogue.co/pocket
        
           | drewblaisdell wrote:
           | I wonder, why is there no DIY Analogue Pocket-style MiSTer
           | project? Is the DE10-Nano too large or inefficient for this?
        
             | jamespo wrote:
             | The limited market is probably covered with Odroid Go /
             | GPD XD / RG350M etc. Mister leverages an off the shelf FPGA
             | board that would require a lot more work in a handheld
             | form.
        
             | tediousdemise wrote:
             | I'd reckon it's the same reason that there isn't much of a
             | custom laptop scene. The open ended nature of stuffing a
             | screen, battery, and input peripherals into a chassis seems
             | an order of magnitude more difficult than just making a
             | headless box to plug into your TV.
             | 
             | But with some effort, it would be awesome.
        
               | jonny_eh wrote:
               | Physical design is also a lot more important. Getting
               | "feel" just right is very hard and expensive, especially
               | when it comes to game controllers.
        
             | duskwuff wrote:
             | The DE10-Nano itself is a bit large for a handheld device,
             | and hasn't been optimized for power consumption. (It's
             | designed as a development board, not as a component of a
             | finished product.) There's nothing stopping someone from
             | using the Cyclone-V SoC in a handheld device, though.
        
       | GekkePrutser wrote:
       | This isn't really new, right? I heard of this years ago.
       | 
       | But it is an amazing project. Instead of emulating, they actually
       | rebuilt the old custom ICs (which 8-bit computers were full of)
       | in an FPGA. Really impressive.
        
         | jonny_eh wrote:
         | Old projects get reshared many times. It's always new to
         | someone.
        
         | tediousdemise wrote:
         | Yeah, it really is an amazing application for FPGAs--preserving
         | computing and gaming history. The list of cores available for
         | MiSTer is simply staggering:
         | 
         | > Computers - Classic
         | 
         | * Acorn Archimedes * Acorn Atom * Alice MC10 * Altair 8800 *
         | Amiga * Amstrad CPC 6128 * Amstrad PCW * ao486 (PC 486) *
         | Apogee * Apple I * Apple II+ * Apple Macintosh Plus * Aquarius
         | * Atari 800XL * Atari ST/STe * BBC Micro B,Master * BK0011M *
         | Color Computer 2, Dragon 32 * Commodore 16, Plus/4 * Commodore
         | 64, Ultimax * Commodore PET * Commodore VIC-20 * DEC PDP-1 *
         | EDSAC * Galaksija * Jupiter Ace * Laser 310 * MSX * MultiComp *
         | Orao * Oric 1 & Atmos * SAM Coupe * Sharp MZ Series * Sinclair
         | QL * Specialist/MX * TI-99/4A * TRS-80 Model 1 * TSConf *
         | Vector 06C * X68000 * ZX Spectrum * ZX Spectrum Next * ZX81
         | 
         | > Consoles - Classic
         | 
         | * Astrocade * Atari 2600 * Atari 5200 * Atari Lynx * AY-3-8500
         | * ColecoVision, SG-1000 * Gameboy, Gameboy Color * Gameboy
         | Advance * Genesis/Megadrive * SMS, Game Gear * MegaCD * NeoGeo
         | * NES * Odyssey2 * SNES * TurboGrafx 16 / PC Engine * Vectrex
         | 
         | > Other Systems
         | 
         | * Arduboy * Chess * CHIP-8 * Epoch Galaxy II * Flappy Bird *
         | Game of Life * TomyTronic Scramble
        
           | timbit42 wrote:
           | I'm still waiting for the KENBAK-1 core.
        
             | tediousdemise wrote:
             | Is there good documentation or ICDs out there that
             | adequately describe the architecture? Looks like only 50
             | were ever made, and only 14 are believed to exist
             | today.
        
               | zokier wrote:
               | http://kenbakkit.com/manuals.html
               | 
               | Seems pretty well documented. Considering the simplicity
               | of the computer, it feels like it would be a relatively
               | easy project to get to MiST
        
           | pomian wrote:
           | An interesting project would be to dig out some old cassettes
           | from, let's say, a Commodore 64, try to load them into a
           | present-day computer by patching wires/cables, and see if
           | they run in this system. I remember writing, for example, a
           | mining program to calculate overburden, volume and tonnage
           | at different slopes, different rock types, etc. The science
           | behind the calculations is still valid, but we could likely
           | improve load times and calculation times.
        
         | near wrote:
         | It is indeed an amazing project, especially its open source
         | nature. It provides some impressive power savings and latency
         | reductions that are very hard to match with general purpose
         | CPUs.
         | 
         | But in most cases, it is emulation, as the lead developer will
         | attest.
         | 
         | https://github.com/MiSTer-devel/Main_MiSTer/wiki/Why-FPGA
         | 
         | "From my point of view, if the FPGA code is based on the
         | circuitry of real hardware (along with the usual tweaks for
         | FPGA compatibility), then it should be called replication.
         | Anything else is emulation, since it uses different kinds of
         | approximation to meet the same objectives. Currently, it's hard
         | to find a core that can truly be called a replica - most cores
         | are based on more-or-less functional recreations rather than
         | true circuit recreation. The most widely used CPU cores - the
         | Z80 (T80) and MC68000 (TG68K) - are pure functional emulations,
         | not replications. So it's okay to call FPGA cores emulators,
         | unless they are proven to be replicas."
         | 
         | But there's nothing wrong with emulation for preservation,
         | until we get to a point where we can wide-scale clone these
         | older chips down to the transistor level through analysis of
         | delayered decap scans. And even then, emulation will be useful
         | for artificial enhancements as well as for understanding how
         | all those transistors actually worked at a higher level.
         | 
         | It's also not a total solution: by taking many more transistors
         | to programmatically simulate just one, it limits the maximum
         | scale and frequency of what it can support. N64/PS1/Saturn has
         | not yet been fully supported; it is still theoretical, but
         | likely to be possible. Going beyond that is not possible at
         | this time.
         | 
         | Software emulation and FPGA devices should be seen as
         | complementary approaches, rather than competitive. The
         | developers of each often work together, and new knowledge is
         | mutually beneficial.
        
           | floatboth wrote:
           | Well, yeah, it's not replication if it's not an exact
           | hardware replica, but the word "emulation" has very
           | "software" connotations. I guess let's call it.. recreation?
           | (That word is even in the quote above!)
        
             | someperson wrote:
             | "FPGA re-implementation" may be a better term
        
           | jamespo wrote:
           | So it's not perfect but it's better than emulators...
        
             | near wrote:
             | In latency and power usage, yes. In compatibility and
             | accuracy, no. Both are Turing complete, so there's nothing
             | you can do with one that you can't do with the other.
             | 
             | If you take the SNES core, my software emulator has 100%
             | compatibility and no known bugs, and synchronizes all
             | components at the raw clock cycle level. It also mitigates
             | most of the latency concern through a technique known as
             | run-ahead. But it does require more power to do this.
        
             | stormbrew wrote:
             | I'm really curious where you got "better" out of the quoted
             | text. Because it's not there or implied, but people keep
             | reading this into anything about fpga recreations of chips.
             | There's nothing inherently better about doing emulation on
             | an fpga or a cpu, other than basically the amount of
             | electricity involved in doing it.
             | 
             | But people keep presuming an improved accuracy that there's
             | no basis for.
        
               | emodendroket wrote:
               | Probably the marketing copy for Super NT and similar
               | products... harder to get people to part with hundreds of
               | dollars if your pitch is "lower power draw and reduced
               | input delay"
        
               | cmrdporcupine wrote:
               | Lower latency is definitely a thing. With FPGA it's
               | possible to 'chase the beam' like the original hardware,
               | and have much reduced input latency from devices, etc.
               | With an emulator you're going to be fighting the OS and
               | the frameworks you built on top of. Even if you go "bare
               | metal" (like my friend's BMC64 project which runs a C64
               | emulator like a unikernel on the RPi with no OS) you are
               | still dealing with hardware built for usage patterns very
               | different from the classic systems. You're always going
               | to be one or more frames behind.
        
               | near wrote:
               | That is true. There are however techniques software
               | emulators can use like run-ahead that can get you lower
               | latency than even the original hardware on a PC:
               | https://near.sh/articles/input/run-ahead
               | 
               | The caveat is that it doesn't _always_ work, and it makes
               | the power requirements even more unbalanced. Some might
               | also see it as a form of cheating to go below the
               | original game's latency. If you want to match the
               | original game's latency precisely, FPGAs are the way to
               | go right now for sure.
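
For readers unfamiliar with run-ahead, here is a minimal sketch of the idea
using a toy "game" whose output reacts to input two frames late. The class,
method names and the two-frame lag are all assumptions made for illustration;
this is not the code of any real emulator.

```python
import copy


class ToyGame:
    """Stand-in for an emulated game with LAG frames of built-in input delay."""

    LAG = 2

    def __init__(self):
        self.pipeline = [0] * self.LAG  # inputs waiting to take effect
        self.position = 0

    def run_frame(self, joypad):
        """Advance one frame and return the 'video' (here just a number)."""
        self.pipeline.append(joypad)
        self.position += self.pipeline.pop(0)
        return self.position

    def save_state(self):
        return copy.deepcopy(self.__dict__)

    def load_state(self, state):
        self.__dict__ = copy.deepcopy(state)


def run_frame_with_runahead(game, joypad, frames_ahead):
    """Emulate one real frame, then peek frames_ahead frames into the future."""
    video = game.run_frame(joypad)      # the frame that really advances state
    if frames_ahead == 0:
        return video
    snapshot = game.save_state()
    for _ in range(frames_ahead):
        video = game.run_frame(joypad)  # guess: the input stays the same
    game.load_state(snapshot)           # rewind; only the picture is kept
    return video


def play(inputs, frames_ahead):
    game = ToyGame()
    return [run_frame_with_runahead(game, j, frames_ahead) for j in inputs]


print(play([1, 0, 0, 0], frames_ahead=0))
print(play([1, 0, 0, 0], frames_ahead=2))
```

With `frames_ahead=2` the button press shows up in the first displayed frame
instead of the third, at the cost of emulating three frames for every frame
shown. The guess that the input stays unchanged is one reason the technique
does not always work.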
        
               | tediousdemise wrote:
               | Run-ahead seems pretty cool, great technical write up.
               | How would you compare this to the feature called frame-
               | skipping that I often see implemented in software
               | emulators?
        
           | mschuster91 wrote:
           | > It's also not a total solution: by taking many more
           | transistors to programmatically simulate just one, it limits
           | the maximum scale and frequency of what it can support.
           | N64/PS1/Saturn has not yet been fully supported; it is still
           | theoretical, but likely to be possible. Going beyond that is
           | not possible at this time.
           | 
           | The limiting factor here is the amount of stuff you can throw
           | into a single FPGA, correct?
           | 
           | So in theory, shouldn't it be possible to tie a bunch of
           | FPGAs together, with two beefy ones being responsible for
           | replicating CPU / GPU functionality, a couple smaller ones
           | for sound and other "helper" processors, and some bog-
           | standard ARM SoC to provide the bitstreams to the FPGAs and
           | emulate storage (game cartridges, save cards) and input
           | elements (mainly "modern" controllers)?
        
             | near wrote:
             | There's both a cost and a speed barrier to it. FPGAs are
             | often used to design, simulate, and test modern circuits at
             | sub-realtime speeds. No amount of FPGAs will get you a PS2
             | emulator at playable speeds right now, let alone a
             | PS3/Switch emulator. PCs can do that today by taking
             | shortcuts such as dynamic recompilation and idle loop
             | skipping.
        
               | vardump wrote:
               | Hmm... looking at the frequencies and gate counts, I
               | think PS2 is well within realm of possibility to run on a
               | not-so-cheap FPGA (or several). But PS3 generation
               | consoles definitely not.
        
             | duskwuff wrote:
             | > The limiting factor here is the amount of stuff you can
             | throw into a single FPGA, correct?
             | 
             | And the speed that you can get your design to run at.
             | Something like the GameCube (PPC750 @ 485 MHz) would be
             | difficult to implement in an FPGA, for example.
        
           | GekkePrutser wrote:
           | Ah ok I wasn't aware of this. I thought it was spot on.
           | 
           | And yeah I hope we can easily order small batches of ICs (at
           | big pitch of course) in a few years, in a similar way to how
           | creating PCBs has become so simple now.
           | 
           | I mean I remember how much of a PITA it was in the 80s.
           | Drawing on overhead sheets. All the acids and other
           | chemicals. Drilling. And now we get super-accurate 10x10cm
           | boards dual-layer, drilled, soldermasked and silkscreened for
           | a buck a pop with a minimum of 10. Wow. I really hope this
           | trend continues down to the scale of ICs (or that FPGAs
           | simply get better/easier).
           | 
           | By the way, emulating a CPU is pretty easy and very accurate
           | anyway. The big problem with accurate emulation is with some
           | of the peripheral ICs which used hard to emulate stuff like
           | analog sound generators.
        
       ___________________________________________________________________
       (page generated 2021-04-11 23:00 UTC)