[HN Gopher] Clip control on the Apple GPU
       ___________________________________________________________________
        
       Clip control on the Apple GPU
        
       Author : stefan_
       Score  : 207 points
       Date   : 2022-08-22 14:04 UTC (8 hours ago)
        
 (HTM) web link (rosenzweig.io)
 (TXT) w3m dump (rosenzweig.io)
        
       | bla3 wrote:
       | > Here's a little secret: there are two graphics APIs called
       | "Metal". There's the Metal you know, a limited API that Apple
       | documents for App Store developers, an API that lacks useful
       | features supported by OpenGL and Vulkan.
       | 
       | > And there's the Metal that Apple uses themselves, an internal
       | API adding back features that Apple doesn't want you using.
       | 
       | Apple does stuff like this so much and gets so little flak for
       | it.
       | 
        | I use macOS since it seems like the least bad option if you want
        | a Unix but also don't want to spend a lot of time on system
        | management, but this is a real turn-off.
        
         | bri3d wrote:
         | > Apple does stuff like this so much and gets so little flak
         | for it.
         | 
         | Why should they get flak for having internal APIs? The fact
         | that the internal API is a superset of the external API is
         | smart engineering.
         | 
         | Think about it this way: Apple could just as well have made the
         | "Metal that Apple uses themselves" some arcane "foocode" IR
         | language or something, as I'm sure many shader compilers and
         | OpenGL runtime implementations do, and nobody would be nearly
         | as mad about it.
         | 
         | The fact that they use internal APIs for external apps in their
         | weird iOS walled garden is obnoxious, but having private,
         | undocumented APIs in a closed-source driver is not exactly an
         | Apple anomaly.
        
           | LeifCarrotson wrote:
           | > Why should they get flak for having internal APIs? The fact
           | that the internal API is a superset of the external API is
           | smart engineering.
           | 
           | It's not about having good segmentation of user-facing and
           | kernel-side libraries, no one faults them for that.
           | 
           | It's about Apple building user-facing apps that use the whole
           | API, and then demanding that other developers not use the
           | features required to implement those apps because we're not
           | trusted to maintain the look-and-feel, responsiveness, or
           | battery life expectations of apps on the platform.
        
             | dcx wrote:
             | But isn't it kind of fair to say that when you look at the
             | case studies presented by (a) the Android app store in the
             | past decade and (b) Windows malware in the decade before
             | that, this trust has in fact not been earned?
             | 
             | I hate a walled garden as much as the next developer, and
             | the median HN reader is probably more than trustworthy. But
             | past performance does predict future performance.
        
         | fezfight wrote:
         | If you buy a hackintosh, you have to sometimes mess around to
         | get stuff to work. Same goes for Linux on random hardware. If
         | you check first and buy a machine that supports the OS you're
         | using, you don't have to do anything special. It'll work as you
         | expect.
         | 
          | It's freeing not to be beholden to the likes of Tim Cook who,
          | it would seem, spends the majority of his waking hours
          | figuring out how to hide anticonsumer decisions under rugs.
        
         | gjsman-1000 wrote:
         | > Apple does stuff like this so much and gets so little flak
         | for it.
         | 
          | To be fair, Windows has a _ludicrous_ number of undocumented
          | APIs for internal affairs as well, and you can get deep into
          | the weeds very quickly; just ask the WINE developers who have
          | to reverse-engineer the havoc. There is no OS without private
          | APIs, but Windows is arguably the worst, with more private or
          | undocumented APIs than Apple.
         | 
         | This actually bears parallels to Metal. Until DirectX 12,
         | Windows had no official way to get low-level. Vulkan and OpenGL
         | are only 3rd-party supported, not Microsoft-supported,
         | Microsoft officially only supports DirectX. If you want
         | Vulkan/OpenGL, that's on your GPU vendor. If you wanted low-
         | level until 12, you _may_ have found yourself pulling some
          | undocumented shenanigans. Apple hasn't gotten to their DirectX
          | 12 yet, but they'll get there eventually.
         | 
         | As for why they are Private, there could be many reasons, not
         | least of which that (in this case) Apple has a very complicated
         | Display Controller design and is frequently changing those
         | internal methods, which would break compatibility if third-
         | party applications used them. Just ask Asahi about how the DCP
         | changed considerably from 11.x to 13.x.
        
           | chongli wrote:
           | _Apple has a very complicated Display Controller design_
           | 
           | Can anyone in the know give more information here? Why would
           | Apple want to do this? What could they be doing that's so
           | complicated in the display controller?
        
             | gjsman-1000 wrote:
             | https://twitter.com/marcan42/status/1549672494210113536
             | 
             | and
             | 
             | https://twitter.com/marcan42/status/1415360411260493826?lan
             | g...
             | 
             | and
             | 
             | https://twitter.com/marcan42/status/1526104383519350785
             | 
              | As to why? Well, "if it ain't broke, don't fix it" carried
              | over from the iPhone, but it is still a bit of a mystery.
             | 
             | In a nutshell from those threads:
             | 
             | 1. Apple's DCP silicon layout is actually massive,
             | explaining the 1 external display limit
             | 
             | 2. Apple implements half the DCP firmware on the main CPU
             | and the other half on the coprocessor with RPC calls, which
             | is hilariously complicated.
             | 
             | 3. Apple's DCP firmware is versioned, with a different
             | version for every macOS release. This is also why Asahi
             | Linux currently uses a "macOS 12.3" shim, so they can focus
             | on the macOS 12.3 DCP firmware in the driver, which will
             | probably not work with the macOS 12.4+ DCP firmware or the
             | macOS 12.2- firmware.
             | 
             | I can totally see why Apple doesn't want people using their
             | low-level Metal implementation that deals with the mess
             | yet.
        
               | chongli wrote:
               | Yeah it makes perfect sense that they don't want to
               | expose any of that complexity to 3rd parties and risk
               | constant breakage with new models. I'm just really
               | curious about what sort of complex logic they have going
               | on in that silicon.
        
               | phire wrote:
               | The complexity with the firmware split across the main
               | CPU and a coprocessor seems to be a historical artefact.
               | 
                | It seems the DCP driver was originally all on the main
                | CPU, and when Apple got these cheap coprocessor cores,
                | they took the lazy approach of just inserting a simple
                | RPC layer in the middle. The complexity for Asahi comes
                | from the fact that it's a C++ API that can change
                | substantially from version to version.
               | 
                | And yes, these ARM coprocessor cores are cheap: Apple
                | has put at least 16 of them [1] on the M1, on top of the
                | 4 performance and 4 efficiency cores. They are an Apple
                | custom design that implements only the 64-bit parts of
                | the ARMv8 spec. I'm not entirely sure why the actual DCP
                | is so big, but it's not because of the complex firmware.
               | Potentially because the DCP includes enough dedicated RAM
               | to store an entire framebuffer on-chip.
               | 
               | If so, they will be doing this because it allows for
               | lower power consumption. The main DRAM could be put in a
               | power-saving mode and kept there for seconds or even
               | minutes at a time without having to wake it up multiple
               | times per frame, even when just showing a static image.
               | 
               | [1]
               | https://twitter.com/marcan42/status/1557242428876537856
        
               | throwaway08642 wrote:
               | @marcan42 said that on the M1 MacBook Pro models, the DCP
               | also implements hardware-level antialiasing for the notch
               | and rounded display corners.
        
         | Pulcinella wrote:
         | _Apple does stuff like this so much and gets so little flak for
         | it._
         | 
          | It would be one thing if the private APIs were limited to
          | system frameworks and features and Apple's own apps weren't
          | allowed to use them, but they are. E.g. the Swift Playgrounds
          | app for iPad is allowed to share and compile code, run separate
          | processes, etc., which isn't normally allowed in the App Store.
          | They also use blur and other graphical effects (outside of the
          | background blur material and the SwiftUI blur modifier) that
          | are unavailable outside of private APIs.
         | 
         | It stinks because of the perceived hypocrisy and the inability
         | to compete on a level playing field or leave the AppStore (and
         | I say this as someone who normally doesn't mind the walled
         | garden!)
        
           | adrian_b wrote:
            | Unfortunately, such behavior is not at all new.
           | 
           | The best known example of these methods is how Microsoft has
           | exploited the replacement of MS-DOS with Windows 3.0 and
           | especially with Windows 95.
           | 
           | During the MS-DOS years, the only Microsoft software products
           | that were successful were their software development tools,
           | i.e. compilers and interpreters, and even those had strong
           | competition, mainly from Borland. Those MS products addressed
           | only a small market and they could not provide large
           | revenues. The most successful software products for MS-DOS
           | were from many other companies.
           | 
           | That changed abruptly with the transition to various Windows
           | versions, when the Microsoft developers started to have a
           | huge advantage over those from any other company, both by
           | being able to use undocumented internal APIs provided by the
           | MS operating systems and also by knowing in advance the
           | future documented APIs, before they were revealed to
           | competitors.
           | 
            | Thus, in a few years, MS Office transitioned from an
            | irrelevant product, much inferior to the competition, into
            | the dominant suite of office programs, which eliminated all
            | competitors and became the main source of revenue for MS.
        
         | [deleted]
        
           | [deleted]
        
         | Jasper_ wrote:
          | As a graphics engineer, good riddance to the old clip space;
          | 0...1 really is the correct option. We also don't know what
         | else "OpenGL mode" enables, and the details of what it does
          | probably change between GPU revisions -- the emulation stack
          | probably has the details, and changes its own split between
          | what's in hardware and what's emulated in the OpenGL stack
          | depending on the GPU revision.
         | 
          | Also, to Alyssa, if she's reading this: you're just going to
          | have to implement support for shader variants. Build your
          | infrastructure for supporting them now. It's going to be far
          | more helpful than just for clip control.
         | 
          | But yes, the Vulkan extension was just poorly specified:
          | allowing you to change clip spaces between draws in the same
          | render pass is, again, ludicrous, and the extension should just
          | be renamed VK_EXT_i_hate_tilers (like so many others of its
         | kind). Every app is going to set it at app init and forget it;
         | the implementation using the render pass bit and flushing on
         | change will cover the 100% case, and won't be slow at all.
        
           | garaetjjte wrote:
           | >good riddens to the old clip space, 0...1 really is the
           | correct option
           | 
            | More like 1...0, which nicely improves depth precision.
            | Annoyingly, due to the symmetric -1...1 range, reverse-Z
            | cannot be used in OpenGL out of the box, but it can be fixed
            | with ARB_clip_control. https://developer.nvidia.com/content/depth-
           | precision-visuali...
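            | 
            | Getting reverse-Z going once clip control is available takes
            | only a few calls; a minimal sketch in C, assuming a GL 4.5
            | context (or ARB_clip_control) and a loader such as glad that
            | resolves the entry point:
            | 
            |   /* Reverse-Z: 0..1 clip space, far plane clears to 0,
            |      depth test flips to GREATER. */
            |   void enable_reverse_z(void)
            |   {
            |       glClipControl(GL_LOWER_LEFT, GL_ZERO_TO_ONE);
            |       glClearDepth(0.0);        /* far plane is now 0 */
            |       glDepthFunc(GL_GREATER);  /* nearer means larger depth */
            |       /* The projection matrix must also map near->1, far->0,
            |          e.g. by swapping near/far in the perspective setup. */
            |   }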
        
           | bpye wrote:
           | > you're just going to have to implement support shader
           | variants
           | 
            | I admittedly have zero experience with Mesa, but it seems
            | like shader variants are something that should be common
            | infrastructure? Though of course the reason that a variant is
            | needed would be architecture-specific.
        
         | chaxor wrote:
          | This is why the Asahi Linux project is so exciting!! You get
          | the great performance at low power (M-series ARM processors)
          | while still getting the more performant and useful Linux
          | experience.
          | 
          | I am really thankful to the Asahi Linux team, and specifically
          | in this instance for the GPU, [Alyssa
          | Rosenzweig](https://github.com/alyssarosenzweig), [Asahi
          | Lina](https://github.com/asahilina), and [Dougall
          | Johnson](https://github.com/dougallj).
        
       | rowanG077 wrote:
       | Amazing that this works because of the herculean effort of just a
       | handful of people.
        
       | hrydgard wrote:
       | There is no good reason to flip the flag dynamically at runtime
       | and apps just don't do that, so flushing the pipeline should be
       | perfectly fine, even in an implementation of the clip control
       | extension.
        
       | gjsman-1000 wrote:
        | Optimistic that OpenGL 2.1 will be available by the end of the
        | year on Asahi - well, that is news. It's only 2.1, but that's
       | enough (as stated) for a web browser, desktop acceleration, and
       | old games.
       | 
       | Also RIP all the countless pessimistic "engineers" here and
       | elsewhere saying we'd be waiting for years more for _any_
       | graphics acceleration.
       | 
       | Edit: It is true though that AAA Gaming will wait: "Please temper
       | your expectations: even with hardware documentation, an optimized
       | Vulkan driver stack (with enough features to layer OpenGL 4.6
        | with Zink) requires many years of full time work. At least
       | for now, nobody is working on this driver full time. Reverse-
       | engineering slows the process considerably. We won't be playing
       | AAA games any time soon."
       | 
        | Still, even if that is the case, an accelerated desktop is an
        | accelerated desktop, much sooner than many expected.
        
         | smoldesu wrote:
          | It's pretty insane that OpenGL 2.1 is even functional on a GPU
          | this strange, but remember: this is still an unfinished, hacky
          | implementation (the author's own concession). Plus, you're
          | going to be stuck on X11 until any serious GPU drivers get
          | written, which in many people's opinion is just as bad as no
          | hardware acceleration at all. No macOS-like trackpad gestures
          | either; you'll be waiting for Wayland support to get that too.
          | It'll definitely be a boon for web browsing though, so I won't
          | deny that. What I'm _really_ curious about is older WINE titles
          | with Box86; if you could get DOS titles like Diablo 2 running
          | smoothly, it could probably replace my Switch as a portable
          | emulation machine...
        
           | gjsman-1000 wrote:
           | > pretty insane that OpenGL 2.1 is even functional on a GPU
           | this strange,
           | 
           | Well... you were one of the most vocal critics saying it
           | wouldn't happen anytime soon.
           | 
           | > unfinished, hacky implementation (the author's own
           | concession)
           | 
           | Still more stable than Intel's official Arc drivers, so who
           | defines "hacky"? ;)
           | 
           | > Plus, you're going to be stuck on x11 until any serious GPU
           | drivers get written
           | 
           | Only because it is running on macOS, which supports X11 but
           | not Wayland. On Linux, Wayland or X11 will both work, no
           | problem.
           | 
           | > No MacOS-like trackpad gestures either, you'll be waiting
           | for Wayland support to get that too
           | 
            | Again, Wayland will work on Day 1; it's just a limitation of
            | running the driver on macOS until the kernel support is
            | ready. When it is on Linux, Wayland will be a full go.
        
             | smoldesu wrote:
             | > Well... you were one of the most vocal critics saying it
             | wouldn't happen anytime soon.
             | 
              | Yep. Been beating that drum since 2020; looks like history
              | proved me right on this one.
             | 
             | > Still more stable than Intel's official Arc drivers, so
             | who defines "hacky"? ;)
             | 
             | Apparently not me, I had no idea that the M1 supported
             | Vulkan and DirectX 12.
        
           | viraptor wrote:
           | > if you could get DOS titles like Diablo 2
           | 
           | Did you mean some other title? Even diablo 1 was a Windows
           | game.
        
           | [deleted]
        
           | kirbyfan64sos wrote:
           | I'm a bit confused as to why X11/wayland would be a huge
           | issue here? The Mesa docs do say X11-only, but they're
           | referring to running the driver on macOS (hence the XQuartz
           | reference), where Wayland basically doesn't exist.
        
             | smoldesu wrote:
             | Ah, looks like I definitely missed that.
             | 
             | In any case, I don't think Asahi/M1 has proper KWin or
             | Mutter support yet. It's still going to take a while before
             | you get a truly smooth desktop Linux experience on those
             | devices, but some hardware acceleration is definitely
             | better than none!
        
           | rowanG077 wrote:
            | I mean, the signs have been clear for basically one and a
            | half years now. It was never a question of if, but a question
            | of when. There were just so many voices that didn't know what
            | they were talking about, comparing it to nouveau for example.
        
           | Miraste wrote:
           | Why would you need Wayland for trackpad gestures?
        
             | 3836293648 wrote:
              | You technically don't, but the implementations on X are
              | kinda terrible and can't do 1:1.
        
       | pornel wrote:
       | Apple could help by documenting this stuff. I remember the good
       | old days when every Mac OS X came with an extra CD with Xcode,
       | and Apple was regularly publishing Technical Notes detailing
       | implementation details. Today the same level of detail is treated
       | as top secret, and it seems that Apple doesn't want developers to
       | even think beyond the surface of the tiny App Store sandbox.
        
         | dagmx wrote:
         | Even back in the day, those technical notes would not cover
         | private APIs like this, because they're subject to change or
         | are for internal use only.
         | 
          | These are the same in any closed-source OS.
        
         | madeofpalk wrote:
         | Apple Platform Security: May 2022. 242 pages.
         | 
         | https://help.apple.com/pdf/security/en_GB/apple-platform-sec...
        
         | alberth wrote:
         | I really wish Apple would do another "Snow Leopard" - go an
         | entire year WITHOUT any new features and just fix bugs and
         | documentation.
         | 
          | This Twitter thread is a perfect example of why it's needed:
         | 
         | https://twitter.com/nikitonsky/status/1557357661171204098
        
           | argsnd wrote:
           | I mean that thread is looking at pre-release software
        
             | ianlevesque wrote:
             | There is software that approaches barely functional after
             | dozens of rounds of QA testing, and then there is software
             | that is implemented on a solid foundation with care and
             | happens to have a few bugs. Unfortunately that many bugs in
             | a beta implies the former. I think the thread comes from a
             | disappointment that Apple is moving from the second
             | category to the first.
        
               | buildbot wrote:
                | But it is not even a "consumer beta", it is a developer
                | beta: for catching bugs and allowing devs to create
                | applications for new APIs while Apple polishes the build
                | for release. Was Snow Leopard ever released as a dev beta
                | even?
        
           | DerekL wrote:
           | Snow Leopard had few user-facing features, but it did have
           | new APIs, such as Grand Central Dispatch and OpenCL, and also
           | an optional 64-bit kernel.
           | 
           | https://en.wikipedia.org/wiki/Mac_OS_X_Snow_Leopard
        
             | sudosysgen wrote:
             | OpenCL is not an OS level API. I guarantee you they were
             | basically just redistributing Intel and NVidia
             | implementations. GCD isn't OS level either, it's just a
             | library, but it is at least a new API.
        
           | madeofpalk wrote:
           | Ahh yes, that "just fix bugs" release that would delete your
           | main user account if you used a guest user
           | https://www.engadget.com/2009-10-12-snow-leopard-guest-
           | accou...
           | 
            | Besides, rebuilding/redesigning the settings screen is a
            | perfect "snow leopard" thing. That's not an actual feature.
            | 
            | The problem isn't doing features, the problem is doing a bad
            | job.
        
       | naillo wrote:
        | This person has an awesome set of blog posts. One of the few RSS
        | feeds I keep track of.
        
       | [deleted]
        
       | bob1029 wrote:
       | Clip space is the bane of my existence. I've been building a
       | software rasterizer from scratch and implementing vertex/triangle
       | clipping has turned into one of the hardest aspects. It took me
       | about 50 hours of reading various references before I learned you
       | cannot get away with doing this in screen space or any time after
       | perspective divide.
       | 
       | It still staggers me that there is not 1 coherent reference for
       | how to do all of this. Virtually every reference about clipping
       | winds up with something like "and then the GPU waves its magic
       | wand and everything is properly clipped & interpolated :D". Every
       | paper I read has some "your real answer is in another paper" meme
       | going on. I've got printouts of Blinn & Newell, Sutherland &
        | Hodgman, et al. littered all over my house right now. About 4
        | decades' worth of materials.
       | 
       | Anyone who works on the internals of OGL or the GPU stack itself
       | has the utmost respect from me. I cannot imagine working in that
       | space full-time. About 3 hours of this per weekend is about all
       | my brain can handle.
        
         | joakleaf wrote:
          | Not sure if you got through clipping, but it was one of those
          | things I had to work through at some point in the mid-90s
          | myself. I feel your pain, but after having implemented it about
          | 5-10 times in various situations, variants, and languages, I
          | can promise it gets a lot easier.
         | 
          | In my experience it is most elegant to clip against the 6
          | planes of the view frustum in succession (one plane at a
          | time), preferably clipping against the near plane first, as
          | that reduces the set of triangles the most for subsequent
          | clips.
         | 
          | Your triangles can turn into convex polygons after a clip, so
          | it is convenient to start with a generic convex polygon vs.
          | plane clipping algorithm; the thing to be careful about here is
          | that points can (and will) lie on the plane.
         | 
         | Use the plane equation (f(x,y,z)=ax+by+cz+d) to determine if a
         | point is on one side, on the plane, or the other side.
         | 
          | It is convenient to use a "mask" to designate the side a point
          | v=(x,y,z) is on:
          | 
          | 1 := inside_plane (f(x,y,z) > eps)
          | 2 := outside_plane (f(x,y,z) < -eps)
          | 3 := on_plane (-eps <= f(x,y,z) <= eps)
          | 
          | Let m(v_i) be the mask of v_i.
         | 
         | When you go through each edge of the convex polygon
         | (v_i->v_{i+1}), you can check if you should clip the edge using
         | the mask. I.e.:
         | 
          | if (m(v_i) & m(v_{i+1})) == 0, the points are on opposite
          | sides => clip [determine the intersection point].
         | 
          | Since you are just clipping to the frustum, just return a list
          | of the points that are inside or on the plane (i.e.
          | (m(v_i) & 1) == 1) and the added intersection points.
         | 
          | There is lots of potential for optimization, of course, but I
          | wouldn't worry about that. There are lots of other places to
          | optimize a software rasterizer with more potential, in my
          | experience.
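          | 
          | A minimal C sketch of that mask-based polygon/plane clip (the
          | Vec3/Plane types, EPS, and the function names are mine, purely
          | illustrative):
          | 
          |   typedef struct { float x, y, z; } Vec3;
          |   typedef struct { float a, b, c, d; } Plane; /* f=ax+by+cz+d */
          |   #define EPS 1e-6f
          | 
          |   static float feval(const Plane *p, Vec3 v)
          |   { return p->a*v.x + p->b*v.y + p->c*v.z + p->d; }
          | 
          |   /* 1 = inside, 2 = outside, 3 = on the plane (within EPS) */
          |   static int mask(float f)
          |   { return f > EPS ? 1 : (f < -EPS ? 2 : 3); }
          | 
          |   /* Clip convex polygon `in` (n vertices) against one plane;
          |      `out` must hold n+1 vertices. Returns new vertex count. */
          |   static int clip_poly(const Plane *p, const Vec3 *in, int n,
          |                        Vec3 *out)
          |   {
          |       int m = 0;
          |       for (int i = 0; i < n; i++) {
          |           Vec3 a = in[i], b = in[(i + 1) % n];
          |           float fa = feval(p, a), fb = feval(p, b);
          |           if (mask(fa) & 1)      /* inside or on plane: keep */
          |               out[m++] = a;
          |           if ((mask(fa) & mask(fb)) == 0) { /* opposite sides */
          |               float t = fa / (fa - fb);     /* crossing point */
          |               out[m++] = (Vec3){ a.x + t*(b.x - a.x),
          |                                  a.y + t*(b.y - a.y),
          |                                  a.z + t*(b.z - a.z) };
          |           }
          |       }
          |       return m;
          |   }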
        
         | fabiensanglard wrote:
         | I wrote about this a few years ago
         | (https://fabiensanglard.net/polygon_codec/index.php).
         | 
          | It was a pain to learn indeed, and the best resources were
          | quite old:
         | 
         | - "CLIPPING USING HOMOGENEOUS COORDINATES" by James F. Blinn
         | and Martin E. Newell
         | 
         | - A Trip Down the Graphics Pipeline by Jim Blinn (yes the same
         | Blinn that co-authored the paper above).
        
         | Jasper_ wrote:
         | Vertex/triangle clipping is quite rare, and mostly used for
         | clipping against the near plane (hopefully rare in practice).
         | Most other implementations use a guard band as a fast path (aka
         | doing it in screen space) -- real clipping is only used where
         | your guard band doesn't cover you, precision issues mostly.
         | 
         | I'm not sure what issues you're hitting, but I've never found
         | clipping to be that challenging or difficult. Also, clip
         | control and clip space aren't really specifically about
         | clipping -- clip space is just the output space of your vertex
         | shader, and the standard "clip control" extension just controls
         | whether the near plane is at 0 or -1. And 0 is the correct
         | option.
        
           | Sharlin wrote:
           | Guard band clipping is only really applicable to "edge
           | function" type rasterizers. For the classic scanline-based
           | algorithm, sure, you can easily clip to the right and bottom
           | edges of the viewport while rasterizing, but the top and left
           | edges are trickier. Clipping in clip space, before
           | rasterization, is more straightforward, given that you have
           | to frustum cull primitives anyway.
        
           | bob1029 wrote:
           | > clipping against the near plane (hopefully rare in
           | practice)
           | 
           | I am not sure I understand why this would be rare. If I am
           | intending to construct a rasterizer for a first-person
           | shooter, clipping is essentially mandatory for all but the
           | most trivial of camera arrangements.
        
             | Jasper_ wrote:
             | Yes, of course, I was definitely imagining you were
             | struggling to get simpler scenes to work. But also,
             | proportionally few of your triangles in any given scene
             | should be near-plane clipped. It's OK to have a slow path
             | for it, and then speed it up later. I've never felt the
             | math for the adjusted barycentrics is too hard, but it can
             | take a bit to wrap your head around. Good luck :)
        
               | Sharlin wrote:
               | You and the GP have different rasterization algorithms in
               | mind I think. The GP, I presume, is talking about a
               | classic scanline-based rasterizer rather than an edge
               | function "am I inside or not" type rasterizer that GPUs
               | use.
        
             | bpye wrote:
             | Clipping or culling? I expect it's mostly the latter unless
             | your camera ends up intersecting the geometry.
        
               | royjacobs wrote:
               | As an example, if you're writing a shooter then the floor
               | might be a large square that will almost definitely be
               | intersecting the near plane. You absolutely need clipping
               | here.
        
               | bob1029 wrote:
               | This is precisely the first place I realized I needed
               | proper clipping. Wasted many hours trying to hack my way
               | out of doing it the right way.
        
               | bob1029 wrote:
               | Both. You almost always need both.
               | 
               | Clipping deals with geometry that is partially inside the
               | camera. Culling (either for backfaces or entire
               | instances) is a preliminary performance optimization that
               | can be performed in a variety of ways.
        
           | bpye wrote:
           | Even with a guard band don't you need to at least test the
           | polygons for Z clipping prior to the perspective divide?
           | 
           | Clipping in X and Y is simpler at least, and again the guard
           | band hopefully mostly covers you.
        
             | bob1029 wrote:
             | > Even with a guard band don't you need to at least test
             | the polygons for Z clipping prior to the perspective
             | divide?
             | 
             | Yes. Guard band is an optimization that reduces the amount
             | of potential clipping required. You still need to be able
             | to clip for fundamental correctness.
             | 
             | If you totally reject a vertex for a triangle without
             | determining precisely where it intersects the desired
             | planes, you are effectively rejecting the entire triangle
             | and creating yucky visual artifacts.
        
         | nauful wrote:
         | You have to clip against planes in 4D space (xyzw) before
         | perspective divide (xyz /= w), not 3D (xyz).
         | 
         | This simplified sample shows Sutherland-Hodgman with 4D
         | clipping:
         | https://web.archive.org/web/20040713023730/http://wwwx.cs.un...
         | The main difference is the intersect method finds the
         | intersection of a 4D line segment against a 4D plane.
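          | 
          | The 4D intersection itself is just a lerp on the signed
          | distance; a sketch in C for the -1..1 near plane (inside means
          | z + w >= 0), where the Vec4 type is illustrative:
          | 
          |   typedef struct { float x, y, z, w; } Vec4;
          | 
          |   static float near_dist(Vec4 v) { return v.z + v.w; }
          | 
          |   /* Point where edge [a,b] crosses the near plane, computed
          |      before the divide so w interpolates correctly too.
          |      Assumes near_dist(a) and near_dist(b) differ in sign. */
          |   static Vec4 clip_near(Vec4 a, Vec4 b)
          |   {
          |       float da = near_dist(a), db = near_dist(b);
          |       float t = da / (da - db);
          |       return (Vec4){ a.x + t*(b.x - a.x), a.y + t*(b.y - a.y),
          |                      a.z + t*(b.z - a.z), a.w + t*(b.w - a.w) };
          |   }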
        
         | Sharlin wrote:
         | I also implemented clipping in my software rasterizer a while
         | ago and can definitely sympathize! (Although I've written
         | several simple scanline rasterizers in my life, this was the
         | first time I actually bothered to implement proper clipping. I
         | actually reinvented Sutherland-Hodgman from scratch which was
         | pretty fun.) The problematic part is actually only the near
         | plane due to how projective geometry works. At z=0 there's a
         | discontinuity in real coordinates after z division, which means
         | there can be no edges that cross from negative to positive z. Z
          | division turns such an edge [a0, a1] into an "inverse" edge
          | (-∞, a0'] ∪ [a1', ∞), which naturally makes rendering a
          | bit tricky. In projective/homogeneous coordinates, however, it
         | is fine, because the space "wraps around" from positive to
         | negative infinity. All the other planes you can clip against in
         | screen space / NDC space if you wish, but I'm not sure there
         | are good reasons to split the job like that.
        
         | samstave wrote:
         | Be the documentation you want to see in the world.
        
           | hashishen wrote:
           | "RTFM" - The manual
        
             | samstave wrote:
             | "Look up error code on stack exchange to find the error in
             | question seeking a solution, but its you from 5 years ago."
        
               | sph wrote:
               | "What was I working on? What did I see?!"
               | 
               | https://xkcd.com/979/
        
       | moondev wrote:
        | Can you run a PCIe enclosure over Thunderbolt on Asahi Linux yet?
       | Could this enable GPUs that already work on aarch64 Linux?
        
         | hishnash wrote:
          | I would assume all Linux GPU drivers would need to be adapted
          | at least a little to support the larger page size (most Linux
          | AArch64 kernel-level code is written assuming 4 KB pages).
        
         | kmeisthax wrote:
         | Yes, but NOT for GPUs. Apple Silicon does not support non-
         | Device mappings over Thunderbolt, so eGPUs will never work.
        
       | andrewmcwatters wrote:
        | OpenGL on macOS is so frustrating that I and many other
       | developers have basically abandoned it, and not in favor of using
       | Metal--the easier alternative is to just no longer support Macs.
       | 
       | Yes, OpenGL on macOS is now implemented over Metal, but
       | unfortunately a side effect of this is that implementation-level
       | details that were critical to debugging and profiling OpenGL just
       | no longer exist for tools to work with. Anything is possible?
       | Maybe? I'm sure Apple Graphics engineers could make old tooling
       | work with the new abstraction layer, but it's not happening.
       | 
       | Tooling investment is all on Metal now. But so much existing NON-
       | LEGACY software relied on OpenGL.
       | 
       | So what do you do? You debug and perf test on Windows and Linux
       | and hope that fixing issues there addresses concerns on macOS,
       | and hopefully your problems aren't platform-specific.
       | 
       | This is how some graphics engineers, including myself, continue
       | to ship for macOS while never touching it.
       | 
       | Edit: Also, Vulkan is a waste of time for anyone who isn't a
       | large studio. No one wants to write this stuff. The most common
       | argument is "You only write it once." No, you don't.
       | 
       | You have to support this stuff. If it were that easy, bgfx would
       | have been written in a month and it would have been considered
       | "done" afterwards.
        
       | [deleted]
        
       | fbanon wrote:
       | Couldn't you just pre-multiply the projection matrix to remap the
       | Z range from [-1,1] to [0,1]?
        
         | NobodyNada wrote:
         | What projection matrix?
         | 
         | Remember that this translation needs to happen at the graphics
         | driver level. For fixed-function OpenGL where the application
         | actually passes the graphics driver a projection matrix this
         | would be doable. But if your application is using a version of
         | OpenGL newer than 2004, the projection matrix is a part of your
         | vertex shader. The graphics driver can't tell what part of your
         | shader deals with projection, and definitely can't tell what
         | uniforms it would need to tweak to modify the projection matrix
         | -- many shaders might not even _have_ a projection matrix.
        
           | fbanon wrote:
           | I know. But the second sentence of the article starts with:
           | 
           | "Neverball uses legacy "fixed function" OpenGL."
           | 
           | But also you could simply remap the Z coordinate of
           | gl_Position at the end of the vertex stage, do the clipping
           | in [0,1] range, then map it back to [-1,1] for gl_FragCoord
           | at the start of the fragment stage.
        
             | NobodyNada wrote:
             | > "Neverball uses legacy "fixed function" OpenGL."
             | 
             | Sure, it'd work for Neverball, but the article is clear
             | that they're looking for a general solution: something
             | that'd work not just for Neverball, but for all OpenGL
             | applications, and would ideally let them give applications
             | control over the clip-control bit through OpenGL/Vulkan
             | extensions.
             | 
             | > But also you could simply remap the Z coordinate of
             | gl_Position at the end of the vertex stage, do the clipping
             | in [0,1] range, then map it back to [-1,1] for gl_FragCoord
             | at the start of the fragment stage.
             | 
             | Yes, that was the current state-of-the-art before this
             | article was written:
             | 
             | > As Metal uses the 0/1 clip space, implementing OpenGL on
             | Metal requires emulating the -1/1 clip space by inserting
             | extra instructions into the vertex shader to transform the
             | Z coordinate. Although this emulation adds overhead, it
             | works for ANGLE's open source implementation of OpenGL ES
             | on Metal.
             | 
             | > Like ANGLE, Apple's OpenGL driver internally translates
             | to Metal. Because Metal uses the 0 to 1 clip space, it
             | should require this emulation code. Curiously, when we
             | disassemble shaders compiled with their OpenGL
             | implementation, we don't see any such emulation. That means
             | Apple's GPU must support -1/1 clip spaces in addition to
             | Metal's preferred 0/1. The problem is figuring out how to
             | use this other clip space.
        
         | Jasper_ wrote:
          | This is effectively what the vertex shader modification would
          | do -- the same trick that ANGLE does:
          | 
          |   gl_Position.z = (gl_Position.z + gl_Position.w) * 0.5;
         | 
         | This is the same as modifying a projection matrix -- you're
         | doing the same post-multiply to the same column. But note that
         | there's no guarantee there's ever a projection matrix. Clip
         | space coordinates could be generated directly in the vertex
         | shader.
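          | 
          | Where a projection matrix does exist (the fixed-function
          | case), the same remap can be folded into the matrix itself; a
          | sketch in C assuming row-major storage and column vectors
          | (clip = P * v), with an illustrative Mat4 type:
          | 
          |   typedef struct { float m[4][4]; } Mat4;
          | 
          |   /* z' = (z + w) / 2: replace P's z row with the average of
          |      its z and w rows. */
          |   static void remap_clip_z(Mat4 *p)
          |   {
          |       for (int c = 0; c < 4; c++)
          |           p->m[2][c] = 0.5f * (p->m[2][c] + p->m[3][c]);
          |   }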
        
       | skocznymroczny wrote:
       | I don't know why there's so much love for OpenGL in the
       | communities still. Maybe it's the "open" part in the name, which
       | was always confusing people, thinking it's an open source
       | standard or something like that.
       | 
       | The API is very antiquated, doesn't match modern GPU
       | architectures at all and requires many workarounds in the driver
       | to get the expected functionality, often coming at a performance
       | cost.
       | 
       | Vulkan is nice, but it goes into the other extreme. It's very low
       | level and designed for advanced users. Even getting anything on
       | the screen in Vulkan is intimidating because you have to write
       | everything from scratch. To go beyond hello world, you even have
       | to write your own memory allocator (or use an existing opensource
       | one) because you can only do a limited amount of memory
       | allocations and you're expected to allocate a huge block of
       | memory and suballocate it as needed by your application.
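        | 
        | The suballocation itself is not much code; a toy bump-allocator
        | sketch in C (the struct and names are mine; real allocators like
        | VulkanMemoryAllocator also handle freeing, memory types, and
        | buffer/image granularity):
        | 
        |   #include <stdint.h>
        | 
        |   typedef struct {
        |       uint64_t size;    /* size of one big device allocation */
        |       uint64_t cursor;  /* next free offset within it */
        |   } BumpAlloc;
        | 
        |   /* Returns an aligned offset to pass to vkBindBufferMemory, or
        |      UINT64_MAX when the block is exhausted. `align` must be a
        |      power of two (Vulkan alignments are). */
        |   static uint64_t bump_alloc(BumpAlloc *a, uint64_t size,
        |                              uint64_t align)
        |   {
        |       uint64_t off = (a->cursor + align - 1) & ~(align - 1);
        |       if (size > a->size || off > a->size - size)
        |           return UINT64_MAX;
        |       a->cursor = off + size;
        |       return off;
        |   }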
       | 
       | In comparison, DX12 is a bit easier to grasp. It has some nice
        | abstractions such as committed resources, which take some of the
       | pain away.
       | 
       | Personally I like Metal as an API. It is lower level than OpenGL,
       | getting rid of most nasty OpenGL things (state machine, lack of
       | pipeline state objects), yet it is very approachable and easy to
        | transition to from DX11/OpenGL. I was happy when I saw that
        | WebGPU was initially based on Metal. WebGPU is my go-to 3D API
        | at the moment,
       | especially with projects like wgpu-native which make it usable on
       | native platforms too (don't let the Web in WebGPU confuse you).
        
         | Teknoman117 wrote:
         | > Vulkan is nice, but it goes into the other extreme. It's very
         | low level and designed for advanced users. Even getting
         | anything on the screen in Vulkan is intimidating because you
         | have to write everything from scratch.
         | 
         | I honestly believe that this is the major reason. Developing a
         | hobby project with OpenGL is little more than using SDL or GLFW
         | to get a window with a GLContext and then you can just start
         | calling commands. Vulkan is much more complicated and unless
         | you're really pushing performance limits, you're not getting
         | much of a benefit for the extra headache.
        
           | gary_0 wrote:
           | OpenGL is what you use if you just want to render some
           | triangles on the GPU with a minimum of hassle on the most
           | platforms (which is quite a few if you include GLES, WebGL,
           | and ANGLE). Most people aren't writing graphics engines for
           | AAA games so OpenGL is all they need.
        
         | mort96 wrote:
         | You acknowledge that Vulkan is too low level for people who
         | aren't investing billions into an AAA graphics engine. And you
         | surely know that OpenGL and Vulkan are the only two cross-
         | platform graphics APIs. Are you sure you can't infer why people
         | like OpenGL from those two points? Especially in Linux-heavy
         | communities where DX and Metal aren't even options?
         | 
         | I assure you, none of the "love" for OpenGL comes from the
         | elegance of its design.
        
           | bitwize wrote:
           | There should be more effort to support Direct3D under Linux.
           | We have Wine and DXVK, but it should be easier to integrate
           | the D3D support into Linux applications.
        
       | skrrtww wrote:
        | Despite the progress here, for me it raises a question: most of
        | the old games she mentions are 32-bit x86 games. What's the story
        | for how these programs are actually going to run on Asahi? Box86
        | [1] doesn't sound like it's projected to run on M1. Rosetta 2 on
        | macOS allows 32-bit code to be run by a 64-bit process, which is
        | the workaround CrossOver et al. use (from what I understand),
        | but that obviously won't be available?
       | 
       | [1] https://box86.org
        
         | TazeTSchnitzel wrote:
         | QEMU has a "user mode" feature where it can transparently
          | emulate a Linux process and translate syscalls. You can
         | probably run at least old 32-bit Linux games that way, assuming
         | you have appropriate userland libraries available. Windows
         | content might be trickier.
        
         | rowanG077 wrote:
         | Rosetta 2 runs on Linux. There's also FEX.
        
           | amluto wrote:
           | Does it? Or does Rosetta 2 run on Mac OS with a Linux shim to
           | ask the host to kindly Rosetta-ify a given binary?
        
           | skrrtww wrote:
           | I guess that's true, I forgot about Apple making Rosetta 2
           | installable in Linux VMs.
           | 
           | Also though, since Rosetta 2 was released, it's had an
           | incredibly slow implementation of x87 FPU operations, and
           | anything that relies on x87 floating point math (including
           | lots of games) is currently running about 100x slower than it
           | ought to. Apple is aware of it but it's still not fixed in
           | Ventura.
           | 
           | I hadn't heard of FEX before, looks interesting.
        
             | mort96 wrote:
             | Huh, I thought everyone used SSE floats these days. I
             | suppose there may be old games compiled with x87 floats,
             | but I'd expect those to be made for CPUs so old that even
             | slow x87 emulation wouldn't be a big issue.
             | 
             | What software do people have x87-related issues with?
        
               | skrrtww wrote:
               | The software I personally have the most issues with is
               | Star Wars Episode 1: Racer, a 3d title from 1999 that
               | from what I understand uses x87 math extensively. In
               | Parallels (i.e. no Rosetta) it runs at 120fps easily,
               | while in CrossOver the frame rate barely ekes above 20.
               | Old titles like Half-Life, all other Source games,
               | Fallout 3, SWTOR etc. all run vastly worse than they
               | should, and many cannot run at playable framerates
               | through Rosetta. Honestly, the problem most likely
               | extends to more of Rosetta's floating point math than
               | just x87.
               | 
               | The author of REAPER has also written about it some:
               | https://user.cockos.com/~deadbeef/index.php?article=842
               | 
               | There's been lots of discussion about the issue in the
               | Codeweavers forums, and Codeweavers points the blame
               | squarely at Apple, who have been, predictably, very quiet
               | about it.
        
           | 58028641 wrote:
           | Does Rosetta on Linux support 32 bit code? I believe FEX
           | does.
        
             | saagarjha wrote:
             | Rosetta supports emulating 32-bit code.
        
               | 58028641 wrote:
               | On Linux? I know it has been confirmed on macOS. I
               | haven't heard anyone say they ran 32 bit code on Linux.
        
         | mort96 wrote:
         | Someone would need to make an x86 -> ARM recompiler like
         | Rosetta 2. That's not an easy task, but also not the task she's
         | tackling with the GPU driver.
         | 
         | It's not unprecedented in the open-source space though; the
         | PCSX2 PlayStation 2 emulator for example contains a MIPS -> x86
         | recompiler, and the RPCS3 PlayStation 3 emulator contains a
         | Cell -> x86 recompiler.
        
       | viktorcode wrote:
       | Can someone explain to me why support OpenGL at all? Vulkan is
       | easier to implement. Is there a need for OpenGL on Linux?
        
         | dagmx wrote:
         | Because Vulkan, despite the mystical reputation it has in
          | gaming circles, actually has fairly low adoption vs OpenGL.
         | 
         | Very few applications in the grand scheme of things use Vulkan,
         | and a minority of games do.
         | 
         | Therefore the ROI on supporting OpenGL is very high.
        
           | 58028641 wrote:
           | Doesn't implementing Vulkan give you DirectX with DXVK and
           | VKD3D and OpenGL with Zink for free?
        
             | Cu3PO42 wrote:
             | Only if you support all of the necessary Vulkan features
             | and extensions. The article states that getting to that
             | point would be a multi-year full time effort, whereas
             | "only" OpenGL seems to be within grasp for this year. And
             | arguably having a lower OpenGL standard soon is better than
             | OpenGL 4.6 in a few years.
        
             | erichocean wrote:
             | Yes, with appropriate (and reasonably-available) Vulkan
             | extensions.
        
         | phire wrote:
         | Keep in mind that Mesa actually implements most of OpenGL for
          | you. It's not like you are implementing a whole OpenGL driver
          | from scratch; you are mostly implementing a hardware
         | abstraction layer.
         | 
         | My understanding is that this hardware abstraction layer for
          | Mesa is way easier to implement than a full Vulkan driver,
          | especially since the earlier versions of OpenGL only require a
          | small subset of the features that a Vulkan driver requires.
        
         | Jasper_ wrote:
          | Because of how Mesa is structured. OpenGL is notoriously
          | terrible to implement, so there's a whole framework called
          | Gallium that does the hard work for you, and you slot yourself
          | into that. Meanwhile, Vulkan is easier to implement from
          | scratch, so there's a lot less infrastructure for it in Mesa,
         | and you have to implement more of the boring paperwork
         | correctly.
         | 
         | It's an accident of history more than anything else. Once the
         | reverse engineering is further along, I expect a Vulkan driver
         | to be written for it, and the Gallium one to be phased out in
         | favor of Zink.
        
         | gjsman-1000 wrote:
         | On a reverse-engineered GPU like this, because of Vulkan's low-
         | level design, implementing (early) OpenGL might actually be
         | significantly easier.
         | 
         | Also, Vulkan isn't popular with game developers because
         | availability sucks. Vulkan doesn't run on macOS. Or iOS. Or 40%
         | of Android phones. Or Xbox. Or PlayStation. Or Nintendo
         | Switch[1].
         | 
         | Unless you are targeting Windows (which has DirectX and OpenGL
         | already), or those 60% of Android phones only, or Linux, why
         | would you use Vulkan? On Windows, DirectX is a generally-
         | superior alternative, and you get Xbox support basically free,
         | and if you also support an older DirectX, much broader PC
         | compatibility. On Android, just use OpenGL, and don't worry
         | about separate implementations for the bifurcated Vulkan/OpenGL
         | support. On Linux, just use Proton with an older DirectX. Whiz
         | bang, no need for Vulkan whatsoever. Yes, some systems might
         | perform better if you had a Vulkan over OpenGL, but is the cost
         | worth it when you don't need it?
         | 
         | [1] Technically, Vulkan does exist for Nintendo Switch, but it
         | is so slow almost no production game uses it, and it is widely
         | considered not an option. Nintendo Switch is slow enough
         | without Vulkan making it slower. Much easier just to use the
         | proprietary NVIDIA library.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-08-22 23:00 UTC)