[HN Gopher] Clip control on the Apple GPU ___________________________________________________________________ Clip control on the Apple GPU Author : stefan_ Score : 207 points Date : 2022-08-22 14:04 UTC (8 hours ago) (HTM) web link (rosenzweig.io) (TXT) w3m dump (rosenzweig.io) | bla3 wrote: | > Here's a little secret: there are two graphics APIs called | "Metal". There's the Metal you know, a limited API that Apple | documents for App Store developers, an API that lacks useful | features supported by OpenGL and Vulkan. | | > And there's the Metal that Apple uses themselves, an internal | API adding back features that Apple doesn't want you using. | | Apple does stuff like this so much and gets so little flak for | it. | | I use macOS since it seems like the least bad option if you want | a Unix but also don't want to spend a lot of time on system | management, but this is a real turn-off. | bri3d wrote: | > Apple does stuff like this so much and gets so little flak | for it. | | Why should they get flak for having internal APIs? The fact | that the internal API is a superset of the external API is | smart engineering. | | Think about it this way: Apple could just as well have made the | "Metal that Apple uses themselves" some arcane "foocode" IR | language or something, as I'm sure many shader compilers and | OpenGL runtime implementations do, and nobody would be nearly | as mad about it. | | The fact that they use internal APIs for external apps in their | weird iOS walled garden is obnoxious, but having private, | undocumented APIs in a closed-source driver is not exactly an | Apple anomaly. | LeifCarrotson wrote: | > Why should they get flak for having internal APIs? The fact | that the internal API is a superset of the external API is | smart engineering. | | It's not about having good segmentation of user-facing and | kernel-side libraries; no one faults them for that. | | It's about Apple building user-facing apps that use the whole | API, and then demanding that other developers not use the | features required to implement those apps because we're not | trusted to maintain the look-and-feel, responsiveness, or | battery life expectations of apps on the platform. | dcx wrote: | But isn't it kind of fair to say that when you look at the | case studies presented by (a) the Android app store in the | past decade and (b) Windows malware in the decade before | that, this trust has in fact not been earned? | | I hate a walled garden as much as the next developer, and | the median HN reader is probably more than trustworthy. But | past performance does predict future performance. | fezfight wrote: | If you buy a hackintosh, you have to sometimes mess around to | get stuff to work. Same goes for Linux on random hardware. If | you check first and buy a machine that supports the OS you're | using, you don't have to do anything special. It'll work as you | expect. | | It's freeing not to be beholden to the likes of someone like | Tim Cook who, it would seem, spends the majority of his waking | hours figuring out how to hide anticonsumer decisions under | rugs. | gjsman-1000 wrote: | > Apple does stuff like this so much and gets so little flak | for it. | | To be fair, Windows has a _ludicrous_ amount of undocumented | APIs for internal affairs as well, and you can get deep into | the weeds very quickly; just ask the WINE Developers who have | to reverse-engineer the havoc.
There is no OS without Private | APIs, but Windows is arguably the worst with more Private or | Undocumented APIs than Apple. | | This actually bears parallels to Metal. Until DirectX 12, | Windows had no official way to get low-level. Vulkan and OpenGL | are only 3rd-party supported, not Microsoft-supported; | Microsoft officially only supports DirectX. If you want | Vulkan/OpenGL, that's on your GPU vendor. If you wanted low- | level until 12, you _may_ have found yourself pulling some | undocumented shenanigans. Apple hasn't gotten to their DirectX | 12 yet, but they'll get there eventually. | | As for why they are Private, there could be many reasons, not | least of which is that (in this case) Apple has a very complicated | Display Controller design and is frequently changing those | internal methods, which would break compatibility if third- | party applications used them. Just ask Asahi about how the DCP | changed considerably from 11.x to 13.x. | chongli wrote: | _Apple has a very complicated Display Controller design_ | | Can anyone in the know give more information here? Why would | Apple want to do this? What could they be doing that's so | complicated in the display controller? | gjsman-1000 wrote: | https://twitter.com/marcan42/status/1549672494210113536 | | and | | https://twitter.com/marcan42/status/1415360411260493826?lan | g... | | and | | https://twitter.com/marcan42/status/1526104383519350785 | | As to why? Well, "if it ain't broke, don't fix it" from the | iPhone days, but it is still a bit of a mystery. | | In a nutshell from those threads: | | 1. Apple's DCP silicon layout is actually massive, | explaining the 1 external display limit | | 2. Apple implements half the DCP firmware on the main CPU | and the other half on the coprocessor with RPC calls, which | is hilariously complicated. | | 3. Apple's DCP firmware is versioned, with a different | version for every macOS release. This is also why Asahi | Linux currently uses a "macOS 12.3" shim, so they can focus | on the macOS 12.3 DCP firmware in the driver, which will | probably not work with the macOS 12.4+ DCP firmware or the | macOS 12.2- firmware. | | I can totally see why Apple doesn't want people using their | low-level Metal implementation that deals with the mess | yet. | chongli wrote: | Yeah, it makes perfect sense that they don't want to | expose any of that complexity to 3rd parties and risk | constant breakage with new models. I'm just really | curious about what sort of complex logic they have going | on in that silicon. | phire wrote: | The complexity with the firmware split across the main | CPU and a coprocessor seems to be a historical artefact. | | Seems the DCP driver was originally all on the main CPU, | and when apple got these cheap coprocessor cores, they | took a lazy approach of just inserting a simple RPC layer | in the middle. The complexity for Asahi comes from the | fact that it's a C++ API that can change very dynamically | from version to version. | | And yes, these ARM coprocessor cores are cheap; apple | have put at least 16 of them [1] on the M1, on top of the 4 | performance and 4 efficiency cores. They are an apple | custom design that implements only the 64-bit parts of the | ARMv8 spec. I'm not entirely sure why the actual DCP is | so big, but it's not because of the complex firmware. | Potentially because the DCP includes enough dedicated RAM | to store an entire framebuffer on-chip. | | If so, they will be doing this because it allows for | lower power consumption.
The main DRAM could be put in a | power-saving mode and kept there for seconds or even | minutes at a time without having to wake it up multiple | times per frame, even when just showing a static image. | | [1] | https://twitter.com/marcan42/status/1557242428876537856 | throwaway08642 wrote: | @marcan42 said that on the M1 MacBook Pro models, the DCP | also implements hardware-level antialiasing for the notch | and rounded display corners. | Pulcinella wrote: | _Apple does stuff like this so much and gets so little flak for | it._ | | It would be one thing if the private APIs were limited to | system frameworks and features while Apple's own apps weren't | allowed to use them, but they do use them. E.g. The Swift Playgrounds | app for iPad is allowed to share and compile code, run separate | processes, etc., which isn't normally allowed in the AppStore. | They also use blur and other graphical effects (outside of the | background blur material and the SwiftUI blur modifier) that | are unavailable outside of private APIs. | | It stinks because of the perceived hypocrisy and the inability | to compete on a level playing field or leave the AppStore (and | I say this as someone who normally doesn't mind the walled | garden!) | adrian_b wrote: | Unfortunately such behavior is not at all new. | | The best known example of these methods is how Microsoft has | exploited the replacement of MS-DOS with Windows 3.0 and | especially with Windows 95. | | During the MS-DOS years, the only Microsoft software products | that were successful were their software development tools, | i.e. compilers and interpreters, and even those had strong | competition, mainly from Borland. Those MS products addressed | only a small market and they could not provide large | revenues. The most successful software products for MS-DOS | were from many other companies. | | That changed abruptly with the transition to various Windows | versions, when the Microsoft developers started to have a | huge advantage over those from any other company, both by | being able to use undocumented internal APIs provided by the | MS operating systems and also by knowing in advance the | future documented APIs, before they were revealed to | competitors. | | Thus in a few years MS Office transitioned from an | irrelevant product, much inferior to the competition, to the | dominant suite of office programs, which eliminated all | competitors and became the main source of revenue | for MS. | [deleted] | [deleted] | Jasper_ wrote: | As a graphics engineer, good riddance to the old clip space; | 0...1 really is the correct option. We also don't know what | else "OpenGL mode" enables, and the details of what it does | probably change between GPU revisions -- the emulation stack | probably has the details, and changes its own behavior of | what's in hardware and what's emulated in the OpenGL stack | depending on the GPU revision. | | Also, to Alyssa, if she's reading this: you're just going to | have to implement support for shader variants. Build your | infrastructure for supporting them now. It's going to be far | more helpful than just for clip control. | | But yes, the Vulkan extension was just poorly specified; | allowing you to change clip spaces between draws in the same | render pass is, again, ludicrous, and the extension should just | be renamed VK_EXT_i_hate_tilers (like so many others of their | kind).
Every app is going to set it at app init and forget it; | the implementation using the render pass bit and flushing on | change will cover 100% of cases, and won't be slow at all. | garaetjjte wrote: | >good riddance to the old clip space; 0...1 really is the | correct option | | More like 1...0, which nicely improves depth precision. | Annoyingly, due to the symmetric -1...1 range, reverse-Z cannot be | used in OpenGL out of the box, but it can be fixed with | ARB_clip_control. https://developer.nvidia.com/content/depth- | precision-visuali... | bpye wrote: | > you're just going to have to implement support for shader | variants | | I admittedly have zero experience with Mesa, but it seems | like shader variants are something that should be common | infrastructure? Though of course the reason that a variant is | needed would be architecture specific. | chaxor wrote: | This is why the Asahi Linux project is so exciting!! You get | great performance at low power (M* ARM processors) while | still getting the more performant and useful Linux experience. | | I am really thankful to the Asahi Linux team, and specifically | in this instance for the GPU, [Alyssa | Rosenzweig](https://github.com/alyssarosenzweig), [Asahi | Lina](https://github.com/asahilina), and [Dougall | Johnson](https://github.com/dougallj). | rowanG077 wrote: | Amazing that this works because of the herculean effort of just a | handful of people. | hrydgard wrote: | There is no good reason to flip the flag dynamically at runtime | and apps just don't do that, so flushing the pipeline should be | perfectly fine, even in an implementation of the clip control | extension. | gjsman-1000 wrote: | Optimistic that OpenGL 2.1 will be available by the end of the | year on Asahi - well that is news. It's only 2.1, but that's | enough (as stated) for a web browser, desktop acceleration, and | old games. | | Also RIP all the countless pessimistic "engineers" here and | elsewhere saying we'd be waiting for years more for _any_ | graphics acceleration. | | Edit: It is true though that AAA Gaming will wait: "Please temper | your expectations: even with hardware documentation, an optimized | Vulkan driver stack (with enough features to layer OpenGL 4.6 | with Zink) requires many years of full time work. At least | for now, nobody is working on this driver full time. Reverse- | engineering slows the process considerably. We won't be playing | AAA games any time soon." | | Still, even if that is the case, an accelerated desktop is an | accelerated desktop, much sooner than many expected. | smoldesu wrote: | It's pretty insane that OpenGL 2.1 is even functional on a GPU | this strange, but remember: this is still an unfinished, hacky | implementation (the author's own concession). Plus, you're | going to be stuck on X11 until any serious GPU drivers get | written, which in many people's opinion is just as bad as no | hardware acceleration at all. No macOS-like trackpad gestures | either, you'll be waiting for Wayland support to get that too. | It'll definitely be a boon for web browsing though, so I won't | deny that. What I'm _really_ curious about is older WINE titles | with Box86; if you could get DOS titles like Diablo 2 running | smoothly, it could probably replace my Switch as a portable | emulation machine... | gjsman-1000 wrote: | > pretty insane that OpenGL 2.1 is even functional on a GPU | this strange, | | Well... you were one of the most vocal critics saying it | wouldn't happen anytime soon.
| | > unfinished, hacky | implementation (the author's own | concession) | | Still more stable than Intel's official Arc drivers, so who | defines "hacky"? ;) | | > Plus, you're going to be stuck on X11 until any serious GPU | drivers get written | | Only because it is running on macOS, which supports X11 but | not Wayland. On Linux, Wayland or X11 will both work, no | problem. | | > No macOS-like trackpad gestures either, you'll be waiting | for Wayland support to get that too | | Again, Wayland will work on Day 1, it's just a limitation of | running the driver on macOS until the kernel support is | ready. When it is on Linux, Wayland will be a full-go. | smoldesu wrote: | > Well... you were one of the most vocal critics saying it | wouldn't happen anytime soon. | | Yep. Been beating that drum since 2020, looks like history | proved me right on this one. | | > Still more stable than Intel's official Arc drivers, so | who defines "hacky"? ;) | | Apparently not me; I had no idea that the M1 supported | Vulkan and DirectX 12. | viraptor wrote: | > if you could get DOS titles like Diablo 2 | | Did you mean some other title? Even Diablo 1 was a Windows | game. | [deleted] | kirbyfan64sos wrote: | I'm a bit confused as to why X11/Wayland would be a huge | issue here? The Mesa docs do say X11-only, but they're | referring to running the driver on macOS (hence the XQuartz | reference), where Wayland basically doesn't exist. | smoldesu wrote: | Ah, looks like I definitely missed that. | | In any case, I don't think Asahi/M1 has proper KWin or | Mutter support yet. It's still going to take a while before | you get a truly smooth desktop Linux experience on those | devices, but some hardware acceleration is definitely | better than none! | rowanG077 wrote: | I mean the signs were clear for basically one and a half | years now. It was never a question of if. But a question of | when. There were just so many voices that didn't know what | they were talking about. Comparing it to nouveau for example. | Miraste wrote: | Why would you need Wayland for trackpad gestures? | 3836293648 wrote: | You technically don't, but the implementations on X are kinda | terrible and can't do 1:1 | pornel wrote: | Apple could help by documenting this stuff. I remember the good | old days when every Mac OS X came with an extra CD with Xcode, | and Apple was regularly publishing Technical Notes detailing | implementation details. Today the same level of detail is treated | as top secret, and it seems that Apple doesn't want developers to | even think beyond the surface of the tiny App Store sandbox. | dagmx wrote: | Even back in the day, those technical notes would not cover | private APIs like this, because they're subject to change or | are for internal use only. | | These are the same in any closed-source OS. | madeofpalk wrote: | Apple Platform Security: May 2022. 242 pages. | | https://help.apple.com/pdf/security/en_GB/apple-platform-sec... | alberth wrote: | I really wish Apple would do another "Snow Leopard" - go an | entire year WITHOUT any new features and just fix bugs and | documentation. | | This Twitter thread is a perfect example of why it's needed: | | https://twitter.com/nikitonsky/status/1557357661171204098 | argsnd wrote: | I mean that thread is looking at pre-release software | ianlevesque wrote: | There is software that approaches barely functional after | dozens of rounds of QA testing, and then there is software | that is implemented on a solid foundation with care and | happens to have a few bugs.
Unfortunately that many bugs in | a beta imply the former. I think the thread comes from a | disappointment that Apple is moving from the second | category to the first. | buildbot wrote: | But it is not even a "Consumer Beta", it is a developer | beta - for catching bugs and allowing devs to create | applications for new APIs while Apple polishes the build | for release? Was Snow Leopard ever released as a dev beta, | even? | DerekL wrote: | Snow Leopard had few user-facing features, but it did have | new APIs, such as Grand Central Dispatch and OpenCL, and also | an optional 64-bit kernel. | | https://en.wikipedia.org/wiki/Mac_OS_X_Snow_Leopard | sudosysgen wrote: | OpenCL is not an OS level API. I guarantee you they were | basically just redistributing Intel and NVidia | implementations. GCD isn't OS level either, it's just a | library, but it is at least a new API. | madeofpalk wrote: | Ahh yes, that "just fix bugs" release that would delete your | main user account if you used a guest user: | https://www.engadget.com/2009-10-12-snow-leopard-guest- | accou... | | Besides, rebuilding/redesigning the settings screen is a | perfect "snow leopard" thing. That's not an actual Feature. | | The problem isn't doing features, the problem is doing a bad | job. | naillo wrote: | This person has an awesome set of blog posts. One of the few RSS | feeds I keep track of. | [deleted] | bob1029 wrote: | Clip space is the bane of my existence. I've been building a | software rasterizer from scratch and implementing vertex/triangle | clipping has turned into one of the hardest aspects. It took me | about 50 hours of reading various references before I learned you | cannot get away with doing this in screen space or any time after | perspective divide. | | It still staggers me that there is not 1 coherent reference for | how to do all of this. Virtually every reference about clipping | winds up with something like "and then the GPU waves its magic | wand and everything is properly clipped & interpolated :D". Every | paper I read has some "your real answer is in another paper" meme | going on. I've got printouts of Blinn & Newell, Sutherland & | Hodgman, et al. littered all over my house right now. About 4 | decades worth of materials. | | Anyone who works on the internals of OGL or the GPU stack itself | has the utmost respect from me. I cannot imagine working in that | space full-time. About 3 hours of this per weekend is about all | my brain can handle. | joakleaf wrote: | Not sure if you got through clipping, but it was one of those | things I had to go through first at some point in the mid 90s | myself. I feel your pain, but after having implemented it about | 5-10 times in various situations, variants and languages, I can | promise it gets a lot easier. | | In my experience it is most elegant to clip against the 6 | planes of the view frustum in succession (one plane at a | time). Preferably clipping against the near-plane first, as | that reduces the set of triangles the most for subsequent | clips. | | Your triangles can turn into convex polygons after a clip. So | it is convenient to start with a generic convex polygon vs. | plane-clipping algorithm; the thing to be careful about here is | that points can (and will) lie on the plane. | | Use the plane equation (f(x,y,z)=ax+by+cz+d) to determine if a | point is on one side, on the plane, or the other side. | | It is convenient to use a "mask" to designate the side a point | v=(x,y,z) is on.
So: 1 := inside_plane (f(x,y,z) > eps), 2 := | outside_plane (f(x,y,z) < -eps), 3 := on_plane (-eps <= | f(x,y,z) <= eps). Let m(v_i) be the mask of v_i. | | When you go through each edge of the convex polygon | (v_i->v_{i+1}), you can check if you should clip the edge using | the mask. I.e.: | | if (m(v_i) & m(v_{i+1})) == 0, the points are on opposite sides | => clip [determine the intersection point]. | | Since you are just clipping to the frustum, just return a list | of the points that are inside or on the plane (i.e. | m(v_i)&1==1) and the added intersection points. | | There is lots of potential for optimization, of course, but I | wouldn't worry about that. There are lots of other places to | optimize a software rasterizer with more potential, in my | experience. (A minimal C sketch of this mask-based clip appears | below, after this subthread.) | fabiensanglard wrote: | I wrote about this a few years ago | (https://fabiensanglard.net/polygon_codec/index.php). | | It was a pain to learn indeed, and the best resources were quite | old: | | - "CLIPPING USING HOMOGENEOUS COORDINATES" by James F. Blinn | and Martin E. Newell | | - A Trip Down the Graphics Pipeline by Jim Blinn (yes the same | Blinn that co-authored the paper above). | Jasper_ wrote: | Vertex/triangle clipping is quite rare, and mostly used for | clipping against the near plane (hopefully rare in practice). | Most other implementations use a guard band as a fast path (aka | doing it in screen space) -- real clipping is only used where | your guard band doesn't cover you, precision issues mostly. | | I'm not sure what issues you're hitting, but I've never found | clipping to be that challenging or difficult. Also, clip | control and clip space aren't really specifically about | clipping -- clip space is just the output space of your vertex | shader, and the standard "clip control" extension just controls | whether the near plane is at 0 or -1. And 0 is the correct | option. | Sharlin wrote: | Guard band clipping is only really applicable to "edge | function" type rasterizers. For the classic scanline-based | algorithm, sure, you can easily clip to the right and bottom | edges of the viewport while rasterizing, but the top and left | edges are trickier. Clipping in clip space, before | rasterization, is more straightforward, given that you have | to frustum cull primitives anyway. | bob1029 wrote: | > clipping against the near plane (hopefully rare in | practice) | | I am not sure I understand why this would be rare. If I am | intending to construct a rasterizer for a first-person | shooter, clipping is essentially mandatory for all but the | most trivial of camera arrangements. | Jasper_ wrote: | Yes, of course, I was definitely imagining you were | struggling to get simpler scenes to work. But also, | proportionally few of your triangles in any given scene | should be near-plane clipped. It's OK to have a slow path | for it, and then speed it up later. I've never felt the | math for the adjusted barycentrics is too hard, but it can | take a bit to wrap your head around. Good luck :) | Sharlin wrote: | You and the GP have different rasterization algorithms in | mind I think. The GP, I presume, is talking about a | classic scanline-based rasterizer rather than an edge | function "am I inside or not" type rasterizer that GPUs | use. | bpye wrote: | Clipping or culling? I expect it's mostly the latter unless | your camera ends up intersecting the geometry.
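To make the mask trick described above concrete, here is a minimal C
sketch of one convex-polygon-vs-plane clip in homogeneous clip space.
The vec4 type, the clip_poly/side_mask names, and the eps handling are
illustrative assumptions, not code from the thread or the article:

    #include <stddef.h>

    typedef struct { float x, y, z, w; } vec4;

    /* Side masks per the scheme above; ON has both bits set, so an
       on-plane point counts as inside AND outside. */
    enum { INSIDE = 1, OUTSIDE = 2, ON = 3 };

    static float plane_eval(const float p[4], vec4 v)
    {
        /* A plane through the origin of homogeneous clip space.
           E.g. the near plane of a 0..1 clip space (z >= 0) is
           p = {0,0,1,0}; for a -1..1 clip space (z >= -w) it is
           p = {0,0,1,1}. */
        return p[0]*v.x + p[1]*v.y + p[2]*v.z + p[3]*v.w;
    }

    static int side_mask(const float p[4], vec4 v, float eps)
    {
        float f = plane_eval(p, v);
        if (f >  eps) return INSIDE;
        if (f < -eps) return OUTSIDE;
        return ON;
    }

    /* Clips the convex polygon in[0..n-1] against plane p, writing
       at most n+1 vertices to out[] and returning the new count. */
    static size_t clip_poly(const vec4 *in, size_t n,
                            const float p[4], float eps, vec4 *out)
    {
        size_t m = 0;
        for (size_t i = 0; i < n; i++) {
            vec4 a = in[i], b = in[(i + 1) % n];
            int ma = side_mask(p, a, eps);
            int mb = side_mask(p, b, eps);

            if (ma & INSIDE)        /* keep points inside or on the plane */
                out[m++] = a;

            if ((ma & mb) == 0) {   /* strictly opposite sides: clip edge */
                float fa = plane_eval(p, a), fb = plane_eval(p, b);
                float t = fa / (fa - fb);        /* f(a + t*(b-a)) == 0 */
                vec4 c = { a.x + t*(b.x - a.x), a.y + t*(b.y - a.y),
                           a.z + t*(b.z - a.z), a.w + t*(b.w - a.w) };
                out[m++] = c;
            }
        }
        return m;
    }

Running this once per frustum plane, near plane first as suggested
above, gives a full Sutherland-Hodgman clip. Because the plane test
uses all four components (xyzw), the clip happens before the
perspective divide, which is the point several replies below also
make.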
| royjacobs wrote: | As an example, if you're writing a shooter then the floor | might be a large square that will almost definitely be | intersecting the near plane. You absolutely need clipping | here. | bob1029 wrote: | This is precisely the first place I realized I needed | proper clipping. Wasted many hours trying to hack my way | out of doing it the right way. | bob1029 wrote: | Both. You almost always need both. | | Clipping deals with geometry that is partially inside the | camera. Culling (either for backfaces or entire | instances) is a preliminary performance optimization that | can be performed in a variety of ways. | bpye wrote: | Even with a guard band don't you need to at least test the | polygons for Z clipping prior to the perspective divide? | | Clipping in X and Y is simpler at least, and again the guard | band hopefully mostly covers you. | bob1029 wrote: | > Even with a guard band don't you need to at least test | the polygons for Z clipping prior to the perspective | divide? | | Yes. Guard band is an optimization that reduces the amount | of potential clipping required. You still need to be able | to clip for fundamental correctness. | | If you totally reject a vertex for a triangle without | determining precisely where it intersects the desired | planes, you are effectively rejecting the entire triangle | and creating yucky visual artifacts. | nauful wrote: | You have to clip against planes in 4D space (xyzw) before | perspective divide (xyz /= w), not 3D (xyz). | | This simplified sample shows Sutherland-Hodgman with 4D | clipping: | https://web.archive.org/web/20040713023730/http://wwwx.cs.un... | The main difference is the intersect method finds the | intersection of a 4D line segment against a 4D plane. | Sharlin wrote: | I also implemented clipping in my software rasterizer a while | ago and can definitely sympathize! (Although I've written | several simple scanline rasterizers in my life, this was the | first time I actually bothered to implement proper clipping. I | actually reinvented Sutherland-Hodgman from scratch, which was | pretty fun.) The problematic part is actually only the near | plane due to how projective geometry works. At z=0 there's a | discontinuity in real coordinates after z division, which means | there can be no edges that cross from negative to positive z. Z | division turns such an edge [a0, a1] into an "inverse" edge | (-∞, a0'] ∪ [a1', ∞), which naturally makes rendering a | bit tricky. In projective/homogeneous coordinates, however, it | is fine, because the space "wraps around" from positive to | negative infinity. All the other planes you can clip against in | screen space / NDC space if you wish, but I'm not sure there | are good reasons to split the job like that. | samstave wrote: | Be the documentation you want to see in the world. | hashishen wrote: | "RTFM" - The manual | samstave wrote: | "Look up error code on stack exchange to find the error in | question seeking a solution, but it's you from 5 years ago." | sph wrote: | "What was I working on? What did I see?!" | | https://xkcd.com/979/ | moondev wrote: | Can you run a PCIe enclosure over Thunderbolt on Asahi Linux yet? | Could this enable GPUs that already work on aarch64 Linux? | hishnash wrote: | I would assume all Linux GPU drivers would need to be adapted | at least a little to support the larger page size (most Linux | AArch64 kernel-level code is written assuming 4kB pages). | kmeisthax wrote: | Yes, but NOT for GPUs.
Apple Silicon does not support non- | Device mappings over Thunderbolt, so eGPUs will never work. | andrewmcwatters wrote: | OpenGL on macOS is so frustrating that I and many other | developers have basically abandoned it, and not in favor of using | Metal--the easier alternative is to just no longer support Macs. | | Yes, OpenGL on macOS is now implemented over Metal, but | unfortunately a side effect of this is that implementation-level | details that were critical to debugging and profiling OpenGL just | no longer exist for tools to work with. Anything is possible? | Maybe? I'm sure Apple Graphics engineers could make old tooling | work with the new abstraction layer, but it's not happening. | | Tooling investment is all on Metal now. But so much existing NON- | LEGACY software relied on OpenGL. | | So what do you do? You debug and perf test on Windows and Linux | and hope that fixing issues there addresses concerns on macOS, | and hopefully your problems aren't platform-specific. | | This is how some graphics engineers, including myself, continue | to ship for macOS while never touching it. | | Edit: Also, Vulkan is a waste of time for anyone who isn't a | large studio. No one wants to write this stuff. The most common | argument is "You only write it once." No, you don't. | | You have to support this stuff. If it were that easy, bgfx would | have been written in a month and it would have been considered | "done" afterwards. | [deleted] | fbanon wrote: | Couldn't you just pre-multiply the projection matrix to remap the | Z range from [-1,1] to [0,1]? | NobodyNada wrote: | What projection matrix? | | Remember that this translation needs to happen at the graphics | driver level. For fixed-function OpenGL, where the application | actually passes the graphics driver a projection matrix, this | would be doable. But if your application is using a version of | OpenGL newer than 2004, the projection matrix is a part of your | vertex shader. The graphics driver can't tell what part of your | shader deals with projection, and definitely can't tell what | uniforms it would need to tweak to modify the projection matrix | -- many shaders might not even _have_ a projection matrix. | fbanon wrote: | I know. But the second sentence of the article starts with: | | "Neverball uses legacy "fixed function" OpenGL." | | But also you could simply remap the Z coordinate of | gl_Position at the end of the vertex stage, do the clipping | in [0,1] range, then map it back to [-1,1] for gl_FragCoord | at the start of the fragment stage. | NobodyNada wrote: | > "Neverball uses legacy "fixed function" OpenGL." | | Sure, it'd work for Neverball, but the article is clear | that they're looking for a general solution: something | that'd work not just for Neverball, but for all OpenGL | applications, and would ideally let them give applications | control over the clip-control bit through OpenGL/Vulkan | extensions. | | > But also you could simply remap the Z coordinate of | gl_Position at the end of the vertex stage, do the clipping | in [0,1] range, then map it back to [-1,1] for gl_FragCoord | at the start of the fragment stage. | | Yes, that was the state of the art before this | article was written: | | > As Metal uses the 0/1 clip space, implementing OpenGL on | Metal requires emulating the -1/1 clip space by inserting | extra instructions into the vertex shader to transform the | Z coordinate. Although this emulation adds overhead, it | works for ANGLE's open source implementation of OpenGL ES | on Metal.
| | > Like ANGLE, Apple's OpenGL driver internally translates | to Metal. Because Metal uses the 0 to 1 clip space, it | should require this emulation code. Curiously, when we | disassemble shaders compiled with their OpenGL | implementation, we don't see any such emulation. That means | Apple's GPU must support -1/1 clip spaces in addition to | Metal's preferred 0/1. The problem is figuring out how to | use this other clip space. | Jasper_ wrote: | This is effectively what the vertex shader modification would | do -- the same trick that ANGLE does: gl_Position.z = | (gl_Position.z + gl_Position.w) * 0.5; | | This is the same as modifying a projection matrix -- you're | doing the same post-multiply to the same column. But note that | there's no guarantee there's ever a projection matrix. Clip | space coordinates could be generated directly in the vertex | shader. | skocznymroczny wrote: | I don't know why there's so much love for OpenGL in the | communities still. Maybe it's the "open" part in the name, which | always confused people into thinking it's an open source | standard or something like that. | | The API is very antiquated, doesn't match modern GPU | architectures at all and requires many workarounds in the driver | to get the expected functionality, often coming at a performance | cost. | | Vulkan is nice, but it goes to the other extreme. It's very low | level and designed for advanced users. Even getting anything on | the screen in Vulkan is intimidating because you have to write | everything from scratch. To go beyond hello world, you even have | to write your own memory allocator (or use an existing open-source | one) because you can only do a limited number of memory | allocations and you're expected to allocate a huge block of | memory and suballocate it as needed by your application. | | In comparison, DX12 is a bit easier to grasp. It has some nice | abstractions such as committed resources, which take some of the | pain away. | | Personally I like Metal as an API. It is lower level than OpenGL, | getting rid of most nasty OpenGL things (state machine, lack of | pipeline state objects), yet it is very approachable and easy to | transition to from DX11/OpenGL. I was happy when I saw WebGPU was | based on Metal at first. WebGPU is my go-to 3D API at the moment, | especially with projects like wgpu-native which make it usable on | native platforms too (don't let the Web in WebGPU confuse you). | Teknoman117 wrote: | > Vulkan is nice, but it goes to the other extreme. It's very | low level and designed for advanced users. Even getting | anything on the screen in Vulkan is intimidating because you | have to write everything from scratch. | | I honestly believe that this is the major reason. Developing a | hobby project with OpenGL is little more than using SDL or GLFW | to get a window with a GLContext and then you can just start | calling commands. Vulkan is much more complicated and unless | you're really pushing performance limits, you're not getting | much of a benefit for the extra headache. | gary_0 wrote: | OpenGL is what you use if you just want to render some | triangles on the GPU with a minimum of hassle on the most | platforms (which is quite a few if you include GLES, WebGL, | and ANGLE). Most people aren't writing graphics engines for | AAA games so OpenGL is all they need. | mort96 wrote: | You acknowledge that Vulkan is too low level for people who | aren't investing billions into an AAA graphics engine.
And you | surely know that OpenGL and Vulkan are the only two cross- | platform graphics APIs. Are you sure you can't infer why people | like OpenGL from those two points? Especially in Linux-heavy | communities where DX and Metal aren't even options? | | I assure you, none of the "love" for OpenGL comes from the | elegance of its design. | bitwize wrote: | There should be more effort to support Direct3D under Linux. | We have Wine and DXVK, but it should be easier to integrate | the D3D support into Linux applications. | skrrtww wrote: | Despite the progress here, for me it raises a question: Most of | the old games she mentions are x86 32-bit games. What's the story | for how these programs are actually going to run in Asahi? Box86 | [1] doesn't sound like it's projected to run on M1. Rosetta 2 on | macOS allows 32-bit code to be run by a 64-bit process, which is | the workaround CrossOver et al. use (from what I understand), | but that obviously won't be available? | | [1] https://box86.org | TazeTSchnitzel wrote: | QEMU has a "user mode" feature where it can transparently | emulate a Linux process and translate syscalls. You can | probably run at least old 32-bit Linux games that way, assuming | you have appropriate userland libraries available. Windows | content might be trickier. | rowanG077 wrote: | Rosetta 2 runs on Linux. There's also FEX. | amluto wrote: | Does it? Or does Rosetta 2 run on Mac OS with a Linux shim to | ask the host to kindly Rosetta-ify a given binary? | skrrtww wrote: | I guess that's true; I forgot about Apple making Rosetta 2 | installable in Linux VMs. | | Also though, since Rosetta 2 was released, it's had an | incredibly slow implementation of x87 FPU operations, and | anything that relies on x87 floating point math (including | lots of games) is currently running about 100x slower than it | ought to. Apple is aware of it but it's still not fixed in | Ventura. | | I hadn't heard of FEX before, looks interesting. | mort96 wrote: | Huh, I thought everyone used SSE floats these days. I | suppose there may be old games compiled with x87 floats, | but I'd expect those to be made for CPUs so old that even | slow x87 emulation wouldn't be a big issue. | | What software do people have x87-related issues with? | skrrtww wrote: | The software I personally have the most issues with is | Star Wars Episode 1: Racer, a 3D title from 1999 that | from what I understand uses x87 math extensively. In | Parallels (i.e. no Rosetta) it runs at 120fps easily, | while in CrossOver the frame rate barely ekes above 20. | Old titles like Half-Life, all other Source games, | Fallout 3, SWTOR etc. all run vastly worse than they | should, and many cannot run at playable framerates | through Rosetta. Honestly, the problem most likely | extends to more of Rosetta's floating point math than | just x87. | | The author of REAPER has also written about it some: | https://user.cockos.com/~deadbeef/index.php?article=842 | | There's been lots of discussion about the issue in the | Codeweavers forums, and Codeweavers points the blame | squarely at Apple, who have been, predictably, very quiet | about it. | 58028641 wrote: | Does Rosetta on Linux support 32-bit code? I believe FEX | does. | saagarjha wrote: | Rosetta supports emulating 32-bit code. | 58028641 wrote: | On Linux? I know it has been confirmed on macOS. I | haven't heard anyone say they ran 32-bit code on Linux. | mort96 wrote: | Someone would need to make an x86 -> ARM recompiler like | Rosetta 2.
That's not an easy task, but also not the task she's | tackling with the GPU driver. | | It's not unprecedented in the open-source space though; the | PCSX2 PlayStation 2 emulator for example contains a MIPS -> x86 | recompiler, and the RPCS3 PlayStation 3 emulator contains a | Cell -> x86 recompiler. | viktorcode wrote: | Can someone explain to me why support OpenGL at all? Vulkan is | easier to implement. Is there a need for OpenGL on Linux? | dagmx wrote: | Because Vulkan, despite the mystical reputation it has in | gaming circles, actually has fairly low adoption vs OpenGL. | | Very few applications in the grand scheme of things use Vulkan, | and a minority of games do. | | Therefore the ROI on supporting OpenGL is very high. | 58028641 wrote: | Doesn't implementing Vulkan give you DirectX with DXVK and | VKD3D and OpenGL with Zink for free? | Cu3PO42 wrote: | Only if you support all of the necessary Vulkan features | and extensions. The article states that getting to that | point would be a multi-year full-time effort, whereas | "only" OpenGL seems to be within grasp for this year. And | arguably having a lower OpenGL standard soon is better than | OpenGL 4.6 in a few years. | erichocean wrote: | Yes, with appropriate (and reasonably-available) Vulkan | extensions. | phire wrote: | Keep in mind that Mesa actually implements most of OpenGL for | you. It's not like you are implementing a whole OpenGL driver | from scratch; you are mostly implementing a hardware | abstraction layer. | | My understanding is that this hardware abstraction layer for | mesa is way easier to implement than a full vulkan driver, | especially since the earlier versions of OpenGL only require a | small subset of the features that a vulkan driver requires. | Jasper_ wrote: | Because of how mesa is structured. OpenGL is notoriously | terrible to implement, so there's a whole framework called | Gallium that does the hard work for you, and you slot yourself | into that. Meanwhile, Vulkan is easier to implement from | scratch, so there's a lot less infrastructure for it in mesa, | and you have to implement more of the boring paperwork | correctly. | | It's an accident of history more than anything else. Once the | reverse engineering is further along, I expect a Vulkan driver | to be written for it, and the Gallium one to be phased out in | favor of Zink. | gjsman-1000 wrote: | On a reverse-engineered GPU like this, because of Vulkan's low- | level design, implementing (early) OpenGL might actually be | significantly easier. | | Also, Vulkan isn't popular with game developers because | availability sucks. Vulkan doesn't run on macOS. Or iOS. Or 40% | of Android phones. Or Xbox. Or PlayStation. Or Nintendo | Switch[1]. | | Unless you are targeting Windows (which has DirectX and OpenGL | already), or those 60% of Android phones only, or Linux, why | would you use Vulkan? On Windows, DirectX is a generally- | superior alternative, and you get Xbox support basically free, | and if you also support an older DirectX, much broader PC | compatibility. On Android, just use OpenGL, and don't worry | about separate implementations for the bifurcated Vulkan/OpenGL | support. On Linux, just use Proton with an older DirectX. Whiz | bang, no need for Vulkan whatsoever. Yes, some systems might | perform better if you used Vulkan over OpenGL, but is the cost | worth it when you don't need it?
| | [1] Technically, Vulkan does exist for Nintendo Switch, but it | is so slow almost no production game uses it, and it is widely | considered not an option. Nintendo Switch is slow enough | without Vulkan making it slower. Much easier just to use the | proprietary NVIDIA library. | [deleted] ___________________________________________________________________ (page generated 2022-08-22 23:00 UTC)