[HN Gopher] Libcu++: Nvidia C++ Standard Library
       ___________________________________________________________________
        
       Libcu++: Nvidia C++ Standard Library
        
       Author : andrew3726
       Score  : 190 points
       Date   : 2020-09-19 08:57 UTC (14 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | gj_78 wrote:
       | I really do not understand why a (very good) hardware provider is
       | willing to create/direct/hint custom software for the users.
       | 
       | Isn't this exactly what a GPU firmware is expected to do ? Why do
       | they need to run software in the same memory space as my mail
       | reader ?
        
         | dahart wrote:
         | What do you mean about running in the same memory space? Your
         | operating system doesn't allow that. Is your concern about
          | using host memory? This open source library doesn't
          | automatically use host memory; users of the library can write
          | code that uses host memory if they choose to.
         | 
         | How would a firmware help me write heterogeneous bits of c++
         | code that can run on either cpu or gpu?
        
           | blelbach wrote:
           | > What do you mean about running in the same memory space?
           | Your operating system doesn't allow that. Is your concern
           | about using host memory?
           | 
            | Actually, the basis of our modern GPU compute platform is a
            | technology called Unified Memory, which allows the host and
            | device processors to share access to memory spaces. We think
            | this is the way forward.
           | 
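            | A minimal sketch of Unified Memory in practice (illustrative
            | only; error checking omitted):
            | 
            |   #include <cuda_runtime.h>
            | 
            |   __global__ void scale(float* data, int n, float s) {
            |       int i = blockIdx.x * blockDim.x + threadIdx.x;
            |       if (i < n) data[i] *= s;
            |   }
            | 
            |   int main() {
            |       float* data = nullptr;
            |       // One allocation, one pointer, valid on host and device.
            |       cudaMallocManaged(&data, 1024 * sizeof(float));
            |       for (int i = 0; i < 1024; ++i) data[i] = float(i); // CPU
            |       scale<<<8, 128>>>(data, 1024, 2.0f);               // GPU
            |       cudaDeviceSynchronize();  // wait before the CPU reads
            |       float twice = data[1];    // CPU reads the GPU's result
            |       cudaFree(data);
            |       return twice == 2.0f ? 0 : 1;
            |   }
            | 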
           | Of course, there's still the process isolation provided by
           | your operating system.
        
           | gj_78 wrote:
            | IMHO, the question is not whether we need code to run on CPUs
            | and GPUs; we do need that. The question is whether the GPU
            | seller has to control both sides. Until I buy a CPU from
            | nvidia, I want to keep some kind of independence.
            | 
            | When will we be able to use a future riscv-64 CPU with an
            | nvidia GPU? Will we leave that answer to nvidia?
        
             | blelbach wrote:
              | > IMHO, the question is not whether we need code to run on
              | CPUs and GPUs; we do need that. The question is whether
              | the GPU seller has to control both sides.
             | 
             | The question is not about running code on CPUs, or running
             | code on GPUs. It's about running code on both CPUs and GPUs
             | at the same time. It's about enabling the code on the CPU
             | and the code on the GPU to seamlessly interoperate with
             | each other, communicate with each other, move objects and
             | data to and from each other.
             | 
             | Who do you expect to make that happen?
             | 
             | > Until I buy a CPU from nvidia I want to keep some kind of
             | independence
             | 
             | You can buy a CPU from NVIDIA, check out our Tegra systems.
             | We also sell full systems, like DGX platforms, which use a
             | 3rd party CPU.
             | 
              | > When will we be able to use a future riscv-64 CPU with an
              | nvidia GPU? Will we leave that answer to nvidia?
             | 
             | Who else would answer this question?
             | 
             | Okay, you want to use <insert some future CPU> with our
             | GPU.
             | 
             | Who is going to design and build the interconnect between
             | the CPU and the GPU?
             | 
             | Who is going to provide the GPU driver?
             | 
             | The CPU manufacturer? Why would they do that? They don't
             | make any money from selling NVIDIA products. Why should
             | they invest effort in enabling that?
        
             | dahart wrote:
             | You _can_ use this library to write code that runs on both
             | risc-v and a GPU! You seem to be pretty confused about what
             | this library is. It's not exerting any control. It's open
             | source! It's strictly optional, and it only allows
              | developers to do something they actually want: to write
             | code that will compile for any type of processor that a
             | modern c++ compiler can target.
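              | 
              | A minimal sketch of that (assuming libcu++'s headers are on
              | the include path; the HD macro is just for illustration).
              | The same source builds with nvcc for the GPU or with plain
              | g++/clang++ for any host CPU, risc-v included:
              | 
              |   #include <cuda/std/chrono>  // libcu++: host and device
              | 
              |   // __CUDACC__ is only defined by CUDA compilers, so this
              |   // also builds as ordinary C++ on any host compiler.
              |   #ifdef __CUDACC__
              |   #  define HD __host__ __device__
              |   #else
              |   #  define HD
              |   #endif
              | 
              |   HD cuda::std::chrono::nanoseconds
              |   to_ns(cuda::std::chrono::microseconds us) {
              |       return us;  // exact conversion, no cast needed
              |   }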
        
               | gj_78 wrote:
                | Again, I see what you mean. I am even against nvidia
                | advising developers to use this or that C++ library
                | (be it GNU). It is not their role to do that. We need
                | smarter and shinier GPUs from nvidia, not software.
               | 
               | I would say .... The hardware must be sold independently
               | of the software ... but it is a bit too complex, I know.
        
               | blelbach wrote:
               | > It is not their role to do that.
               | 
               | You are incorrect.
               | 
               | NVIDIA employs more software engineers than hardware
               | engineers.
               | 
                | > We need smarter and shinier GPUs from nvidia, not
                | software.
               | 
               | Software is a part of the GPU. You get better GPUs by
               | having hardware and software engineers collaborate
               | together.
               | 
               | It is extremely expensive to put features into hardware.
               | It costs a lot of money and takes a very long time. It
               | takes 2-4 years at a minimum to put features into
               | hardware. And there are physical constraints; we only
               | have so many transistors.
               | 
               | If we make a mistake in hardware, how are we supposed to
               | fix it? At NVIDIA we have a status for hardware bugs
               | called "Fix in Next Chip". The "Next Chip" is 2-4 years
               | away.
               | 
               | So what do we do? We solve problems in software whenever
               | possible. It's cheaper to do so, it has a quicker
               | turnaround time, and most importantly, we can make
               | changes after the product has shipped.
               | 
               | > I would say .... The hardware must be sold
               | independently of the software ... but it is a bit too
               | complex, I know.
               | 
               | We don't sell hardware and you don't want to buy
               | hardware. Trust me, you wouldn't know what to do with it.
               | It's full of bugs and complexity.
               | 
               | We sell a platform that consists of hardware and
               | software. The product doesn't work without software.
               | 
               | If we tried to make the same product purely in hardware,
               | the die would be the size of your laptop and would cost a
               | million dollars.
        
               | dahart wrote:
               | I'm not understanding your point at all. You don't think
               | developers should be able to write C++ code for the GPU?
               | 
               | What do you even mean about 'it is not their role to do
               | that.' and 'hardware must be sold independently of the
               | software'?? Why are you saying this? Software interfaces
               | are critical for all GPUs and all CPUs, just ask AMD &
               | Intel. There is no such thing as CPU or GPU hardware
               | independent of software. Plus, the specific library here
               | _is_ being sold independently of the hardware, it is
                | doing exactly what you say you want, it's separate and
               | doesn't require having any other nvidia hardware or
               | software. (I can't think of any good reasons to use it
               | without having some nvidia hardware, but it is
               | technically independent, as you wish.)
        
               | gj_78 wrote:
               | > You don't think developers should be able to write C++
               | code for the GPU?
               | 
               | To be clear, I don't think nvidia-paid developers should
               | be able to write C++ Code for a nvidia-sold GPU. The
                | world will be better if any developer (paid by nvidia or
                | not) is able to write code for any GPU (sold by nvidia or
                | not). It is not nvidia's role to say how or when software
               | will be written. Their hardware is good and that's more
               | than OK.
               | 
               | AI/CUDA code written specifically for nvidia is
               | useless/deprecated in the long term. A lot of brain
               | waste.
        
               | jki275 wrote:
               | That doesn't make any sense.
               | 
               | You're free to write whatever you want. This is Nvidia
               | providing interfaces to their hardware for those of us
               | who don't want to write them for ourselves.
               | 
               | It's a gift. Take it or don't. How in the world you can
               | say Nvidia shouldn't be allowed to write software for
               | their GPUs makes no sense at all. Should the government
               | stop them? Any developer can write anything they want -
               | but Nvidia is obviously going to support their own
               | hardware. How does it make any sense otherwise?
               | 
               | All code is "deprecated in the long term" for a long
               | enough "long term". That doesn't equal useless. Your
               | comment is nonsensical.
        
               | gj_78 wrote:
                | I have nothing against the library itself, rather the
                | fact that it is made by the hardware provider. Good or
                | not, it looks like a marketing goodie given away by the
                | manufacturer.
                | 
                | History shows that hardware providers are not good at
                | maintaining software at a high-quality level. Think of
                | Sun/Sparc/Solaris or IBM/Power/Aix ... both excellent
                | when it comes to hardware and losers on the software
                | side even after decades of development. It is simply not
                | their "favorite stuff". Note that Linux, developed
                | independently of Sun/IBM, is doing a good job.
                | 
                | There are probably a lot of nice improvements to make at
                | the hardware level for Nvidia GPUs; better not to let
                | their engineers work on this kind of disposable software.
                | Please, Nvidia, let someone else make that software. As a
                | customer, I only need the next release of that Good Old
                | hardware GPU!!!
        
               | blelbach wrote:
               | > To be clear, I don't think nvidia-paid developers
               | should be able to write C++ Code for a nvidia-sold GPU.
               | 
               | I'm not sure what you're saying here? You think another
               | company or organization should write all the software for
               | our hardware?
               | 
               | I don't think you understand the semiconductor industry.
               | 
               | Our business model relies on hardware and software
               | engineers working closely together, as I've described in
               | other replies.
               | 
               | We would not be able to produce a viable product that is
               | solely raw hardware.
               | 
               | Also, what motivation does this other organization or
               | company have to create software for our hardware?
               | 
               | > The world will be better if any developer (paid by
                | nvidia or not) is able to write code for any GPU (sold by
               | nvidia or not).
               | 
               | This library is something that is designed to help you
               | write Standard C++ code that runs on our GPU. Standard
               | C++ runs everywhere.
               | 
                | > It is not nvidia's role to say how or when software will
               | be written.
               | 
               | Providing the SDKs and toolchains to program our platform
               | is definitely part of our role in the ecosystem.
               | 
               | > Their hardware is good and that's more than OK.
               | 
               | Our hardware is useless without our software.
               | 
               | > AI/CUDA code written specifically for nvidia is
               | useless/deprecated in the long term. A lot of brain
               | waste.
               | 
               | I expect CUDA will be around for a while.
        
               | dahart wrote:
               | With libcu++, Nvidia is not saying how or when software
               | should be written. Because the library is meeting the C++
                | standard, it does exactly what you said you want: it
                | allows any developer to write code for any GPU (or CPU!).
               | The library is doing the thing you're asking for. AMD &
               | Intel can support the same code with only namespace
               | changes, using their own version, because it's open and
               | written to the open standard.
        
         | Const-me wrote:
         | > Isn't this exactly what a GPU firmware is expected to do?
         | 
         | The source data needs to appear on the GPU somehow. Similarly,
         | the results computed on GPU are often needed for CPU-running
         | code.
         | 
          | GPUs don't run an OS and are limited. They can't access the
          | file system on their own, and many useful algorithms (like a
          | PNG image codec) are a poor fit for them. Technically I think
          | they can access source data directly from system memory, but
          | doing that is inefficient in practice, because GPUs have a
          | special piece of hardware (called the copy command queue in
          | D3D12, or the transfer queue in Vulkan) to move large blocks
          | of data over PCIe.
         | 
         | That library implements an easier way to integrate CPU and GPU
         | pieces of the program.
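          | 
          | As a rough sketch of that handoff with the CUDA runtime
          | (illustrative only; error checking omitted, and pinned host
          | memory would be needed for the copy to truly overlap with
          | compute work), the async copy on a stream is what ends up on
          | that copy/transfer queue:
          | 
          |   #include <cuda_runtime.h>
          |   #include <vector>
          | 
          |   float* upload(const std::vector<float>& host_data,
          |                 cudaStream_t stream) {
          |       const size_t bytes = host_data.size() * sizeof(float);
          |       float* device_data = nullptr;
          |       cudaMalloc((void**)&device_data, bytes);   // GPU memory
          |       // The async copy is serviced by the copy engine, so it
          |       // can overlap with kernels on the compute queue.
          |       cudaMemcpyAsync(device_data, host_data.data(), bytes,
          |                       cudaMemcpyHostToDevice, stream);
          |       return device_data;  // contents valid once the copy ends
          |   }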
        
         | blelbach wrote:
         | NVIDIA employs more software engineers than hardware engineers.
         | 
         | > Why do they need to run software in the same memory space as
         | my mail reader ?
         | 
         | It is a lot more expensive to build functionality and fix bugs
         | in silicon than it is to do those same things in software.
         | 
          | At NVIDIA, we do as much as we possibly can in software. If a
         | problem or bug can be solved in software instead of hardware,
         | we prefer the software solution, because it has much lower cost
         | and shorter lead times.
         | 
         | Solving a problem in hardware takes 2-4 years minimum, massive
         | validation efforts, and has huge physical material costs and
         | limitations. After it's shipped, we can't "patch" the hardware.
         | Solving a problem in software can sometimes be done by one
         | engineer in a single day. If we make a mistake in software, we
          | can easily deploy a fix.
         | 
         | At NVIDIA we have a status for hardware bugs called "Won't Fix,
         | Fix in Next Chip". This means "yes, there's a problem, but the
         | earliest we can fix it is 2-4 years from now, regardless of how
         | serious it is".
         | 
         | Can you imagine if we had to solve all problems that way? Wait
         | 2-4 years?
         | 
         | On its own, our hardware is not a complete product. You would
         | be unable to use it. It has too many bugs, it doesn't have all
         | of the features, etc. The hardware is nothing without the
         | software, and vice versa.
         | 
         | We do not make hardware. We make platforms, which are a
         | combination of hardware and software. We have a tighter
         | coupling between hardware and software than many other
         | processor manufacturers, which is beneficial for us, because it
         | means we can solve problems in software that other vendors
         | would have to solve in hardware.
         | 
         | > I really do not understand why a (very good) hardware
         | provider is willing to create/direct/hint custom software for
         | the users.
         | 
         | Because we sell software. Our hardware wouldn't do anything for
         | you without the software. If we tried to put everything we do
         | in software into hardware, the die would be the size of your
         | laptop and cost a million dollars each.
         | 
         | You wouldn't buy our hardware if we didn't give you the
         | software that was necessary to use it.
         | 
         | > Isn't this exactly what a GPU firmware is expected to do ?
         | 
         | Firmware is a component of software, but usually has
         | constraints that are much more similar to hardware, e.g. long
         | lead times. In some cases the firmware is "burned in" and can't
         | be changed after release, and then it's very much like
         | hardware.
        
       | BoppreH wrote:
        | Unfortunate name; "cu" is the most well-known slang for "anus"
       | in Brazil (population: 200+ million). "Libcu++" is sure to cause
       | snickering.
        
         | [deleted]
        
         | amelius wrote:
         | "CU" is also an abbreviation of "see you". I don't think it
         | causes much awkwardness, but I could be wrong.
        
           | ufo wrote:
           | As a Brazilian, I can confirm that we chuckle whenever we see
           | someone use that word :)
        
         | nitrogen wrote:
         | Do chemists have similar problems working with copper, whose
         | chemical symbol is Cu?
        
         | CyberDildonics wrote:
         | Wait until you see the namespace the standard library is under.
         | 
          | Although maybe short words that happen to be slang in languages
          | other than the one something was written in aren't a big deal.
        
         | NullPrefix wrote:
         | This only affects developers. Limited scope.
         | 
         | Wasn't there something related about Microsoft Lumia phones?
        
           | nonbirithm wrote:
           | Or how Siri means "buttocks" in Japanese?
        
             | jki275 wrote:
             | It's oshiri, not Siri.
        
           | kitd wrote:
           | cf. the Vauxhall Nova car
           | 
           | "No va" means "doesn't go" in Spanish.
        
             | sterwill wrote:
             | I think it's unlikely that Spanish speakers would have been
             | confused about the word "nova" when used as a car name. In
             | Spanish "nova" describes the same astronomical event we
             | call a "nova" in English: a new light in the sky.
             | Additionally Spanish "nuevo" and English "new" seem to
             | share the same root. My point is these words all mean
             | similar things to English- and Spanish-speaking car buyers.
        
               | retrac wrote:
               | For a non-mythical example, the 2nd-gen Buick LaCrosse
               | was originally named Allure in the Canadian market. Se
               | crosser is Quebec French slang for "to masturbate" and
               | "la crosse" is also a slang term for a swindle or rip-
               | off.
        
             | andrepd wrote:
             | Also Hyundai Kona, "cona" means "cunt" or "pussy" in
             | Portuguese.
        
               | fullstop wrote:
               | I wonder how kona coffee sells over there.
        
               | FridgeSeal wrote:
               | Wow, Kona Bikes [0] must have a fun time in Portugal
               | then..
               | 
               | [0] https://konaworld.com/
        
             | geofft wrote:
             | Customers either didn't make that association or didn't
             | care: https://www.snopes.com/fact-check/chevrolet-nova-
             | name-spanis...
        
           | virgulino wrote:
           | Unix users have "cu". Do "man cu", if you are curious. I
           | haven't played with "cu" since the UUCP email era. Good
           | times.
        
             | moonchild wrote:
             | Doesn't exist on my system, but is at
             | https://linux.die.net/man/1/cu
        
         | blelbach wrote:
         | "cu" is a pretty common prefix for CUDA libraries. cuBLAS,
         | cuTENSOR, CUTLASS, CUB, etc.
         | 
         | It gets worse if you try to spell libcu++ without pluses:
         | 
          | libcuxx, libcupp (I didn't hate this one, but my team disliked
          | it).
         | 
         | We settled on `libcudacxx` as the alphanumeric-only spelling.
        
         | gswdh wrote:
         | In all honesty, out of the combinations for two and three
          | letter acronyms, there's bound to be a language out there
          | where the meaning is crude. I recall, on here recently,
         | something being rude in Finnish or Swedish. We're
         | professionals, it's just a name, who cares.
        
         | unrealhoang wrote:
          | It means "penis" in Vietnamese (pop. 80M). I guess people don't
          | really care, since tech language is usually English.
        
         | jcampbell1 wrote:
         | These things never seem to matter even in English. How many
         | times have you heard someone say "I don't like Microsoft",
         | followed by "that's what she said".
        
       | einpoklum wrote:
       | 1. How do we know what parts of the library are usable on CUDA
       | devices, and which are only usable in host-side code?
       | 
        | 2. How compatible is this with libstdc++ and/or libc++, when
       | used independently?
       | 
       | I'm somewhat suspicious of the presumption of us using NVIDIA's
       | version of the standard library for our host-side work.
       | 
       | Finally, I'm not sure that, for device-side work, libc++ is a
       | better base to start off of than, say, EASTL (which I used for my
       | tuple class: https://github.com/eyalroz/cuda-
       | kat/blob/master/src/kat/tupl... ).
       | 
       | ...
       | 
       | partial self-answer to (1.):
       | https://nvidia.github.io/libcudacxx/api.html apparently only a
       | small bit of the library is actually implemented.
        
         | blelbach wrote:
         | > apparently only a small bit of the library is actually
         | implemented.
         | 
         | Yep. It's an incremental project. But stay tuned.
         | 
         | > I'm somewhat suspicious of the presumption of us using
         | NVIDIA's version of the standard library for our host-side
         | work.
         | 
         | Today, when using libcu++ with NVCC, it's opt-in and doesn't
         | interfere with your host standard library.
         | 
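          | Concretely, a minimal sketch of that opt-in (illustrative; the
          | cuda::std names are libcu++'s, the rest is hypothetical). The
          | host standard library keeps doing what it always did:
          | 
          |   #include <atomic>            // host standard library, as is
          |   #include <cuda/std/atomic>   // libcu++, only where you opt in
          | 
          |   __global__ void count_hits(cuda::std::atomic<int>* counter) {
          |       counter->fetch_add(1, cuda::std::memory_order_relaxed);
          |   }
          | 
          |   std::atomic<int> host_counter{0};  // plain std:: is untouched
          | 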
         | I get your concern, but a lot of the restrictions of today's
          | GPU toolchains come from the desire to continue using your
         | host toolchain of choice.
         | 
         | Our other compiler, NVC++, is a unified stack; there is no host
         | compiler. Yes, that takes away some user control, but it lets
         | us build things we couldn't build otherwise. The same logic
         | applies for the standard library.
         | 
         | https://developer.nvidia.com/blog/accelerating-standard-c-wi...
         | 
         | > Finally, I'm not sure that, for device-side work, libc++ is a
         | better base to start off of than, say, EASTL (which I used for
         | my tuple class: https://github.com/eyalroz/cuda-
         | kat/blob/master/src/kat/tupl... ).
         | 
         | We wanted an implementation that intended to conform to the
         | standard and had deployment experience with a major C++
         | implementation. EASTL doesn't have that, so it never entered
         | our consideration; perhaps we should have looked at it, though.
         | 
         | At the time we started this project, Microsoft's Standard
         | Library wasn't open source. Our choices were libstdc++ or
         | libc++. We immediately ruled libstdc++ out; GPL licensing
         | wouldn't work for us, especially as we knew this project had to
         | exchange code with some of our other existing libraries that
         | are under Apache- or MIT-style licenses (Thrust, CUB, RAPIDS).
         | 
         | So, our options were pretty clear; build it from scratch, or
         | use libc++. I have a strict policy of strategic laziness, so we
         | went with libc++.
        
       | lionkor wrote:
       | > Promising long-term ABI stability would prevent us from fixing
       | mistakes and providing best in class performance. So, we make no
       | such promises.
       | 
        | Wait, NVidia actually gets it? Neat!
        
         | matheusmoreira wrote:
         | This is an awesome quote... Same argument used by the Linux
         | kernel developers.
        
       | lars wrote:
       | It really is a tiny subset of the C++ standard library, but I'm
       | happy to see they're continuing to expand it:
       | https://nvidia.github.io/libcudacxx/api.html
        
         | shaklee3 wrote:
         | Nvidia has had many members on the c++ standards committee for
         | a while.
        
         | roel_v wrote:
         | Yeah, really tiny... At first I thought 'wow this is a game
         | changer', but then I looked at your link and thought 'what's
         | the point?'. Can someone explain what real problems you can
         | solve with just the headers in the link above?
        
           | jpz wrote:
            | I guess the point is that when writing CUDA code (which
            | looks like C++), you can use these libraries, which are
            | homogeneous with CPU code.
            | 
            | Looking at the functions, chrono/barrier etc. require
            | CPU-level abstractions, so using the STL versions (which are
            | for the CPU) isn't really going to work.
        
           | happyweasel wrote:
           | It runs on the GPU?
        
             | roel_v wrote:
             | What runs on the gpu?
        
               | jcelerier wrote:
               | this library
        
           | blelbach wrote:
           | https://youtu.be/75LcDvlEIYw
           | 
           | https://youtu.be/VogqOscJYvk
        
           | TillE wrote:
           | I would have expected the <algorithm> header, but
           | instead...synchronization primitives? std::chrono? I'm
           | completely baffled about how that would be useful, but that's
           | probably because I know very little about CUDA.
        
             | blelbach wrote:
             | GPUs are parallel processors. So, yes, synchronization
             | primitives are the highest priority.
             | 
             | We focused on things that require /different/
             | implementations in host and device code.
             | 
             | The way you implement std::binary_search is the same in
             | host and device code. Sure, we can stick `__host__
             | __device__` on it for you, but it's not really high value.
             | 
             | Synchronization primitives? Clocks? They are completely
             | different. In fact, the machinery that we use to implement
             | both the synchronization primitives and clocks has not
             | previously been exposed in CUDA C++.
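              | 
              | As a rough illustration of the kind of thing that machinery
              | enables (a sketch, not a tuned kernel; assumes a GPU recent
              | enough for these atomics), the same atomic type works in
              | device code, with an extension for narrowing its scope:
              | 
              |   #include <cuda/atomic>  // cuda::atomic = cuda::std::atomic
              |                           // plus thread scopes
              | 
              |   __global__ void count_zeros(const int* v, int n,
              |       cuda::atomic<int, cuda::thread_scope_device>* total)
              |   {
              |       // Partial count shared only within this thread block.
              |       __shared__ cuda::atomic<int, cuda::thread_scope_block>
              |           local;
              |       if (threadIdx.x == 0) local = 0;
              |       __syncthreads();
              | 
              |       int i = blockIdx.x * blockDim.x + threadIdx.x;
              |       if (i < n && v[i] == 0)
              |           local.fetch_add(1, cuda::std::memory_order_relaxed);
              |       __syncthreads();
              | 
              |       // One thread folds the block's count into the total.
              |       if (threadIdx.x == 0)
              |           total->fetch_add(local.load(),
              |               cuda::std::memory_order_relaxed);
              |   }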
        
         | blelbach wrote:
         | Today, you can use the library with NVCC, and the subset is
         | small. We'll be focusing on expanding that subset over time.
         | 
         | Our end goal is to enable the full C++ Standard Library. The
         | current feature set is just a pit stop on the way there.
        
       | Mr_lavos wrote:
        | Does this mean you can do operations on structs that live on the
        | GPU hardware?
        
         | shaklee3 wrote:
         | You have been able to do that for a long time with UVA.
        
       | scott31 wrote:
       | A pathetic attempt to lock developers into their hardware.
        
         | jpz wrote:
          | They seem to be pushing the boundaries of innovation in GPU
          | compute. It seems a little unfair to call that pathetic,
          | whatever strategic reasons they have to find OpenCL
          | unappetising (which, in truth, simply enables their sole
          | competitor).
          | 
          | Their decision making seems rational; of course it's not ideal
          | if you're a consumer. We would like the ability to play NVidia
          | off against AMD Radeon.
         | 
         | Convergence to a standard has to be driven by the market, but
         | it's impossible to drive NVidia there because they are the
         | dominant player and it is 100% not in their interests.
         | 
         | It doesn't mean they're a bad company. They are rational
         | actors.
        
           | [deleted]
        
           | my123 wrote:
            | With nvc++, they are converging towards a standardised
            | source-level programming model:
           | https://developer.nvidia.com/blog/accelerating-standard-c-
           | wi...
           | 
           | However, this notably doesn't cover binaries, which are GPU
           | vendor specific in that case, so AMD for example would have
            | to provide a C++ compiler implementing stdpar on GPUs for
            | their own hardware.
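            | 
            | A minimal sketch of what that stdpar model looks like in
            | source (plain ISO C++17; with nvc++ -stdpar this kind of
            | parallel algorithm can be offloaded to the GPU, and any other
            | conforming compiler runs it in parallel on the CPU):
            | 
            |   #include <algorithm>
            |   #include <execution>
            |   #include <vector>
            | 
            |   void double_all(std::vector<float>& v) {
            |       std::transform(std::execution::par_unseq,
            |                      v.begin(), v.end(), v.begin(),
            |                      [](float x) { return x * 2.0f; });
            |   }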
        
         | blelbach wrote:
         | > A pathetic attempt to lock developers into their hardware
         | 
         | Ah-ha, you've caught us! Our plan is to lock you into our
         | hardware by implementing Standard C++.
         | 
         | Once you are all writing code in Standard C++, then you won't
         | be able to run it elsewhere, because Standard C++ only runs on
         | NVIDIA platforms, right?
         | 
         | ... What's that? Standard C++ is supported by essentially every
         | platform?
         | 
         | Darnit! Foiled again.
        
         | gj_78 wrote:
         | Agree++. They are good at hardware and should stay that way.
        
           | my123 wrote:
           | The thing is: that hardware isn't very usable without good
           | software, and an easy to use software stack at that.
           | 
           | That's what NVIDIA understood and made them what they are
           | today.
        
             | gj_78 wrote:
             | A lot of hardware has builtin software, either inside a
             | firmware or as a driver. Keeping the software part in
              | firmware leaves the customer free to use any kind of OS. Using
             | host cpu and memory is bad design IMHO.
        
               | kortex wrote:
               | That sounds like vendor binary blob sdk libraries, only
               | everything is an rpc and you're not even in the same
               | memory space, aka distributed computing, except you have
               | no control over the device stack. Sounds kinda awful to
               | me.
        
               | blelbach wrote:
               | > A lot of hardware has builtin software, either inside a
               | firmware or as a driver.
               | 
               | Correct.
               | 
                | > Keeping the software part in firmware leaves the
                | customer free to use any kind of OS.
               | 
               | Do you mean firmware, or firmware and driver?
               | 
               | You can't do everything in firmware.
               | 
               | > Using host cpu and memory is bad design IMHO.
               | 
               | How do you propose that you program the GPU then?
               | 
               | The CPU has to interact with the GPU. Some software has
               | to manage that interaction.
               | 
               | That said, we are not talking about either a driver or
               | firmware. This is a part of our toolchain. It is a
               | library that you use when writing a heterogeneous
               | program.
        
               | dahart wrote:
               | Can you elaborate on what you mean? This is an open
               | source library for developers to write code that can
               | compile without changes on both CPU and GPU. This solves
               | a problem that _can't_ be solved in firmware, and this is
               | not a case of nvidia using cpu and host memory - whether
               | to use cpu and host memory is strictly up to the
               | developer.
        
               | gj_78 wrote:
                | Sorry, regarding CPU and host memory, I was wrong. I
                | meant: having the GPU seller control/write code that
               | plays with host cpu and memory is bad. Let people use
               | their own gcc/g++ or whatever compiler and publish the
               | specs. Unless they also start selling CPUs.
        
               | dahart wrote:
                | This _is_ gcc or whatever compiler, it is not nvidia's
               | compiler. This library does not give nvidia any "control"
               | over host operations, it gives developers another tool.
               | 
                | They _did_ publish the specs, it's _open source_. BTW,
                | Nvidia's acquisition of ARM means that it will be
               | selling CPUs.
               | 
               | P.P.S., the driver runs on the host, so your proposed
               | alternative doesn't address the point you think you're
               | making.
        
               | gj_78 wrote:
                | I did not say the library controls anything; Nvidia
                | controls the library: its features, its roadmap, its bug
                | fixes, its development effort (people), etc. All these
                | choices are made by Nvidia. It is not just another tool,
                | it is the tool that is closest to hardware evolution.
                | 
                | Nvidia buying ARM is not good news for me. The same way I
                | don't like them making software, I also don't like them
                | selling CPUs or seafood. They are good at GPUs and that's
                | OK.
                | 
                | The drivers usually run in kernel space and do not involve
                | much interaction with users. Firmware, on the other hand,
                | is hardware-close software and can be gradually replaced
                | by continuous hardware improvements without the
                | user/software or the OS noticing.
        
               | dahart wrote:
               | > It is not just another tool , it is the tool that is
               | closest to hardware evolution.
               | 
               | This is an _open source_ library that meets the C++
               | standard, which is designed and contributed to by many
               | companies, not just nvidia. Like AMD and Intel, Nvidia
               | does release some proprietary things that your complaints
               | might apply to, but this is not one of them.
        
               | my123 wrote:
                | > Keeping the software part in firmware leaves the
                | customer free to use any kind of OS
               | 
               | Raspberry Pi initially shipped with such a graphics
               | stack, with the Arm side just being a communication
               | driver in the kernel and an RPC stack in user-space.
               | 
               | It isn't a good idea (for numerous reasons, including
               | security) and is even more closed in practice than what
               | ships today.
        
               | gj_78 wrote:
                | Raspberry Pi is not marketed for graphics the way nvidia
                | markets their GPUs. What I mean is that firmware runs on a
                | (usually small) CPU and memory that are sold as part of
                | the GPU. No security issues there, as the main security
                | issue is plugging the whole GPU into your PC.
        
               | my123 wrote:
                | With the complexity of GPU driver stacks, what you are
                | asking for is not firmware, but a set of multi-GHz CPUs
                | just for that purpose.
                | 
                | Plus RPC would be needed all the time... and its latency
                | would tank the performance.
                | 
                | It'd also not be tinkerable at all, unlike what we have
                | today; it's advocating for exactly the opposite of open.
        
           | blelbach wrote:
           | We employ more software engineers than hardware engineers.
           | Our hardware doesn't really do much in isolation, software is
           | part of the product.
        
             | gj_78 wrote:
             | The question is not about the head count. How many software
              | engineers at nvidia produce software that is expected to
              | run/compile on the customer's host CPU, like this library?
              | I expect not too many.
        
         | pjmlp wrote:
         | The other vendors are to blame for sticking with outdated C and
         | printf style debugging.
        
           | einpoklum wrote:
           | 1. printf-style debugging is what we use on NVIDIA hardware
           | too.
           | 
           | 2. OpenCL 2.x allows for C++(ish) source code. Not sure how
           | good the AMD support is though.
        
             | pjmlp wrote:
              | 1. Ever heard of Nsight and Visual Studio plugins?
             | 
             | 2. OpenCL 2.0 was a failure, so OpenCL 1.2 got renamed as
              | OpenCL 3.0. C++ bindings were dropped and SYCL is now
             | backend agnostic.
        
               | einpoklum wrote:
                | > 1. Ever heard of Nsight and Visual Studio plugins?
               | 
               | Those are apples and oranges... also, you forget cuda-
               | gdb.
               | 
               | > OpenCL 1.2 got renamed as OpenCL 3.0. C++ bindings were
               | dropped
               | 
               | Well, yes, but also no. They were made optional, and
               | transitioned to some other C++-cum-OpenCL initiative:
               | 
               | https://github.com/KhronosGroup/Khronosdotorg/blob/master
               | /ap...
               | 
               | I'm not exactly sure how this differs and what's usable
               | in practice though.
        
         | daniel-thompson wrote:
         | I think CUDA itself is the locking attempt; this is just a tiny
         | cherry on top.
        
       | RcouF1uZ4gsC wrote:
        | For everyone wondering where all the data structures and
        | algorithms are: vector and several algorithms are implemented by
        | Thrust. https://docs.nvidia.com/cuda/thrust/index.html
        | 
        | Seems the big addition of libcu++ over Thrust would be
        | synchronization.
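        | 
        | A minimal Thrust sketch of that container/algorithm side
        | (illustrative):
        | 
        |   #include <thrust/device_vector.h>
        |   #include <thrust/sort.h>
        |   #include <thrust/reduce.h>
        | 
        |   int sort_and_sum(thrust::device_vector<int>& d) {
        |       thrust::sort(d.begin(), d.end());              // on the GPU
        |       return thrust::reduce(d.begin(), d.end(), 0);  // on the GPU
        |   }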
        
       | davvid wrote:
       | Here's a somewhat related talk from CppCon '19: "The One-Decade
       | Task: Putting std::atomic in CUDA"
       | 
       | https://www.youtube.com/watch?v=VogqOscJYvk
        
       | fanf2 wrote:
       | " _Whenever a new major CUDA Compute Capability is released, the
       | ABI is broken. A new NVIDIA C++ Standard Library ABI version is
       | introduced and becomes the default and support for all older ABI
       | versions is dropped._ "
       | 
       | https://github.com/NVIDIA/libcudacxx/blob/main/docs/releases...
        
         | MichaelZuo wrote:
          | It's interesting that they use the word "broken" to describe
          | incompatible machine code. Well, if the code is recompiled for
          | each new version then it's different from the old machine code;
          | that's true by definition. Does any major software vendor support
         | older versions of the ABI or machine code?
        
           | my123 wrote:
           | Note here that your binaries will continue to run even on
           | future driver versions - and future hardware - that's what
           | PTX is for, as the standard libraries are statically linked
           | in.
           | 
           | It's just your object files that aren't compatible, so that
           | you can't mix and match libraries built with different CUDA
           | versions into the same binary.
        
             | blelbach wrote:
             | Yep, this is a good summary (good enough that perhaps I
             | should put something similar in the docs).
        
           | geofft wrote:
           | > _Does any major software vendor support older versions of
           | the ABI or machine code?_
           | 
            | Yes, this is extraordinarily common. The ABI is an
            | _interface_, a promise that new versions of the machine code
            | for a library can still be used by binaries compiled against
           | the old one. There's new machine code, but there's no "by
           | definition" of whether they make this promise or not.
           | 
           | glibc (and the other common libraries) on basically all the
           | GNU/Linux distros does this: that's why it's called
           | "libc.so.6" after all these years. New functions can be
           | introduced (and possibly new versions of functions, using
           | symbol versioning), but old binaries compiled against a
           | "libc.so.6" from 10 years ago will still run today. (This is
           | how it's possible to distribute precompiled code for
           | GNU/Linux, whether NumPy or Firefox or Steam, and have it run
           | on more than a single version of a single distro.)
           | 
           | Apple does the same thing; code linked against an old
           | libSystem will still run today. Android does the same thing;
           | code written to an older SDK version will still run today,
           | even though the runtime environment is different.
           | 
           | Oracle Java does the same thing: JARs built with an older
           | version of the JDK can load in newer versions.
           | 
           | Microsoft does this at the OS level, but - notably - the
           | Visual C++ runtime does _not_ make this promise, and they
           | follow a similar pattern to what Nvidia is suggesting. You
            | need to include a copy of the "redistributable" runtime of
           | whatever version (e.g. MSVCR71.DLL) along with your program;
           | you can't necessarily use a newer version. However, old DLLs
           | continue to work on new OSes, and they take great pains to
           | ensure compatibility.
        
             | aronpye wrote:
             | Excellent comment, I was wondering how glibc handled
             | backwards compatibility.
             | 
             | Is symbol versioning an ELF object file thing, or is it
             | more universal than that?
        
               | geofft wrote:
               | Almost all of the time, they do it via just adding new
               | features and not breaking old ones.
               | 
               | But yeah, GNU/Linux and Solaris both have symbol
               | versioning as part of ELF (I'm not sure if other
               | executable formats have it; it doesn't actually require
               | very much out of the format). The approach, roughly, is
               | that each symbol in the file is named something like
               | "memcpy@GLIBC_2.2.5", and if you see symbol versions in
               | the library you're linking against, you include those
               | references. The dynamic linker is also smart enough to
               | resolve unqualified symbols against some default version
               | the library specifies. This is important for backwards-
               | compatibility, for the ability for distros to add symbol
               | versions when upstream doesn't have them yet, and for
               | things like dlsym("memcpy") keeping working. When they
               | make a backwards-incompatible change (e.g., old memcpy
               | supports overlapping ranges, new memcpy does not promise
               | to do the right thing and you need to use memmove
               | instead), they add a new version (e.g.,
               | "memcpy@GLIBC_2.14"). Anything compiled against the newer
               | library will reference the new version, but an
               | implementation of the old version still sticks around for
                | older binaries.
               | 
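                | A minimal sketch of that from the application side (GNU
                | toolchain only; illustrative): the .symver directive lets
                | a program pin its references to a specific older version
                | of a symbol, here the memcpy from before the 2.14 change:
                | 
                |   #include <string.h>
                | 
                |   // Bind our references to the old versioned symbol
                |   // instead of the default (newest) one.
                |   __asm__(".symver memcpy, memcpy@GLIBC_2.2.5");
                | 
                |   void copy(char* dst, const char* src, size_t n) {
                |       memcpy(dst, src, n);  // memcpy@GLIBC_2.2.5
                |   }
                | 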
               | And yes, there were older versions before libc.so.6 -
                | libc.so.5 was used, I think, in the '90s, but
               | they've avoided changes since then. (The approach used
               | there is that you can install both of them on a single
               | system, but "libc.so" symlinks to one of them, and that
               | name is used when you compile code. When you run gcc
                | -lfoo, it looks for libfoo.so, but if the library has a
               | header saying its "real" name, called its "SONAME", is
               | libfoo.so.1, the compiled program looks for libfoo.so.1
               | and not libfoo.so.) Now you only have to have a single
               | glibc version and it works with many years of updates.
        
               | jcelerier wrote:
               | ELF: https://lists.debian.org/lsb-
               | spec/1999/12/msg00017.html
        
           | londons_explore wrote:
           | Famously Microsoft does with Windows. That's how an exe file
           | from 25 years ago can still run today.
        
             | moonchild wrote:
             | Yes, but GPU architecture changes very frequently.
             | 
             | Shaders from 15 years ago still work, but they're compiled
             | on-the-fly to a GPU-dependent format. I expect you don't
             | want to have to recompile an entire c++ stdlib every time
             | you recompile your own code.
        
             | formerly_proven wrote:
              | Running 32-bit x86 code on an AMD64 machine is possible on
             | most operating systems which supported both of these, and
             | has probably more to do with AMD64 supporting that
             | execution model.
        
               | londons_explore wrote:
               | Try that on Linux and you'll find most libraries no
               | longer have the same entry points and that various data
               | structures have changed leading to fun fun crashes...
               | 
               | The kernel itself has maintained (mostly) ABI
               | compatibility though.
        
               | formerly_proven wrote:
               | That's a "you're holding it wrong" problem, though.
               | Projects like GTK or Qt never claimed they'd be
                | backwards-compatible for 26 years (Qt has specific backwards-
               | compatibility API and ABI guarantees and are in my
               | experience pretty diligent about it), so if you want a
               | binary to work for a long time, you have to ship your own
               | versions of these. Libraries like Xlib on the other hand
               | are very stable and much more similar to the Win32 API in
               | that respect. In theory Linux has versioning for
               | libraries, in practice it is never used correctly and
               | useless anyway, since distros generally only keep around
               | one version of everything, so even if you'd link against
               | a specific version (e.g. libfoobar.so.2.21 instead of
               | libfoobar.so.2, which will break if you don't recompile
               | and/or patch the source), it wouldn't exist _anyway_
               | after a few updates. And that's mostly because distros
               | never promised you'd be able to run binaries built
               | outside their packaging infrastructure anyway; it being
               | common practice and sometimes working doesn't imply it's
               | guaranteed to work.
               | 
               | Hence why C applications only linking these "basic"
               | libraries (libc, Xlib, zlib, ...) are regarded as so
               | stable and portable, because they're built and linked
               | against system components which rarely change. (Keep in
               | mind to build this kind of binary on ancient systems,
               | otherwise glibc will make sure it won't work everywhere).
        
               | XorNot wrote:
               | This is one of those things it feels like all the content
               | addressable initiatives should be able to solve somehow.
               | With near ubiquitous internet access, why can't a program
               | ship with a list of standard library hashes it'll link
               | against and my distro go fetch them from IPFS or whatever
               | if they're not local.
        
               | mlvljr wrote:
               | Needs 15 more years of hype cycles, that's why only :)
        
               | pantalaimon wrote:
                | The solution for this is now Docker, Flatpak, Snap, ...
               | 
               | Just ship the whole environment and only rely on the
               | stable kernel API.
        
               | surajrmal wrote:
               | This is very wasteful. On servers in the cloud that may
               | be a reasonable approach, but there are still devices
               | that are memory, storage, and/or network constrained
               | enough where it's not. It's still necessary to have
               | relatively stable interfaces such that most things can
                | share the same version of a dependency and there still
               | exists the ability to deduplicate the dependencies
               | between different programs. I do agree that the current
               | OS approaches to handle this are not great and there is
               | room for new models, but docker containers are not a
               | holistic solution.
        
               | formerly_proven wrote:
               | The Windows Component Store (aka Side-by-Side aka WinSxS)
               | is sort of a content-addressed store for DLLs and the
               | like, except the content-addressing isn't facilitated by
               | a literal cryptographic hash over the contents, but
               | instead by the logical identity of the component
               | (name+version but more). And, it doesn't fetch anything
               | automatically. Writes to certain paths are just
               | intercepted and redirected into it, while storing an
               | association somewhere else that the app that did that (or
               | was installed doing that) wants that particular component
               | (or at least, that's how I think that works).
        
               | rkeene2 wrote:
               | This is what AppFS does, as well as CernVM-FS, though
               | AppFS has more features
        
               | patrec wrote:
               | Nix basically already does this, apart from the
               | decentralised distributed cache (there is a centralised
               | one and you can easily set up your own, too). All
                | references, including those to dynamically linked
                | libraries, are via a unique, content-addressable hash --
                | where "content"
               | currently still happens to be content of the build recipe
               | and all dependencies and sources, recursively, not the
               | built artefact. There is work on referencing artefacts by
               | the binary output hash though, because that obviously has
               | better security properties when you want to have a non-
               | centralized cache; the main problem is that a lot of
               | software still has no reproducible build.
        
           | haberman wrote:
           | > Does any major software vendor support older versions of
           | the ABI or machine code?
           | 
           | The C++ Standards Committee has been prioritizing ABI
           | compatibility at the cost of performance for the last decade
            | or so (mostly in the standard library, as opposed to the
           | language itself, as I understand it). Some people (especially
           | people from Google) have been arguing that this is the wrong
           | priority, and that C++ should be more willing to break ABI.
           | See:
           | 
           | https://cppcast.com/titus-winters-abi/
           | 
           | http://www.open-
           | std.org/jtc1/sc22/wg21/docs/papers/2019/p186...
           | 
           | http://www.open-
           | std.org/jtc1/sc22/wg21/docs/papers/2020/p213...
           | 
           | Disclosure: I work at Google with several of the people
           | advocating for ABI breaking changes.
        
         | quotemstr wrote:
         | There should be no expectation of C++ ABI compatibility. Do you
         | want your system to be ABI compatible or do you want it to
         | evolve? You can't have both. You have to pick one. I favor
         | evolution.
        
           | retrac wrote:
           | A properly designed ABI is capable of expansion. The design
           | risk is not so much being backed into a corner, as just
           | accumulating a great deal of obsolete cruft over the
           | years/decades.
           | 
           | Win32 is a great example of this. It has been extensively
           | overhauled, and best practice for writing a new application
           | today is quite different from 25 years ago, but unmodified
           | Windows 95 applications still usually run correctly.
        
       | jlebar wrote:
       | This is super-cool.
       | 
       | For those of us who can't adopt it right away, note that you can
       | compile your cuda code with `--expt-relaxed-constexpr` and call
       | any constexpr function from device code. That includes all the
       | constexpr functions in the standard library!
       | 
       | This gets you quite a bit, but not e.g. std::atomic, which is one
       | of the big things in here.
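        | 
        | A minimal sketch of that pattern (illustrative; built with
        | something like `nvcc --expt-relaxed-constexpr example.cu`):
        | 
        |   #include <algorithm>  // host standard library
        | 
        |   __global__ void clamp01(float* data, int n) {
        |       int i = blockIdx.x * blockDim.x + threadIdx.x;
        |       if (i < n)
        |           // std::min/std::max are constexpr since C++14, so with
        |           // the flag above they're callable from device code.
        |           data[i] = std::min(std::max(data[i], 0.0f), 1.0f);
        |   }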
        
       ___________________________________________________________________
       (page generated 2020-09-19 23:00 UTC)