[HN Gopher] CUDA 11.0
       ___________________________________________________________________
        
       CUDA 11.0
        
       Author : ksec
       Score  : 113 points
       Date   : 2020-07-08 17:42 UTC (5 hours ago)
        
 (HTM) web link (docs.nvidia.com)
 (TXT) w3m dump (docs.nvidia.com)
        
       | Yajirobe wrote:
       | > Added support for Ubuntu 20.04 LTS on x86_64 platforms.
       | 
        | Huh? I've been using CUDA for a while now on my Ubuntu 20.04
        | machine.
        
         | HuShifang wrote:
          | Coincidentally, I was just looking at this after upgrading
          | from 19.10 to 20.04 today -- basically, the version
          | installed from Nvidia's old 18.04 repos works fine on 20.04.
        
         | hobofan wrote:
          | That probably just means that they are now testing that
          | platform in CI, and are officially supporting it (taking bug
          | reports into account, etc.).
        
         | init wrote:
          | This has been problematic for me after upgrading from 18.04
          | to 20.04. Every time apt updates cuda or some nvidia
          | package, my X server fails to start, for several different
          | reasons.
          | 
          | Just today I've already spent 30 minutes trying to start X
          | after this latest cuda update. Too bad I can't switch back
          | to the open source nouveau driver.
        
       | usmannk wrote:
        | I noticed CUDA 11.0 was almost ready for release last week,
        | when I went to install CUDA and the default download page
        | linked to the 11.0 Release Candidate. The 10.1 and 10.2 links
        | were buried behind a link off to the side labeled "legacy".
        | The thing is, no library you use is going to be supporting the
        | CUDA 11.0 RC; that's ridiculous. For example, PyTorch stable
        | is on 10.2 and TensorFlow only goes up to 10.1.
        | 
        | This is generally indicative of how poorly organized the CUDA
        | documentation and installation instructions are. The Conda
        | dependency manager has made this a lot easier recently,
        | especially by providing prebuilt PyTorch binaries. But if you
        | want to use packages like NVIDIA Apex for mixed precision
        | DL[0], you're in for a huge headache trying to compile torch
        | from source while also managing your cuda and nvcc versions,
        | which sometimes must match and sometimes cannot![1] (A quick
        | way to check both versions is sketched after the footnotes.)
       | 
       | [0] Yes, I'm aware that Apex was very recently brought into torch
       | but it seems that the performance issues haven't been ironed out
       | yet.
       | 
       | [1] https://stackoverflow.com/questions/53422407/different-
       | cuda-...
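        | 
        | A quick sanity check of the two versions in play -- a Python
        | sketch, assuming torch is installed and nvcc is on your PATH:
        | 
        |     import subprocess
        |     import torch
        |     
        |     # Toolkit (nvcc) version: what extensions are compiled with.
        |     out = subprocess.run(["nvcc", "--version"],
        |                          capture_output=True, text=True)
        |     print(next(l for l in out.stdout.splitlines() if "release" in l))
        |     
        |     # Runtime version PyTorch itself was built against.
        |     print("torch built with CUDA:", torch.version.cuda)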
        
         | bbatsell wrote:
         | I have to use containers with nvidia-docker because NVIDIA so
         | consistently and relentlessly breaks things without so much as
         | a glance at backward compatibility.
        
           | etaioinshrdlu wrote:
            | The annoying thing is that nvidia-docker is still not
            | great. You still have to deal with the driver installed
            | outside the container, and the host driver version makes a
            | big difference.
            | 
            | Furthermore, it seems that even the CUDA runtime is
            | typically not installed in the container, but rather
            | injected into it by the nvidia-docker container runtime.
            | 
            | It is not fun to deal with.
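            | 
            | One rough way to see that split from inside a container is
            | to load the two libraries directly -- a Python sketch,
            | assuming libcuda (mounted in from the host driver) and
            | libcudart (from the image) are both loadable:
            | 
            |     import ctypes
            |     
            |     # Driver API: nvidia-docker injects this from the HOST.
            |     drv = ctypes.CDLL("libcuda.so.1")
            |     v = ctypes.c_int()
            |     drv.cuDriverGetVersion(ctypes.byref(v))
            |     print("host driver supports CUDA", v.value)  # e.g. 11000
            |     
            |     # Runtime API: comes from whatever the image shipped.
            |     rt = ctypes.CDLL("libcudart.so")
            |     rv = ctypes.c_int()
            |     rt.cudaRuntimeGetVersion(ctypes.byref(rv))
            |     print("container runtime CUDA", rv.value)  # e.g. 10020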
        
           | dijksterhuis wrote:
           | I moved our Deep Learning servers over to Docker images +
           | JupyterHub DockerSpawners recently because maintaining all
           | the various version dependencies between frameworks was an
           | absolute PITA.
           | 
           | Images are publicly available here in case anyone else needs
           | something similar: https://hub.docker.com/u/uodcvip
        
           | TheGuyWhoCodes wrote:
            | I'm never sure of the relation between the driver, nvidia-
            | docker, and the container with a specific cuda version.
            | 
            | Last time I tried it, the cuda inside the container thought
            | it was using some old driver version while a much newer
            | version was installed on the host. So I had to manually
            | install the older version. Not sure where the issue was,
            | but maybe it was because I was using the deprecated nvidia-
            | docker version 2, which is still needed to pass GPU
            | resources to containers run inside Kubernetes.
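            | 
            | The basic rule is that the host driver just has to meet
            | the minimum required by the CUDA runtime inside the
            | container. A toy check in Python -- the minimum-driver
            | pairs below are assumptions from NVIDIA's published Linux
            | compatibility table, so double-check before relying on
            | them:
            | 
            |     # Assumed minimum Linux driver per CUDA runtime version
            |     MIN_DRIVER = {
            |         "11.0": (450, 36),
            |         "10.2": (440, 33),
            |         "10.1": (418, 39),
            |         "10.0": (410, 48),
            |     }
            |     
            |     def driver_ok(host_driver, container_cuda):
            |         major, minor = host_driver.split(".")[:2]
            |         return (int(major), int(minor)) >= MIN_DRIVER[container_cuda]
            |     
            |     print(driver_ok("440.100", "10.2"))  # True
            |     print(driver_ok("440.100", "11.0"))  # False: needs 450+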
        
         | liuliu wrote:
          | Yeah. To make matters worse, they updated libnccl-dev in
          | their apt repo to be CUDA-11 based a few weeks ago. That
          | broke my CUDA app (which is still on 10.2 and not
          | compatible) in interesting ways. I put a hold on libnccl-dev
          | (apt-mark hold libnccl-dev) and waited for this release.
        
         | jjoonathan wrote:
         | Yeah, and the CUDA 10.0 official Visual Studio demo project
         | build was broken for... looks like a year, at least, because
         | they didn't want to populate the toolkit path. NVidia, you're
         | better than this.
         | 
         | https://forums.developer.nvidia.com/t/the-cuda-toolkit-v10-0...
         | 
         | > The Conda dependency manager has made this a lot easier
         | 
         | Yeah but conda is "Let's do dependency management with a SAT
         | solver, it'll be great!" On a good day, it's just slow. On a
         | bad day, the SAT solver spins for hours before failing to
         | converge. On a really bad day, the SAT solver does something
         | "clever."
         | 
         | I've had a couple of really bad days this year. I'm really
         | starting to not like conda very much.
        
           | ajtulloch wrote:
           | You might find https://github.com/TheSnakePit/mamba useful,
           | especially if you are slowed down by package resolution.
        
             | jjoonathan wrote:
             | That looks worth a look for sure!
        
           | techwizrd wrote:
           | Conda's SAT solver for dependency management is the bane of
           | my existence. For pip-installable packages, I'll almost
           | always turn to pip rather than conda even when in a conda
           | environment.
        
           | jabl wrote:
            | My biggest gripe with the conda dependency manager is that
            | it doesn't keep track of which packages own which files;
            | if multiple packages own the same file, the last one to be
            | installed will happily scribble over whatever was there
            | before. With hilarious results, of course.
            | 
            | This means that keeping a conda installation up to date is
            | often very tricky; when upgrading, you frequently have to
            | uninstall and reinstall some packages.
            | 
            | It works better if you start from scratch with a
            | requirements.yml file.
        
       | ykl wrote:
       | Interesting that Fedora support seems to have been dropped.
       | Anyone know why that might be?
       | 
       | Edit: oh wait I think I see. Latest supported gcc for CUDA 11 is
       | gcc 9.x, but I think latest Fedora is on gcc 10.
        
         | tw04 wrote:
         | I'm guessing it was an oversight in the table given they still
         | have all the fedora installation instructions on the install
         | page:
         | 
         | https://docs.nvidia.com/cuda/cuda-installation-guide-linux/i...
        
         | core-questions wrote:
          | They've come to the conclusion -- one you should also come
          | to -- that Fedora is basically a waste of time to support,
          | because whatever you've gotten working will be terribly
          | broken in the next release for no good reason anyone can
          | point to?
        
       | infairverona wrote:
        | Every time I have to deal with multiple versions of CUDA on
        | Linux I feel like poking my eyes out. I get that supporting
        | developer libraries that have to interact with hardware is
        | hard, but come on...
        
       | ziddoap wrote:
       | >cuFFT now accepts __nv_bfloat16 input and output data type for
       | power-of-two sizes with single precision computations within the
       | kernels.
       | 
        | This exact sentence is listed under both "New Features" and
        | "Known Issues". I'm not super familiar with CUDA stuff, but it
        | can't be both, right?
        
         | usmannk wrote:
         | Looks like a mistake, should only be in New Features.
        
           | bjornsing wrote:
           | So a known issue in the known issues?
        
             | willwill100 wrote:
             | A known known
        
         | flx42_ wrote:
         | Thanks, I have reported it internally and it is now fixed.
        
       | cjhanks wrote:
        | Does anyone understand why such minor upgrades resulted in a
        | major version bump? Is this some sort of stability checkpoint,
        | or some other versioning convention?
        
         | ibn-python wrote:
          | Usually, for an API, it indicates a breaking change -- in
          | this case, the removal of some functions, which might
          | require refactoring on the consumer's end.
        
           | gowld wrote:
           | And to keep users on a hardware upgrade treadmill.
        
         | bigdict wrote:
         | I think the page lists the changes since 11.0 RC, not the
         | previous major version.
        
       | darksoulzz wrote:
       | Exciting
        
       | chewxy wrote:
       | oh great. More chasing to do. Anyone interested in working on the
       | CUDA integration for Go (https://gorgonia.org/cu)? PRs welcome,
       | as I am quite short on time.
        
       ___________________________________________________________________
       (page generated 2020-07-08 23:00 UTC)