[HN Gopher] Nvidia AGX Xavier Developer Kit: Refreshed with 32GB...
___________________________________________________________________
Nvidia AGX Xavier Developer Kit: Refreshed with 32GB of RAM, for $699
Author : my123
Score : 88 points
Date : 2020-05-03 18:48 UTC (4 hours ago)
(HTM) web link (www.nvidia.com)
(TXT) w3m dump (www.nvidia.com)
| king_magic wrote:
| My biggest complaint with the Jetson line is that it's all ARM. Look, I get it. But the developer experience is horrible. Building Docker containers for ARM devices is a pain. Hell, building anything for a Jetson can be a pain unless it's a pre-packaged NVIDIA thing - really not a fan of building things from source. Add on top of that NVIDIA's very low-level documentation for pretty much any tooling they ship, coupled with the difficulty of getting timely engineering support (unless you want to post to one of their message boards and hope you get an answer back in less than a week)... basically, it's really rough to do anything seriously useful with Jetson hardware.
|
| Second biggest complaint is deploying Jetsons in production environments. Dev kits aren't production stable, so you either need to build your own carrier board or find one pre-built, and frankly that's just a giant pain to do.
|
| Third biggest complaint is having to flash Jetsons manually. Misery.
|
| A production-ready x64 Jetson that you could order directly from NVIDIA would be my dream. Add up all of the shortcomings and overhead of ARM Jetsons and IMO you do not have a viable device for shipping AI solutions at scale.
| choppaface wrote:
| The dev support is also bottom-of-the-barrel even if you're a high-margin cloud customer. For a generous upper bound of what Nvidia considers "software support," look at TensorRT, where a majority of the useful stuff has either been written by third parties or scoped out the hard way by people trying to use it. Nvidia isn't really a software company, and their core product has a very narrow user interface. These factors hamper whatever you can get out of a Jetson.
| 0x8BADF00D wrote:
| > Building Docker containers for ARM devices is a pain in the ass.
|
| It's not so bad, you just need a beefy ARM machine to build the containers in CI. It would be silly to build a Docker container on the Jetson itself. You would never use an embedded device for compiles and builds, so why would you build Docker containers on one?
| linarism wrote:
| Genuinely curious, what is an example of a beefy ARM machine?
| pram wrote:
| Anything with a ThunderX processor is maximum beefy. You can get on-demand servers like that from places like Packet. AWS also has their own A1 instances with lower core counts. These would all be good for cross-compiling/builds.
|
| Comedy answer: iPad Pro
| p1necone wrote:
| > Comedy answer: iPad Pro
|
| I mean, the iPad Pro _does_ have a relatively beefy processor. If only you could run arbitrary code on it.
| BillinghamJ wrote:
| A lot of instructions aren't there yet, but https://ish.app is doing a truly incredible job in this regard.
|
| Gives you a working Alpine Linux installation which you can download and install packages for normally, all within the bounds of the normal Apple sandbox, with decent enough performance.
|
| It doesn't have SSE or MMX yet, so e.g. Go and Node aren't usable at this point. But a shocking amount actually does work perfectly, so it's only a matter of time as more instructions are implemented.
| king_magic wrote:
| Spinning up an ARM machine to build a container that needs to be shipped over the network to a Jetson is a pain when developing.
| threeseed wrote:
| You could spin up an AWS ARM instance in about 30s.
|
| Or just have Jenkins etc. trigger the launch of one when your build job needs it.
| 0x8BADF00D wrote:
| Welcome to embedded computing :-)
| king_magic wrote:
| I get it, but I also simply don't want to deal with it at an embedded level. Which is why I vote with my wallet and choose x64 hardware with Quadro GPUs to ship for production.
|
| More expensive, more power-hungry? Sure. More sanity? Massively better dev experience? Massively better production/ops experience? Absolutely.
| arthurcolle wrote:
| Why is it a pain to build a Docker container for the ARM architecture? Just curious, haven't had to do it myself.
| king_magic wrote:
| If your dependencies don't have ARM builds, you need to build them from source. Which is fine, unless you get unlucky and they don't build. Which happens way more often than I'd like.
| my123 wrote:
| For a moddable Arm platform with all batteries included, there isn't really an alternative to this, especially at $699.
|
| It's much stronger than an RPi and could fill the gap between RPi and Arm-based server platforms.
| mappu wrote:
| In terms of using it as an ARMv8 desktop workstation (with decent CPU performance, real SATA / Ethernet / PCI-e connectors) - some other contenders include the MACCHIATObin (quad A72) and Honeycomb LX2K (16-core A72, $750) from SolidRun.
| farseer wrote:
| Honest question, can something like this kill the market for embedded DSP processors made by Texas Instruments or Analog Devices?
| qppo wrote:
| No, those face more threat from Arm Cortex-M4/M7 cores or RISC-V units with vector/SIMD coprocessors. DSP cores are being lifted out of discrete chips and placed as IP blocks into more integrated solutions, with the really complex algo stuff placed in general-purpose hardware like FPGAs.
|
| The economics don't really make sense for TI/ADI DSPs IMO. If you had an application where you needed a chip just to do DSP you'd probably use an ARM core instead - but the applications engineers at TI/ADI will gladly help you find a product in their catalog that has more features integrated into it (like ADC/DAC, even analog front ends for audio/RF, USB/Bluetooth stacks) for your product.
|
| Basically there's no market to kill, from what I've seen.
| Cerium wrote:
| I'm seeing lots of Analog Devices DSPs replaced by Tegras and Zynq MPSoCs.
| augustt wrote:
| I mean, this is absurdly more powerful than those dedicated DSPs.
| WJW wrote:
| It's also easily 10x the price. It really matters what the application is and how much processing power you need.
| smg wrote:
| The cheapest Volta GPUs I have seen so far cost over $2K for 12GB. Can the GPU provided in this kit be used for training?
| fizixer wrote:
| Nope, the use of the words 'edge' and 'inference' in the tagline pretty much means there is no learning, no training.
| wmf wrote:
| Does it have "fake" tensor cores? Aren't those for training?
| fizixer wrote:
| You still need tensor cores for inference. But they don't do weight updates. Learning/training is all about updating the weights (through backpropagation or whatever).
|
| So another way to put it: its tensor cores do feed-forward calculations, but no backpropagation, and no weight updates.
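|
| (To make the distinction concrete, here is a rough, generic PyTorch sketch of inference vs. a training step - illustrative only, nothing Xavier-specific about it:)
|
|     import torch
|
|     model = torch.nn.Linear(10, 2)
|     x = torch.randn(1, 10)
|
|     # Inference: a forward pass only, the weights stay fixed.
|     with torch.no_grad():
|         y = model(x)
|
|     # Training: forward pass, then backpropagation and a weight update.
|     optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
|     loss = model(x).sum()
|     loss.backward()   # backpropagation computes gradients
|     optimizer.step()  # gradient step updates the weights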
| rrss wrote:
| The hardware and platform are capable of training just fine. It's just rarely done because it is slower than training on pretty much any discrete GPU.
| dchichkov wrote:
| Yes, if your model is small enough or if you are fine-tuning a small number of layers. TensorFlow 1.15 and 2.0 are available on Xavier. I understand that PyTorch could be built as well.
|
| Note that the number of CUDA cores and the amount of memory available are smaller compared to discrete Volta GPUs.
| fizixer wrote:
| You say it can do training for small models because of the presence of the small (512-core) GPU? (plus maybe some leftover control calculations by the CPU)
| corysama wrote:
| It's a low-wattage device. Its performance can't hold a candle to a last-gen card that uses 10X the power.
| fizixer wrote:
| Inference only. (So this is competing with Google TPUv1; a few years late and way more expensive, but with more memory)
| my123 wrote:
| It can do training with its GPU, not the fastest thing in the world though.
| wmf wrote:
| You can't put a TPUv1 in a car because Google doesn't sell them.
| rrss wrote:
| 1. This isn't inference only, it has the full capabilities of a normal GPU, just small and low power (and therefore much slower than normal GPUs).
|
| 2. TPUv1 is a matrix multiply ASIC that requires a host CPU to do anything. This thing is a SoC that includes both a CPU and a GPU. The CPU is pretty fast for what it is - much faster than e.g. a Raspberry Pi, see https://www.phoronix.com/scan.php?page=article&item=nvidia-j...
|
| 3. Not sure how you know whether this is more expensive than a TPUv1, since the TPUv1 was never sold or available outside of Google.
|
| A much better comparison would be between this and the Edge TPU development board.
| snek wrote:
| The lineup continues to balloon in price. A lot of students would buy the TK and TX models for robotics and whatnot, since they were only 200-300 bucks.
| shmolyneaux wrote:
| The Jetson Nano[0] seems like it would be better for students, it's only $99.
|
| [0]: https://developer.nvidia.com/embedded/jetson-nano-developer-...
| TaylorAlexander wrote:
| I have one of these powering my open source four wheel drive robot. [1]
|
| I've started doing machine learning experiments with it finally. (See [1] for details)
|
| There are a few tricks to getting the best performance. You want to convert your neural network to run with NVIDIA's TensorRT library instead of just TensorFlow or PyTorch. TensorRT does all the optimized goodness that gets you the most out of the hardware. Not all possible network operations can run in TensorRT (though NVIDIA updates the framework regularly). This means some networks can't be easily converted to something fully optimized for this platform. Facebook's detectron2 for example uses some operations that don't readily convert. [2]
|
| But then if you're new like me you've got to both find some code that will ultimately produce something you can convert to TensorRT, and you also need something that you can easily train. I've learned that training using your own dataset is often non-obvious. A lot of example code shows how to use an existing dataset but totally glosses over the specific label format those datasets use. That means you've got to do some digging to figure out how to make your own dataset load properly into the training code.
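|
| (For a sense of what the conversion step looks like, here's a rough sketch using NVIDIA's torch2trt helper - untested as written, and the model and input size are just placeholders:)
|
|     import torch
|     from torch2trt import torch2trt
|     from torchvision.models import resnet18
|
|     # Load a model on the GPU in eval mode.
|     model = resnet18(pretrained=True).eval().cuda()
|
|     # Example input with the shape the network will see at runtime.
|     x = torch.ones((1, 3, 224, 224)).cuda()
|
|     # Build a TensorRT-optimized version of the model.
|     model_trt = torch2trt(model, [x], fp16_mode=True)
|
|     y = model_trt(x)  # runs through the TensorRT engine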
|
| After trying a few different things, I've gotten some good results training using Bonnetal [3]. I was able to make enough sense of its training code to use my own dataset, and it looks like it will readily convert to TensorRT. Then you load the converted network using NVIDIA's DeepStream library for maximum pipeline efficiency [4].
|
| The performance numbers for the AGX Xavier are very good, and I am hopeful I will get my application fully operational soon enough.
|
| [1] https://reboot.love/t/new-cameras-on-rover/
|
| [2] https://github.com/facebookresearch/detectron2/issues/192
|
| [3] https://github.com/PRBonn/bonnetal
|
| [4] https://developer.nvidia.com/deepstream-sdk
| weinzierl wrote:
| They seem to offer a cheaper 8 GB model too but unfortunately I see no price for it. I'm curious how much it'll be because, as much as I'd like to toy around with this, the $699 is a little too much for just experimentation.
|
| EDIT: The 8GB _Module_ seems to be $679 here[1]. This makes the $699 for the 32 GB _Developer Kit_ seem like a steal. Still, too expensive for play, I guess I'll stick with my Jetson Nanos for a while...
|
| [1] https://www.arrow.com/en/products/900-82888-0060-000/nvidia
| my123 wrote:
| I see the AGX Xavier devkit as much more of an Arm desktop platform, which it is very suitable for too.
| fluffything wrote:
| There is also the Jetson Nano kit which costs ~120 EUR.
| fvv wrote:
| I'm definitely not an expert and this is probably a dumb question, but why smart edge things like smart robots and not dumb edge devices with a smart central brain? Anyway, data is useful aggregated centrally, so why not incorporate the brain centrally too?
| dejv wrote:
| For things like industrial robots or UAVs, latency is the biggest problem.
|
| I've worked on a fruit sorting machine and there was about 20 ms to decide whether an object passed or not, plus a continuous stream of 10,000s of objects per second to classify. The computer vision/classifier had to be both fast and reliable about spitting out the answers, which was actually more important than the precision of the classifier itself.
| TaylorAlexander wrote:
| Well, first some clarification - "edge" means "on robot" versus something in the cloud. And the reason you do this is latency and connectivity.
|
| I am designing a four wheel drive robot using the NVIDIA AGX Xavier [1] that will follow trails on its own or follow the operator on trails. You don't want your robot to lose cellular coverage and become useless. Even if you had coverage, there would be significant data usage as Rover uses four 4K cameras, which is about 30 megapixels (actually they max out at 13MP each or 52MP total). Constantly streaming that to the cloud would be very expensive on a metered internet connection. Even on a direct line the machine would saturate many broadband connections (rough numbers in the sketch below). Of course you can selectively stream but this makes things more complicated.
|
| Latency is an issue. Imagine a self-driving car that required a cloud connection. It's approaching an intersection and someone on a bicycle falls over near its path. Better send that sensor data to the cloud fast to determine how to act!
|
| On my Rover robot it streams the cameras directly into the GPU memory where it can be processed using ML without ever being copied through the CPU. It's super low latency and allows for robots that respond rapidly to their environment. Imagine trying to make a ping-pong playing robot with a cloud connection.
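|
| (The bandwidth sketch referenced above - a rough back-of-envelope in Python; the bits-per-pixel and compression ratio are assumptions, not measurements from Rover:)
|
|     # Four 13 MP cameras at 30 fps, assuming 12 bits/pixel raw
|     # and roughly 100:1 H.265 compression.
|     cameras = 4
|     pixels = 13e6
|     bits_per_pixel = 12
|     fps = 30
|
|     raw_bps = cameras * pixels * bits_per_pixel * fps
|     print(f"raw: {raw_bps / 1e9:.1f} Gbit/s")                # ~18.7 Gbit/s
|
|     compressed_bps = raw_bps / 100
|     print(f"compressed: {compressed_bps / 1e6:.0f} Mbit/s")  # ~187 Mbit/s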
|
| I am also designing a farming robot. [2] We don't expect any internet connection on farms!
|
| [1] https://reboot.love/t/new-cameras-on-rover/
| [2] https://www.twistedfields.com/technology
|
| Edit: Don't forget security! Streaming high resolution sensors to the cloud is a security nightmare.
| fizixer wrote:
| Edge means on-premise (on robot) as you said.
|
| But 'edge,' as used in the context of AI, is also a wink-and-a-nod that the device is inference-only (no learning, no training). The term "inference only" doesn't sound very marketing-friendly.
| my123 wrote:
| AGX Xavier can do training on-device just fine - and run every CUDA workload. It's just not the fastest device at that; you'd prefer a desktop GPU if you can for such a purpose.
| michaelt wrote:
| I assume what fizixer means is, if you're making an Amazon-Alexa-type thing, training 1 model on 1 million users' data will work better than 1 million models trained on 1 user's data each.
|
| AFAIK the "Roomba learns the layout of your house" type of edge learning is generally done with SLAM rather than neural networks. There might be other applications for edge learning, of course.
| epmaybe wrote:
| This is a bit off topic, but I'm constantly looking at ways to efficiently stream 4K cameras live to local displays as well as remote displays at the highest framerate and resolution possible. How feasible would it be on the Xavier to stream two 4K cameras and display them on at least two 4K screens? Extra points if you could do that and simultaneously upload to a streaming service, such as Twitch.
| TaylorAlexander wrote:
| You can certainly do this using machine vision cameras. Either USB3, Gig Ethernet, or CSI interface (16cm max run length I believe). I forget how best to attach two displays to the Xavier but that seems doable.
|
| I got my cameras from e-con Systems and they've got some USB3 cameras that could do it. At least I'm pretty sure. My USB3 cameras just showed up and I haven't tried them yet.
| epmaybe wrote:
| I actually have two e-con Systems cameras that I've been testing on my desktop/laptop, and they do all right but even then struggle at 4K resolutions.
| 01100011 wrote:
| Best bet is to take the raw video and run it through GStreamer. You should be able to set up a pipeline which displays the raw video locally while sending a compressed stream to the network. I'd bet that Nvidia has GStreamer modules which make use of their compression HW, so it might be possible. To be honest though, that's a lot of data, so I don't know how well dual 4K would work. You can always scale it down in GStreamer before you send it to the compression module.
|
| You'll probably want to use the CSI-2 interfaces to connect the cameras, but that depends. CSI-2 was developed for cell phones and is hard to run over long distances. It's optimized for low power and designed for very short interconnects. We had a ton of problems using it at the last company I worked for. I really wish there was a competing standard for embedded cameras.
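|
| (As a rough illustration, a tee'd pipeline like that could look something like this in Python, using the NVIDIA GStreamer elements that ship with JetPack - untested, and the caps/element names may need adjusting for your cameras; here it writes the compressed stream to a file, but you could swap the filesink for a network sink:)
|
|     import gi
|     gi.require_version("Gst", "1.0")
|     from gi.repository import Gst, GLib
|
|     Gst.init(None)
|
|     # One CSI camera: show the raw feed locally and, in parallel,
|     # hardware-encode it to H.265.
|     pipeline = Gst.parse_launch(
|         "nvarguscamerasrc sensor-id=0 ! "
|         "video/x-raw(memory:NVMM),width=3840,height=2160,framerate=30/1 ! "
|         "tee name=t "
|         "t. ! queue ! nvoverlaysink "
|         "t. ! queue ! nvv4l2h265enc ! h265parse ! matroskamux ! "
|         "filesink location=cam0.mkv"
|     )
|
|     pipeline.set_state(Gst.State.PLAYING)
|     GLib.MainLoop().run()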
| m463 wrote:
| I've wondered about this too.
|
| I think the magic camera interconnect is CSI/CSI2 and it's not really flexible enough. You either have really short copper interconnects, or unavailable fiber interconnects.
|
| What would be cool is if CSI to Ethernet were a thing, either low-latency put-it-on-the-wire or compressed. I don't know, maybe it is. But make it a standard like RCA jacks.
| manofmanysmiles wrote:
| You can buy kits that send video over coax, including power, for somewhat reasonable prices:
|
| https://leopardimaging.com/product/nvidia-jetson-cameras/nvi...
|
| I haven't tried them, but am considering them for a project.
| achuwilson wrote:
| It is possible to extend CSI through good HDMI cables: https://www.robotshop.com/en/arducam-csi-hdmi-cable-extensio...
| corysama wrote:
| The network is very, very unreliable at the edge. Better to have each piece work independently and store up processed results to transmit eventually, opportunistically. If that processing involves real-time video processing there's no way you're going to get that done over a reliably unreliable connection.
| fvv wrote:
| I mean, it could be way more powerful, like Stadia on a phone vs. using the phone's GPU. Latency is not too high for the described usage... IMO only automotive may require a dedicated brain on the edge. Am I totally wrong?
| mebr wrote:
| I work at a startup that uses edge AI. There are many reasons edge is preferred over cloud. Security is one. Latency is important in many cases. If the internet connection is another dependency for a critical system, it can be a big headache. Once you start working on a real-world project you run into these issues. In return you give up the monitoring of the data and model that can be done with a cloud deployment.
| halotrope wrote:
| Cloud GPU/TPU resources are still somewhat expensive. Also bandwidth can be an issue when you would first need to feed video through potentially metered connections. Last but not least, latency can be an issue for e.g. robotics and automotive.
| fluffything wrote:
| Not a dumb question at all: data traffic is expensive.
|
| If you have thousands of remote sensors collecting GBs and GBs of real-time data, for ~$1000 you can add a "streaming" supercomputer to your sensor to analyze the data in place and save on network and storage costs.
|
| Note however that the announcement is for an Nvidia AGX product, which is for autonomous machines. The Nvidia "edge" products for processing data on the sensors are called Nvidia EGX.
|
| For autonomous machines, you often need to analyze the data in the machine anyway, e.g., you don't want a drone falling if it loses network connectivity.
| throwlaplace wrote:
| latency
| bitwize wrote:
| Why do humans carry large, energy-hungry brains around as opposed to being simple tools of the Hivemind like their brethren the insects?
|
| Making the edges smarter allows them to react and adapt on smaller timescales.
___________________________________________________________________
(page generated 2020-05-03 23:00 UTC)