[HN Gopher] Nvidia AGX Xavier Developer Kit: Refreshed with 32GB...
___________________________________________________________________
Nvidia AGX Xavier Developer Kit: Refreshed with 32GB of RAM, for $699
Author : my123
Score : 88 points
Date : 2020-05-03 18:48 UTC (4 hours ago)
(HTM) web link (www.nvidia.com)
(TXT) w3m dump (www.nvidia.com)
| king_magic wrote:
| My biggest complaint with the Jetson line is that it's all ARM. Look, I get it. But the developer experience is horrible. Building Docker containers for ARM devices is a pain. Hell, building anything for a Jetson can be a pain unless it's a pre-packaged NVIDIA thing - really not a fan of building things from source. Add on top of that NVIDIA's very low-level documentation for pretty much any tooling they ship, coupled with the difficulty of getting timely engineering support (unless you want to post to one of their message boards and hope you get an answer back in less than a week)... basically, it's really rough to do anything seriously useful with Jetson hardware.
|
| Second biggest complaint is deploying Jetsons in production environments. Dev kits aren't production stable, so you either need to build your own carrier board or find one pre-built, and frankly that's just a giant pain to do.
|
| Third biggest complaint is having to flash Jetsons manually. Misery.
|
| A production-ready x64 Jetson that you could order directly from NVIDIA would be my dream. Add up all of the shortcomings and overhead of ARM Jetsons and IMO you do not have a viable device for shipping AI solutions at scale.
| choppaface wrote:
| The dev support is also bottom-of-the-barrel even if you're a high-margin cloud customer. For a generous upper bound of what Nvidia considers "software support," look at TensorRT, where a majority of the useful stuff has either been written by third parties or scoped out the hard way by people trying to use it. Nvidia isn't really a software company, and their core product has a very narrow user interface. These factors hamper whatever you can get out of a Jetson.
| 0x8BADF00D wrote:
| > Building Docker containers for ARM devices is a pain in the ass.
|
| It's not so bad, you just need a beefy ARM machine to build the containers in CI. It would be silly to build a Docker container on the Jetson itself. You would never use an embedded device for compiles and builds, so why would you build Docker containers on one?
| linarism wrote:
| Genuinely curious, what is an example of a beefy ARM machine?
| pram wrote:
| Anything with a ThunderX processor is maximum beefy. You can get on-demand servers like that from places like Packet. AWS also has their own A1 instances with lower core counts. These would all be good for cross-compiling/builds.
|
| Comedy answer: iPad Pro
| p1necone wrote:
| > Comedy answer: iPad Pro
|
| I mean, the iPad Pro _does_ have a relatively beefy processor. If only you could run arbitrary code on it.
| BillinghamJ wrote:
| A lot of instructions aren't there yet, but https://ish.app is doing a truly incredible job in this regard.
|
| Gives you a working Alpine Linux installation which you can download and install packages for normally, all within the bounds of the normal Apple sandbox, with decent enough performance.
|
| It doesn't have SSE or MMX yet, so e.g. Go and Node aren't usable at this point. But a shocking amount actually does work perfectly, so it's only a matter of time as more instructions are implemented.
| king_magic wrote:
| Spinning up an ARM machine to build a container that needs to be shipped over the network to a Jetson is a pain when developing.
| threeseed wrote:
| You could spin up an AWS ARM instance in about 30s.
|
| Or just have Jenkins etc. trigger the launch of one when your build job needs it.
| 0x8BADF00D wrote:
| Welcome to embedded computing :-)
| king_magic wrote:
| I get it, but I also simply don't want to deal with it at an embedded level. Which is why I vote with my wallet and choose x64 hardware with Quadro GPUs to ship for production.
|
| More expensive, more power-hungry? Sure. More sanity? Massively better dev experience? Massively better production/ops experience? Absolutely.
| arthurcolle wrote:
| Why is it a pain to build a Docker container for the ARM architecture? Just curious, haven't had to do it myself.
| king_magic wrote:
| If your dependencies don't have ARM builds, you need to build them from source. Which is fine, unless you get unlucky and they don't build. Which happens way more often than I'd like.
| my123 wrote:
| For a moddable Arm platform with all batteries included, there isn't really an alternative to this, especially at $699.
|
| It's much stronger than an RPi and could fill the gap between RPi and Arm-based server platforms.
| mappu wrote:
| In terms of using it as an ARMv8 desktop workstation (with decent CPU performance, real SATA / Ethernet / PCI-e connectors) - some other contenders include the MACCHIATObin (quad A72) and Honeycomb LX2K (16-core A72, $750) from SolidRun.
| farseer wrote:
| Honest question, can something like this kill the market for embedded DSP processors made by Texas Instruments or Analog Devices?
| qppo wrote:
| No, those face more threat from Arm Cortex-M4/M7 cores or RISC-V units with vector/SIMD coprocessors. DSP cores are being lifted out of discrete chips and placed as IP blocks into more integrated solutions, with the really complex algo stuff placed in general-purpose hardware like FPGAs.
|
| The economics don't really make sense for TI/ADI DSPs IMO. If you had an application where you needed a chip just to do DSP you'd probably use an ARM core instead - but the applications engineers at TI/ADI will gladly help you find a product in their catalog that has more features integrated into it (like ADC/DAC, even analog front ends for audio/RF, USB/Bluetooth stacks) for your product.
|
| Basically there's no market to kill, from what I've seen.
| Cerium wrote:
| I'm seeing lots of Analog Devices DSPs replaced by Tegras and Zynq MPSoCs.
| augustt wrote:
| I mean, this is absurdly more powerful than those dedicated DSPs.
| WJW wrote:
| It's also easily 10x the price. It really matters what the application is and how much processing power you need.
| smg wrote:
| The cheapest Volta GPUs I have seen so far cost over $2K for 12GB. Can the GPU provided in this kit be used for training?
| fizixer wrote:
| Nope, the use of the words 'edge' and 'inference' in the tagline pretty much means there is no learning, no training.
| wmf wrote:
| Does it have "fake" tensor cores? Aren't those for training?
| fizixer wrote:
| You still need tensor cores for inference. But they don't do weight updates. Learning/training is all about updating the weights (through backpropagation or whatever).
|
| So another way to put it: its tensor cores do feed-forward calculations, but no backpropagation, and no weight updates.
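|
| (To make the distinction concrete, here is a rough, generic PyTorch sketch of inference vs. a training step - illustrative only, nothing Xavier-specific about it:)
|
|     import torch
|
|     model = torch.nn.Linear(10, 2)
|     x = torch.randn(1, 10)
|
|     # Inference: a forward pass only, the weights stay fixed.
|     with torch.no_grad():
|         y = model(x)
|
|     # Training: forward pass, then backpropagation and a weight update.
|     optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
|     loss = model(x).sum()
|     loss.backward()   # backpropagation computes gradients
|     optimizer.step()  # gradient step updates the weights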
| rrss wrote:
| The hardware and platform are capable of training just fine. It's just rarely done because it is slower than training on pretty much any discrete GPU.
| dchichkov wrote:
| Yes, if your model is small enough or if you are fine-tuning a small number of layers. TensorFlow 1.15 and 2.0 are available on Xavier. I understand that PyTorch could be built as well.
|
| Note that the number of CUDA cores and the amount of memory available are smaller compared to discrete Volta GPUs.
| fizixer wrote:
| You say it can do training for small models because of the presence of the small (512-core) GPU? (plus maybe some leftover control calculations by the CPU)
| corysama wrote:
| It's a low-wattage device. Its performance can't hold a candle to a last-gen card that uses 10X the power.
| fizixer wrote:
| Inference only. (So this is competing with Google TPUv1; a few years late and way more expensive, but with more memory)
| my123 wrote:
| It can do training with its GPU, not the fastest thing in the world though.
| wmf wrote:
| You can't put a TPUv1 in a car because Google doesn't sell them.
| rrss wrote:
| 1. This isn't inference only, it has the full capabilities of a normal GPU, just small and low power (and therefore much slower than normal GPUs).
|
| 2. TPUv1 is a matrix multiply ASIC that requires a host CPU to do anything. This thing is a SoC that includes both a CPU and a GPU. The CPU is pretty fast for what it is - much faster than e.g. a Raspberry Pi, see https://www.phoronix.com/scan.php?page=article&item=nvidia-j...
|
| 3. Not sure how you know whether this is more expensive than a TPUv1, since the TPUv1 was never sold or available outside of Google.
|
| A much better comparison would be between this and the Edge TPU development board.
| snek wrote:
| The lineup continues to balloon in price. A lot of students would buy the TK and TX models for robotics and whatnot, since they were only 200-300 bucks.
| shmolyneaux wrote:
| The Jetson Nano[0] seems like it would be better for students, it's only $99.
|
| [0]: https://developer.nvidia.com/embedded/jetson-nano-developer-...
| TaylorAlexander wrote:
| I have one of these powering my open source four wheel drive robot. [1]
|
| I've started doing machine learning experiments with it finally. (See [1] for details)
|
| There are a few tricks to getting the best performance. You want to convert your neural network to run with NVIDIA's TensorRT library instead of just TensorFlow or PyTorch. TensorRT does all the optimized goodness that gets you the most out of the hardware. Not all possible network operations can run in TensorRT (though NVIDIA updates the framework regularly). This means some networks can't be easily converted to something fully optimized for this platform. Facebook's detectron2 for example uses some operations that don't readily convert. [2]
|
| But then if you're new like me you've got to both find some code that will ultimately produce something you can convert to TensorRT, and you also need something that you can easily train. I've learned that training using your own dataset is often non-obvious. A lot of example code shows how to use an existing dataset but totally glosses over the specific label format those datasets use. That means you've got to do some digging to figure out how to make your own dataset load properly into the training code.
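|
| (For a sense of what the conversion step looks like, here's a rough sketch using NVIDIA's torch2trt helper - untested as written, and the model and input size are just placeholders:)
|
|     import torch
|     from torch2trt import torch2trt
|     from torchvision.models import resnet18
|
|     # Load a model on the GPU in eval mode.
|     model = resnet18(pretrained=True).eval().cuda()
|
|     # Example input with the shape the network will see at runtime.
|     x = torch.ones((1, 3, 224, 224)).cuda()
|
|     # Build a TensorRT-optimized version of the model.
|     model_trt = torch2trt(model, [x], fp16_mode=True)
|
|     y = model_trt(x)  # runs through the TensorRT engine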
|
| After trying a few different things, I've gotten some good results training using Bonnetal [3]. I was able to make enough sense of its training code to use my own dataset, and it looks like it will readily convert to TensorRT. Then you load the converted network using NVIDIA's DeepStream library for maximum pipeline efficiency [4].
|
| The performance numbers for the AGX Xavier are very good, and I am hopeful I will get my application fully operational soon enough.
|
| [1] https://reboot.love/t/new-cameras-on-rover/
|
| [2] https://github.com/facebookresearch/detectron2/issues/192
|
| [3] https://github.com/PRBonn/bonnetal
|
| [4] https://developer.nvidia.com/deepstream-sdk
| weinzierl wrote:
| They seem to offer a cheaper 8 GB model too but unfortunately I see no price for it. I'm curious how much it'll be because, as much as I'd like to toy around with this, the $699 is a little too much for just experimentation.
|
| EDIT: The 8GB _Module_ seems to be $679 here[1]. This makes the $699 for the 32 GB _Developer Kit_ seem like a steal. Still, too expensive for play, I guess I'll stick with my Jetson Nanos for a while...
|
| [1] https://www.arrow.com/en/products/900-82888-0060-000/nvidia
| my123 wrote:
| I see the AGX Xavier devkit as much more of an Arm desktop platform, which it is very suitable for too.
| fluffything wrote:
| There is also the Jetson Nano kit which costs ~120 EUR.
| fvv wrote:
| I'm definitely not an expert and this is probably a dumb question, but why smart edge things like smart robots and not dumb edge devices with a smart central brain? Anyway, data is useful aggregated centrally, so why not incorporate the brain centrally too?
| dejv wrote:
| For things like industrial robots or UAVs, latency is the biggest problem.
|
| I've worked on a fruit sorting machine and there was about 20 ms to decide whether an object passed or not, plus a continuous stream of 10,000s of objects per second to classify. The computer vision/classifier had to be both fast and reliable about spitting out the answers, which was actually more important than the precision of the classifier itself.
| TaylorAlexander wrote:
| Well, first some clarification - "edge" means "on robot" versus something in the cloud. And the reason you do this is latency and connectivity.
|
| I am designing a four wheel drive robot using the NVIDIA AGX Xavier [1] that will follow trails on its own or follow the operator on trails. You don't want your robot to lose cellular coverage and become useless. Even if you had coverage, there would be significant data usage as Rover uses four 4K cameras, which is about 30 megapixels (actually they max out at 13MP each or 52MP total). Constantly streaming that to the cloud would be very expensive on a metered internet connection. Even on a direct line the machine would saturate many broadband connections (rough numbers in the sketch below). Of course you can selectively stream but this makes things more complicated.
|
| Latency is an issue. Imagine a self-driving car that required a cloud connection. It's approaching an intersection and someone on a bicycle falls over near its path. Better send that sensor data to the cloud fast to determine how to act!
|
| On my Rover robot it streams the cameras directly into the GPU memory where it can be processed using ML without ever being copied through the CPU. It's super low latency and allows for robots that respond rapidly to their environment. Imagine trying to make a ping-pong playing robot with a cloud connection.
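|
| (The bandwidth sketch referenced above - a rough back-of-envelope in Python; the bits-per-pixel and compression ratio are assumptions, not measurements from Rover:)
|
|     # Four 13 MP cameras at 30 fps, assuming 12 bits/pixel raw
|     # and roughly 100:1 H.265 compression.
|     cameras = 4
|     pixels = 13e6
|     bits_per_pixel = 12
|     fps = 30
|
|     raw_bps = cameras * pixels * bits_per_pixel * fps
|     print(f"raw: {raw_bps / 1e9:.1f} Gbit/s")                # ~18.7 Gbit/s
|
|     compressed_bps = raw_bps / 100
|     print(f"compressed: {compressed_bps / 1e6:.0f} Mbit/s")  # ~187 Mbit/s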
|
| I am also designing a farming robot. [2] We don't expect any internet connection on farms!
|
| [1] https://reboot.love/t/new-cameras-on-rover/
| [2] https://www.twistedfields.com/technology
|
| Edit: Don't forget security! Streaming high resolution sensors to the cloud is a security nightmare.
| fizixer wrote:
| Edge means on-premise (on robot) as you said.
|
| But 'edge,' as used in the context of AI, is also a wink-and-a-nod that the device is inference-only (no learning, no training). The term "inference only" doesn't sound very marketing-friendly.
| my123 wrote:
| AGX Xavier can do training on-device just fine - and run every CUDA workload. It's just not the fastest device at that; you'd prefer a desktop GPU if you can for such a purpose.
| michaelt wrote:
| I assume what fizixer means is, if you're making an Amazon-Alexa-type thing, training 1 model on 1 million users' data will work better than 1 million models trained on 1 user's data each.
|
| AFAIK the "Roomba learns the layout of your house" type of edge learning is generally done with SLAM rather than neural networks. There might be other applications for edge learning, of course.
| epmaybe wrote:
| This is a bit off topic, but I'm constantly looking at ways to efficiently stream 4K cameras live to local displays as well as remote displays at the highest framerate and resolution possible. How feasible would it be on the Xavier to stream two 4K cameras and display them on at least two 4K screens? Extra points if you could do that and simultaneously upload to a streaming service, such as Twitch.
| TaylorAlexander wrote:
| You can certainly do this using machine vision cameras. Either USB3, Gig Ethernet, or CSI interface (16cm max run length I believe). I forget how best to attach two displays to the Xavier but that seems doable.
|
| I got my cameras from e-con Systems and they've got some USB3 cameras that could do it. At least I'm pretty sure. My USB3 cameras just showed up and I haven't tried them yet.
| epmaybe wrote:
| I actually have two e-con Systems cameras that I've been testing on my desktop/laptop, and they do all right but even then struggle at 4K resolutions.
| 01100011 wrote:
| Best bet is to take the raw video and run it through GStreamer. You should be able to set up a pipeline which displays the raw video locally while sending a compressed stream to the network. I'd bet that Nvidia has GStreamer modules which make use of their compression HW, so it might be possible. To be honest though, that's a lot of data, so I don't know how well dual 4K would work. You can always scale it down in GStreamer before you send it to the compression module.
|
| You'll probably want to use the CSI-2 interfaces to connect the cameras, but that depends. CSI-2 was developed for cell phones and is hard to run over long distances. It's optimized for low power and designed for very short interconnects. We had a ton of problems using it at the last company I worked for. I really wish there was a competing standard for embedded cameras.
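|
| (As a rough illustration, a tee'd pipeline like that could look something like this in Python, using the NVIDIA GStreamer elements that ship with JetPack - untested, and the caps/element names may need adjusting for your cameras; here it writes the compressed stream to a file, but you could swap the filesink for a network sink:)
|
|     import gi
|     gi.require_version("Gst", "1.0")
|     from gi.repository import Gst, GLib
|
|     Gst.init(None)
|
|     # One CSI camera: show the raw feed locally and, in parallel,
|     # hardware-encode it to H.265.
|     pipeline = Gst.parse_launch(
|         "nvarguscamerasrc sensor-id=0 ! "
|         "video/x-raw(memory:NVMM),width=3840,height=2160,framerate=30/1 ! "
|         "tee name=t "
|         "t. ! queue ! nvoverlaysink "
|         "t. ! queue ! nvv4l2h265enc ! h265parse ! matroskamux ! "
|         "filesink location=cam0.mkv"
|     )
|
|     pipeline.set_state(Gst.State.PLAYING)
|     GLib.MainLoop().run()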
| m463 wrote:
| I've wondered about this too.
|
| I think the magic camera interconnect is CSI/CSI2 and it's not really flexible enough. You either have really short copper interconnects, or unavailable fiber interconnects.
|
| What would be cool is if CSI to Ethernet were a thing, either low-latency put-it-on-the-wire or compressed. I don't know, maybe it is. But make it a standard like RCA jacks.
| manofmanysmiles wrote:
| You can buy kits that send video over coax, including power, for somewhat reasonable prices:
|
| https://leopardimaging.com/product/nvidia-jetson-cameras/nvi...
|
| I haven't tried them, but am considering them for a project.
| achuwilson wrote:
| It is possible to extend CSI through good HDMI cables: https://www.robotshop.com/en/arducam-csi-hdmi-cable-extensio...
| corysama wrote:
| The network is very, very unreliable at the edge. Better to have each piece work independently and store up processed results to transmit eventually, opportunistically. If that processing involves real-time video processing there's no way you're going to get that done over a reliably unreliable connection.
| fvv wrote:
| I mean, it could be way more powerful, like Stadia on a phone vs. using the phone's GPU. Latency is not too high for the described usage... IMO only automotive may require a dedicated brain on the edge. Am I totally wrong?
| mebr wrote:
| I work at a startup that uses edge AI. There are many reasons edge is preferred over cloud. Security is one. Latency is important in many cases. If the internet connection is another dependency for a critical system, it can be a big headache. Once you start working on a real-world project you run into these issues. In return you give up the monitoring of the data and model that can be done with a cloud deployment.
| halotrope wrote:
| Cloud GPU/TPU resources are still somewhat expensive. Also bandwidth can be an issue when you would first need to feed video through potentially metered connections. Last but not least, latency can be an issue for e.g. robotics and automotive.
| fluffything wrote:
| Not a dumb question at all: data traffic is expensive.
|
| If you have thousands of remote sensors collecting GBs and GBs of real-time data, for ~$1000 you can add a "streaming" supercomputer to your sensor to analyze the data in place and save on network and storage costs.
|
| Note however that the announcement is for an Nvidia AGX product, which is for autonomous machines. The Nvidia "edge" products for processing data on the sensors are called Nvidia EGX.
|
| For autonomous machines, you often need to analyze the data in the machine anyway, e.g., you don't want a drone falling if it loses network connectivity.
| throwlaplace wrote:
| latency
| bitwize wrote:
| Why do humans carry large, energy-hungry brains around as opposed to being simple tools of the Hivemind like their brethren the insects?
|
| Making the edges smarter allows them to react and adapt on smaller timescales.
___________________________________________________________________
(page generated 2020-05-03 23:00 UTC)