[HN Gopher] eBPF will help solve service mesh by getting rid of ...
       ___________________________________________________________________
        
       eBPF will help solve service mesh by getting rid of sidecars
        
       Author : tgraf
       Score  : 203 points
       Date   : 2021-12-09 13:02 UTC (9 hours ago)
        
 (HTM) web link (isovalent.com)
 (TXT) w3m dump (isovalent.com)
        
       | unmole wrote:
       | Offtopic: I really like the style of the diagrams. I remember
        | seeing something similar elsewhere. Are these manually drawn or
        | are they the result of some tool?
        
         | tgraf wrote:
         | OP here: It's whimsical.com. I really love it.
        
           | unmole wrote:
           | Thank you, Thomas! I really admire all that you have done
           | with Cilium.
        
       | manvendrasingh wrote:
        | I am wondering how this would solve the problem of mTLS while
        | still supporting service-level identities. Is it possible to
        | move the mTLS to listeners instead of the sidecar, or is there
        | some other mechanism?
        
       | zinclozenge wrote:
        | It's not clear how eBPF will deal with mTLS. I actually asked
        | about that when interviewing at a company using eBPF for
        | observability into Kubernetes, and the answer was that they
        | didn't know.
       | 
       | Yea, if you're getting TLS termination at the load balancer prior
       | to k8s ingress then it's pretty nice.
        
         | GauntletWizard wrote:
          | The answer to this is simple - TLS will start being terminated
          | at the pods themselves. The frontend load balancer will _also_
          | terminate TLS - to the public sphere - and then authenticate
          | its connections to your backends as well. Kubernetes will
          | provide x509 certificates suitable for service-to-service
          | communication to pods automatically.
          | 
          | The work is still in the early phases, so the exact form this
          | will take has yet to be hammered out, but there's broad
          | agreement that this functionality will be first-class in k8s
          | in the future. If you want to keep running proxies for the
          | other features they provide, great - they'll be able to use
          | the certificates provided by k8s for identity. If you'd like
          | to know more, come to one of the SIG Auth meetings :)
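          | 
          | A minimal Go sketch of what pod-terminated mTLS could look
          | like, assuming the platform mounts a service certificate at a
          | hypothetical path like /var/run/secrets/svc-tls/ (the real
          | mechanism and paths are still being worked out):
          | 
          |   package main
          | 
          |   import (
          |       "crypto/tls"
          |       "crypto/x509"
          |       "log"
          |       "net/http"
          |       "os"
          |   )
          | 
          |   // Hypothetical mount path for the k8s-issued material.
          |   const dir = "/var/run/secrets/svc-tls/"
          | 
          |   func hello(w http.ResponseWriter, r *http.Request) {
          |       // r.TLS.PeerCertificates[0] carries the verified
          |       // identity of the calling service.
          |       w.Write([]byte("hello\n"))
          |   }
          | 
          |   func main() {
          |       cert, err := tls.LoadX509KeyPair(
          |           dir+"tls.crt", dir+"tls.key")
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       ca, err := os.ReadFile(dir + "ca.crt")
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       pool := x509.NewCertPool()
          |       pool.AppendCertsFromPEM(ca)
          | 
          |       srv := &http.Server{
          |           Addr:    ":8443",
          |           Handler: http.HandlerFunc(hello),
          |           TLSConfig: &tls.Config{
          |               Certificates: []tls.Certificate{cert},
          |               ClientCAs:    pool,
          |               // Peers must present a cert from this CA.
          |               ClientAuth: tls.RequireAndVerifyClientCert,
          |           },
          |       }
          |       log.Fatal(srv.ListenAndServeTLS("", ""))
          |   }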
        
         | tgraf wrote:
         | Then you should interview again but with us.
         | 
          | This is not too different from wpa_supplicant, which several
          | operating systems use for key management for wireless
          | networks. The complicated key negotiation and authentication
          | can remain in user space, while encryption with the negotiated
          | key can be done in the kernel (kTLS) or, when eBPF can control
          | both sides, it can even be done without TLS at all by
          | encrypting with a network-level encapsulation format so it
          | works for non-TCP traffic as well.
         | 
         | Hint: We are hiring.
        
       | dijit wrote:
       | Honestly after I learned that the majority of Kubernetes nodes
       | just proxy traffic between each other using iptables and that a
       | load balancer can't tell the nodes apart (ones where your app
        | lives vs ones that will proxy connections to your app) I got
       | really worried about any kind of persistent connection in k8s
       | land.
       | 
       | Since some number of persistent connections will get force
       | terminated on scale down or node replacement events...
       | 
        | Cilium and eBPF look like a pretty good solution to this though
       | since you can then advertise your pods directly on the network
       | and load balance those instead of every node.
        
         | p_l wrote:
          | Whether the load balancer can or cannot tell the nodes apart
          | depends on the load balancer and the method you use to expose
          | your service to it, as well as on what kind of networking
          | setup you use (i.e. is pod networking sensibly exposed to the
          | load balancer or ... weirdly).
          | 
          | Each "Service" object provides (by default; it can be
          | disabled) a load-balanced IP address that by default uses
          | kube-proxy as you described, a DNS A record pointing to said
          | address, DNS SRV records pointing to actual direct connections
          | (whether NodePorts or PodIP/port combinations), plus API
          | access to get the same data out.
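          | 
          | To make the SRV part concrete, a small Go sketch that resolves
          | the per-endpoint records directly instead of going through the
          | load-balanced address (service, namespace and port name are
          | made up; pod-level answers assume a headless Service):
          | 
          |   package main
          | 
          |   import (
          |       "fmt"
          |       "log"
          |       "net"
          |   )
          | 
          |   func main() {
          |       // kube-dns publishes SRV records of the form
          |       // _<port>._<proto>.<svc>.<ns>.svc.cluster.local
          |       _, addrs, err := net.LookupSRV("http", "tcp",
          |           "my-app.prod.svc.cluster.local")
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       for _, a := range addrs {
          |           fmt.Printf("%s:%d\n", a.Target, a.Port)
          |       }
          |   }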
         | 
         | There are even replacement kube-proxy implementations that
         | route everything through F5 load balancer boxes, but they are
         | less known.
        
         | q3k wrote:
         | > Honestly after I learned that the majority of Kubernetes
         | nodes just proxy traffic between each other using iptables and
         | that a load balancer can't tell the nodes apart (ones where
         | your app lives vs ones that will proxy connection to your app)
         | I got really worried about any kind of persistent connection in
         | k8s land.
         | 
         | There can be a difference, if your LoadBalancer-type service
         | integration is well implemented. The externalTrafficPolicy knob
         | determines whether all nodes should attract traffic from
         | outside or only nodes that contain pods backing this service.
         | For example, metallb (which attracts traffic by /32 BGP
         | announcements to given external peers) will do this correctly.
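          | 
          | For reference, a sketch of that knob using the Go API types
          | (the names are placeholders); in a plain manifest this is just
          | externalTrafficPolicy: Local on the Service spec:
          | 
          |   package main
          | 
          |   import (
          |       "encoding/json"
          |       "fmt"
          | 
          |       corev1 "k8s.io/api/core/v1"
          |       metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          |   )
          | 
          |   func main() {
          |       // "Local" keeps external traffic on nodes that host
          |       // a backing pod (and preserves the client source IP)
          |       // instead of letting every node proxy it onwards.
          |       local := corev1.ServiceExternalTrafficPolicyTypeLocal
          |       svc := corev1.Service{
          |           ObjectMeta: metav1.ObjectMeta{Name: "my-app"},
          |           Spec: corev1.ServiceSpec{
          |               Type:     corev1.ServiceTypeLoadBalancer,
          |               Selector: map[string]string{"app": "my-app"},
          |               Ports:    []corev1.ServicePort{{Port: 443}},
          |               ExternalTrafficPolicy: local,
          |           },
          |       }
          |       out, _ := json.MarshalIndent(svc, "", "  ")
          |       fmt.Println(string(out))
          |   }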
         | 
         | Within the cluster itself, only nodes which have pods backing a
         | given service will be part of the iptables/ipvs/...
         | Pod->Service->Pod mesh, so you won't end up with scenic routes
         | anyway. Same for Pod->Pod networking, as these addresses are
         | already clustered by host node.
        
           | kklimonda wrote:
           | How do you keep ecmp hashing stable between rollouts?
        
             | bogomipz wrote:
             | ECMP hashing would be between the edge router and the IP of
             | the LBs advertising VIPs no? The LB would maintain the
             | mappings between the VIPs and the nodePort IPs of worker
             | nodes that have a local service Endpoint for the requested
             | service. I don't think this would be any different than it
             | is without Kubernetes or am I completely misunderstanding
             | your question?
        
             | dharmab wrote:
             | If you're asking about connection stability in general:
             | 
             | - Ideally, you avoid it in your application design.
             | 
             | - If you need it, you set up SIGTERM handling in the
             | application to wait for all connections to close before the
             | process exits. You also set up "connection draining" at the
             | load balancer to keep existing sessions to terminating Pods
             | open but send new sessions to the new Pods. The tradeoff is
              | that rollouts take much longer - if the session time is
              | unbounded, you may need to enforce a deadline to break
              | connections eventually.
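              | 
              | A minimal Go sketch of the application side of that (the
              | 30s drain deadline is arbitrary; load balancer draining
              | still has to be configured separately):
              | 
              |   package main
              | 
              |   import (
              |       "context"
              |       "log"
              |       "net/http"
              |       "os"
              |       "os/signal"
              |       "syscall"
              |       "time"
              |   )
              | 
              |   func main() {
              |       srv := &http.Server{Addr: ":8080"}
              |       go func() {
              |           err := srv.ListenAndServe()
              |           if err != http.ErrServerClosed {
              |               log.Fatal(err)
              |           }
              |       }()
              | 
              |       // Kubernetes sends SIGTERM on pod eviction.
              |       stop := make(chan os.Signal, 1)
              |       signal.Notify(stop, syscall.SIGTERM)
              |       <-stop
              | 
              |       // Finish in-flight requests, but enforce a
              |       // deadline so rollouts can't hang forever.
              |       ctx, cancel := context.WithTimeout(
              |           context.Background(), 30*time.Second)
              |       defer cancel()
              |       if err := srv.Shutdown(ctx); err != nil {
              |           log.Printf("forced close: %v", err)
              |       }
              |   }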
        
               | dilyevsky wrote:
                | You don't just wait until all connections exit; you
                | first need to withdraw the BGP announcement to the edge
                | router, then start the wait. It's not that simple with
                | metal LBs. On the other hand it's not that simple with
                | cloud LBs either, because they also break long TCP
                | streams when they please.
        
         | dharmab wrote:
         | That's if you're using a NodePort service, which the
         | documentation explains is for niche use cases such as if you
         | don't have a compatible dedicated load balancer. In most
         | professional setups you do have such a load balancer and can
         | use other types of routing that avoid this.
         | 
         | https://kubernetes.io/docs/concepts/services-networking/serv...
        
           | topspin wrote:
           | > In most professional setups you do have such a load
           | balancer
           | 
           | May I ask what one might use in an AWS cloud environment to
           | provide that load balancer within a Region?
           | 
           | Does IPv6 address any of these issues? It seems to me that
           | IPv6 is capable of providing every component in the system
           | its own globally routable address, identity (mTLS perhaps)
           | and transparent encryption with no extra sidecars, eBPF
           | pieces, etc.
        
             | shosti wrote:
             | Ingresses on EKS will set up an ALB that sends traffic
             | directly to pods instead of nodes (basically skips the
             | whole K8s Service/NodePort networking setup). You have to
             | use ` alb.ingress.kubernetes.io/target-type: ip` as an
             | annotation I think (see
             | https://docs.aws.amazon.com/eks/latest/userguide/alb-
             | ingress...).
        
             | [deleted]
        
             | dharmab wrote:
             | > May I ask what one might use in an AWS cloud environment
             | to provide that load balancer within a Region?
             | 
             | The AWS cloud controller will automatically set up an ALB
             | for you if you configure a LoadBalancer service in
             | Kubernetes. I've also done custom setups with AWS NLBs.
             | 
             | > Does IPv6 address any of these issues?
             | 
             | It could address some issues- you could conceivably create
             | a CNI plugin which allocates an externally addressable IP
             | to your Pods. Although you would probably still want a load
             | balancer for custom routing rules and the improved
             | reliability over DNS round robin.
        
         | pm90 wrote:
          | This is a concern only if you have ungraceful node
          | termination, i.e. you suddenly yoink the node. In most cases
          | when you terminate the node, k8s will (attempt to) cordon and
          | drain it, letting the pods gracefully terminate their
          | connections before getting evicted.
         | 
         | If you didn't have k8s and just used an autoscaling group of
         | VMs you would have the same issue...
        
       | zdw wrote:
       | So instead of making the applications use a good RPC library,
       | we're going to shove more crap into the kernel? No thanks, from a
       | security context and complexity perspective.
       | 
       | Per https://blog.dave.tf/post/new-kubernetes/ , the way that this
       | was solved in Borg was:
       | 
       | > "Borg solves that complexity by fiat, decreeing that Thou Shalt
       | Use Our Client Libraries For Everything, so there's an obvious
       | point at which to plug in arbitrarily fancy service discovery and
       | load-balancing. "
       | 
       | Which seems like a better solution, if requiring some
       | reengineering of apps.
        
         | __alexs wrote:
         | I'm sure someone will write leftPad in eBPF any day now.
        
           | hestefisk wrote:
           | Indeed. We could even embed a WASM runtime (headless v8?) so
           | one can execute arbitrary JavaScript in-kernel... wait :)
        
             | zaphar wrote:
             | eBPF is far too limited to run a WASM runtime. That's why
              | the approach proposed in the article is even possible.
        
         | nonameiguess wrote:
          | In addition to whether or not all of your various dev teams'
          | preferred languages have a supported client SDK, you also have
          | the build vs. buy issue if you're plugging COTS applications
          | into your service mesh: there is no way to force a third-party
          | vendor to reengineer their application specifically for you.
         | 
         | This probably dictates a lot of Google's famous "not invented
         | here" behavior, but most organizations can't afford to just
         | write their entire toolchain from scratch and need to use
         | applications developed by third parties.
        
         | tptacek wrote:
         | The complexity is an issue (but sidecars are plenty complex
         | too), but the security not so much. BPF C is incredibly
         | limiting (you can't even have loops if the verifier can't prove
         | to its satisfaction that the loop has a low static bound). It's
         | nothing at all like writing kernel C.
        
           | the_duke wrote:
           | You don't have to use C.
           | 
           | There are two projects that enable writing eBPF with Rust
           | [1][2]. I'm sure there is an equivalent with nicer wrappers
           | for C++.
           | 
           | [1] https://github.com/foniod/redbpf
           | 
           | [2] https://github.com/aya-rs/aya
        
             | tptacek wrote:
             | It doesn't make any difference which language you use; the
             | security promises are coming from the verifier, which is
             | analyzing the CFG of the compiled program. C is what most
             | people use, since the underlying APIs are in C, and since
             | the verifier is so limiting that most high-level
             | constructions are off the table.
        
               | the_duke wrote:
                | Sure, I was not implying that Rust would have any
                | security benefits for eBPF.
               | 
               | Just that you can even write eBPF code in more convenient
               | languages.
        
               | tptacek wrote:
               | This has come up here a bunch of times (we do a lot of
               | work in Rust). I've been a little skeptical that Rust is
               | a win here, for basically the reason I gave upthread: you
               | can't really do much with Rust in eBPF, because the
               | verifier won't let you; it seems to me like you'd be
               | writing a dialect of Rust-shaped C. But we did a recent
               | work sample challenge for Rust candidates that included
               | an eBPF component, and a couple good submissions used
               | Rust eBPF, so maybe I'm wrong about that.
               | 
               | I'm also biased because I _love_ writing C code (I know,
               | both viscerally and intellectually, that I should
               | virtually never do so; eBPF is the one sane exception!)
        
         | MayeulC wrote:
         | > a good RPC library
         | 
         | I like that approach. If you use client libraries, new RPC
         | mechanisms are "free" to implement (until you need to
         | troubleshoot upgrades). It's also an argument against
         | statically linking.
         | 
         | For instance, if running services on the same machine, io-uring
         | can probably be used? (I'm a noob at this). eBPF for packet
         | switching/forwarding between different hosts, etc.
        
           | malkia wrote:
            | This may no longer be the case, but back at Google I
            | remember one day noticing that my Java library no longer
            | used the client library logger, but instead spawned some
            | other app and sent logs to it. That functionality used to be
            | a fat client, linked into our app, supported by another
            | team. At first I was wtf.. Then it hit me - this other team
            | can update their "logging" binary on a different cycle than
            | us (hence we don't have to be on the same "build" cycle).
            | All they needed to provide us was a very "thin" and rarely
            | changing interface library. And they can write it in any
            | language they like (Java, C++, Go, Rust, etc.)
            | 
            | Also it doesn't need to be a .so (or .dll/.dylib) - just
            | some quick IPC to send messages around. It can actually be
            | better. For one, if their app is still buffering messages,
            | my app can exit while theirs still runs. Or for security
            | reasons (or not having to think about them), etc. etc. So
            | still statically linked, but processes talking to each
            | other. (Granted, this does not always work for some special
            | apps, like audio/video plugins, but I think it works fine
            | for the case above.)
        
         | jrockway wrote:
         | The big secret is that sidecars can only help so much. If you
         | want distributed tracing, the service mesh can't propagate
         | traces into your application (so if service A calls service B
         | which calls service C, you'll never see that end to end with a
         | mesh of sidecars). mTLS is similar; it's great to encrypt your
         | internal traffic on the wire, but that needs to get propagated
         | up to the application to make internal authorization decisions.
         | (I suppose in some sense I like to make sure that "kubectl
         | port-forward" doesn't have magical enhanced privileges, which
         | it does if your app is oblivious to the mTLS going on in the
         | background. You could disable that specifically in your k8s
         | setup, but generally security through remembering to disable
         | default features seems like a losing battle to me. Easier to
         | have the app say "yeah you need a key". Just make sure you
         | build the feature to let oncall get a key, or they will be very
         | sad.)
         | 
         | For that reason, I really do think that this is a temporary
         | hack while client libraries are brought up to speed in popular
         | languages. It is really easy to sell stuff with "just add
         | another component to your house of cards to get feature X", but
         | eventually it's all too much and you'll have to just edit your
         | code.
         | 
         | I personally don't use service meshes. I have played with Istio
         | but the code is legitimately awful, so the anecdotes of "I've
         | never seen it work" make perfect sense to me. I have, in fact,
         | never seen it work. (Read the xDS spec, then read Istio's
         | implementation. Errors? Just throw them away! That's the core
         | goal of the project, it seems. I wrote my own xDS
         | implementation that ... handles errors and NACKs correctly.
         | Wow, such an engineering marvel and so difficult...)
         | 
         | I do stick Envoy in front of things when it seems appropriate.
         | For example, I'll put Envoy in front of a split
         | frontend/backend application to provide one endpoint that
         | serves both the frontend or backend. That way production is
         | identical to your local development environment, avoiding
         | surprises at the worst possible time. I also put it in front of
         | applications that I don't feel like editing and rebuilding to
         | get metrics and traces.
         | 
         | The one feature that I've been missing from service meshes,
         | Kubernetes networking plugins, etc. is the ability to make all
         | traffic leave the cluster through a single set of services, who
         | can see the cleartext of TLS transactions. (I looked at Istio
         | specifically, because it does have EgressGateways, but it's
         | implemented at the TCP level and not the HTTP level. So you
         | don't see outgoing URLs, just outgoing IP addresses. And if
         | someone is exfiltrating data, you can't log that.) My biggest
         | concern with running things in production is not so much
         | internal security, though that is a big concern, but rather "is
         | my cluster abusing someone else". That's the sort of thing that
         | gets your cloud account shut down without appeal, and I feel
         | like I don't have good tooling to stop that right now.
        
           | darkwater wrote:
           | > If you want distributed tracing, the service mesh can't
           | propagate traces into your application (so if service A calls
           | service B which calls service C, you'll never see that end to
           | end with a mesh of sidecars)
           | 
           | Why not? AFAIK traces are sent from the instrumented app to
           | some tracing backend, and a trace-id is carried over via an
           | HTTP header from the entry point of the request until the
            | last service that takes part in that request. Why would a
            | sidecar/mesh break this?
        
             | afrodc_ wrote:
             | This. Header trace propagation is a godsend.
        
             | colonelxc wrote:
             | I think the point is that the service mesh can't do the
             | work of propagation. It needs the client to grab the input
             | header, and attach it to any outbound requests. From the
             | perspective of the service mesh, the service is handling X
             | requests, and Y requests are being sent outbound. It
             | doesn't know how each outbound request maps to an input.
             | 
              | So now all of a sudden we do need a client library for
             | each service in order to make sure the header is being
             | propagated correctly.
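              | 
              | A minimal Go sketch of that propagation step, assuming
              | W3C Trace Context (traceparent) headers and a made-up
              | downstream URL; real code would use an OpenTelemetry
              | propagator rather than copying headers by hand:
              | 
              |   package main
              | 
              |   import (
              |       "io"
              |       "log"
              |       "net/http"
              |   )
              | 
              |   func handle(w http.ResponseWriter, r *http.Request) {
              |       req, err := http.NewRequest("GET",
              |           "http://service-b.internal/api", nil)
              |       if err != nil {
              |           http.Error(w, err.Error(), 500)
              |           return
              |       }
              |       // The app must copy the trace context itself;
              |       // the sidecar only sees separate inbound and
              |       // outbound requests and cannot link them.
              |       if tp := r.Header.Get("traceparent"); tp != "" {
              |           req.Header.Set("traceparent", tp)
              |       }
              |       resp, err := http.DefaultClient.Do(req)
              |       if err != nil {
              |           http.Error(w, err.Error(), 502)
              |           return
              |       }
              |       defer resp.Body.Close()
              |       io.Copy(w, resp.Body)
              |   }
              | 
              |   func main() {
              |       http.HandleFunc("/", handle)
              |       log.Fatal(http.ListenAndServe(":8080", nil))
              |   }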
        
         | tgraf wrote:
         | If you are in a position where you can do that then great. Most
         | folks out there are in a position where they need to run
         | arbitrary applications delivered by vendors without an ability
         | to modify them.
         | 
          | The second aspect is that this can get extremely expensive if
          | your applications are written in a wide range of languages and
          | frameworks. That's obviously different at Google where the
          | number of languages can be restricted and standardized.
         | 
         | But even then, you could also link a TCP library into your app.
         | Why don't you?
        
         | outside1234 wrote:
         | The industry is moving away from the client library approach.
         | This is possible in a place like Google where they force folks
         | to write software in one of four languages (C++, Java, Go,
         | Python) but doesn't scale to a broader ecosystem.
        
           | pjmlp wrote:
            | It sure scales. I have yet to work in an organisation where
            | anything goes.
           | 
           | There are a set of sanctioned languages and that is about it.
        
             | jayd16 wrote:
             | The subtle aspect of the comment you're replying to is that
             | _they write everything_.
             | 
             | Hard to cram a new library into some closed source vendor
             | app.
        
               | pjmlp wrote:
               | Depends how it was written and made extensible.
        
         | jayd16 wrote:
         | It does feel a bit like we're trying to monkey patch compiled
         | code but the benefits are pretty clear.
        
           | lamontcg wrote:
           | I would argue pretty strenuously that this is not what is
           | being done.
           | 
           | The sockets layer is becoming a facade which can guarantee
           | additional things to applications which are compiled against
           | it, and you've got dependency injection here so that the
           | application layer can be written agnostically and not care
           | about any of those concerns at all.
        
         | q3k wrote:
         | It is the technically better solution IMO/IME, too.
         | 
         | But that doesn't work when you're trying to sell enterprises
         | the idea of 'just move your workloads to Kubernetes!'. :)
        
         | ZeroCool2u wrote:
         | What if a client library does not yet exist for your language?
        
           | q3k wrote:
           | In a large orga, you limit the languages available for
           | projects to well supported ones internally, ie. to those that
           | are known to have a port of the RPC/metrics/status/discovery
           | library. Also makes it easier to have everything under a
           | single build system, under a single set of code styles, etc.
           | 
            | If some developers want to use some new language, they first
            | have to put in the effort by a) demonstrating the business
            | case for using the new language and allocating resources to
            | integrate it into the ecosystem and b) porting all the
            | shared codebase to that new language.
        
             | ZeroCool2u wrote:
             | Absolutely. I was thinking what if there's a good business
             | reason to use a different language that's not the norm for
             | your org. Then you're stuck with an infra problem
             | preventing you from using the right tool for the job.
             | 
             | Of course, this is the exception to the rule you described
             | well :)
        
               | q3k wrote:
               | I don't think of it as an infra problem, but as an early
               | manifestation of effort that would arise later on,
               | anyway: long-term maintenance of that new language. You
               | need people who know the language to integrate it well
               | with the rest of the codebase, people who can perform
               | maintenance on language-related tasks, people who can
               | train other people on this language, ... These are all
               | problems you'd have later on, but are usually handwaved
               | away as trivial.
               | 
               | Throughout my career nearly every single company I've
               | worked in had That One Codebase written by That One
               | Brilliant Programmer in That One Weird Language that no-
                | one maintains because the original author has since
                | left, the language turns out to be dead, and because
                | it's extremely expensive to hire or train more people to
                | grok that language just for this project.
        
           | __alexs wrote:
           | There are only 5 languages. JavaScript, C++, Java, Python, C#
           | 
           | This is basically the same set of languages people were
           | writing 20 years ago and will probably be the same set of
           | languages people will write in 20 years from now.
        
             | MayeulC wrote:
             | It really depends on your domain. I haven't seen C# a lot,
             | nor python, in some orgs.
             | 
             | For some (like me), it's more a superset of C, assembly,
             | bash, maybe lisp, python and matlab.
             | 
             | For others, it's going to be JavaScript, PHP, CSS, HTML..
             | 
             | I agree though that a library is usually domain-specific,
             | and that you can probably easily identify the subset of
             | languages that you really need official bindings for
             | (thereby making my comment a bit useless, sorry for the
             | noise).
        
         | dvogel wrote:
         | I'm not necessarily advocating for the approach described in
         | the article but it wouldn't worry me from a security
         | perspective. The security model of eBPF is pretty impressive.
         | The security issues arising from engineers struggling to keep
         | the entire model in their head would concern me though.
        
         | p_l wrote:
         | In a world without (D)COM, I find it's much, much harder to
         | make common base libraries and force people to use them,
         | especially if you can't also force limit the set of toolchains
         | used in the environment.
        
           | outside1234 wrote:
           | The network is the base library - that is the shift you are
           | seeing. You make a call out to a network address with a
           | specific protocol.
           | 
           | Also, as an aside, I think WebAssembly has the potential to
           | shift this back. In a world where libraries and programs are
           | compiled to WebAssembly, it doesn't matter what their source
           | language was, and as such, the client library based approach
           | might swing back into vogue.
        
             | p_l wrote:
             | WASM isn't a valid target for many languages, that's one
             | thing.
             | 
             | Two, the case is about the library to interact with the
             | network, so... There's also implementing the protocols.
        
             | jjtheblunt wrote:
             | > The network is the base library
             | 
             | you remind me of the 20+ years ago Sun Microsystems
             | assertion "The Network IS the Computer".
             | 
             | citation: https://www.networkcomputing.com/cloud-
             | infrastructure/networ...
        
       | codetrotter wrote:
       | > Identity-based Security: Relying on network identifiers to
       | achieve security is no longer sufficient, both the sending and
       | receiving services must be able to authenticate each other based
       | on identities instead of a network identifier.
       | 
       | Kinda semi-offtopic but I am curious to know if anyone has used
       | identity part of a WireGuard setup for this purpose.
       | 
       | So say you have a bunch of machines all connected in a WireGuard
       | VPN. And then instead of your application knowing host names or
       | IP addresses as the primary identifier of other nodes, your
       | application refers to other nodes by their WireGuard public key?
       | 
       | I use WireGuard but haven't tried anything like that. Don't know
       | if it would be possible or sensible. Just thinking and wondering.
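        | 
        | One way the lookup could work, sketched in Go with the wgctrl
        | library: resolve a peer's tunnel address back to its WireGuard
        | public key and treat that key as the identity (interface name
        | and address are placeholders):
        | 
        |   package main
        | 
        |   import (
        |       "fmt"
        |       "log"
        |       "net"
        | 
        |       "golang.zx2c4.com/wireguard/wgctrl"
        |   )
        | 
        |   // identityFor maps a peer's tunnel address back to its
        |   // WireGuard public key via the local AllowedIPs table.
        |   func identityFor(ifname string, ip net.IP) (string, error) {
        |       c, err := wgctrl.New()
        |       if err != nil {
        |           return "", err
        |       }
        |       defer c.Close()
        | 
        |       dev, err := c.Device(ifname)
        |       if err != nil {
        |           return "", err
        |       }
        |       for _, peer := range dev.Peers {
        |           for _, allowed := range peer.AllowedIPs {
        |               if allowed.Contains(ip) {
        |                   return peer.PublicKey.String(), nil
        |               }
        |           }
        |       }
        |       return "", fmt.Errorf("no peer owns %s", ip)
        |   }
        | 
        |   func main() {
        |       id, err := identityFor("wg0", net.ParseIP("10.44.0.7"))
        |       if err != nil {
        |           log.Fatal(err)
        |       }
        |       // Authorize on the public key, not on the IP.
        |       fmt.Println("peer identity:", id)
        |   }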
        
         | madjam002 wrote:
         | I too am interested in this.
         | 
         | I long for the day where Kubernetes services, virtual machines,
         | dedicated servers and developer machines can all securely talk
         | to eachother in some kind of service mesh, where security and
         | firewalls can be implemented with "tags".
         | 
         | Tailscale seems to be pretty much this, but while it seems
         | great for the dev/user facing side of things (developer machine
         | connectivity), it doesn't seem like it's suited for the service
         | to service communication side? It would be nice to have one
         | unified connectivity solution with identity based security
         | rather than e.g Consul Connect for services, Tailscale /
         | Wireguard for dev machine connectivity, etc.
        
           | starfallg wrote:
           | >I long for the day where Kubernetes services, virtual
           | machines, dedicated servers and developer machines can all
            | securely talk to each other in some kind of service mesh,
           | where security and firewalls can be implemented with "tags".
           | 
           | That's exactly what Scalable Group Tags (SGTs) are -
           | 
           | https://tools.ietf.org/id/draft-smith-kandula-sxp-07.html
           | 
           | Cisco implements this as a part of TrustSec
        
         | tptacek wrote:
         | We're a global platform that runs an intra-fleet WireGuard
         | mesh, so we have authenticated addressing between nodes; we
         | layer a couple dozen lines of BPF C on top of that to extend
         | the authentication model to customer address prefixes. So,
         | effectively, we're using WireGuard as an identity. In fact: we
         | do so explicitly for peering connections to other services.
         | 
         | So yeah, it's a model that can work. It's straightforward for
         | us because we have a lot of granular control over what can get
         | addressed where. It might be trickier if your network model is
         | chaotic.
        
         | tgraf wrote:
         | One of the methods that Cilium (which implements this eBPF-
          | based service mesh idea) uses to implement authentication
          | between workloads is WireGuard. It does exactly what you
         | describe above.
         | 
         | In addition it can also be used to enforce based on service
         | specific keys/certificates as well.
        
           | allset_ wrote:
           | Isn't the Wireguard implementation in Cilium between nodes
           | only, not workloads (pods)?
        
             | tgraf wrote:
              | It can do both. It can authenticate and encrypt all traffic
              | between nodes, which then also encrypts all traffic between
              | the pods running on those nodes. This is great because it
             | also covers pod to node and all control plane traffic. The
             | encryption can also use specific keys for different
             | services to authenticate and encrypt pod to pod
             | individually.
        
         | q3k wrote:
         | You'd be adding a whole new layer of what would effectively be
         | dynamic routing. It's doable, but it's not a trivial amount of
         | effort. Especially if you want everything to be transparent and
         | automagic.
         | 
         | There's earlier projects like CJDNS which provide pubkey-
         | addressed networking, but they're limited in usability as they
         | route based on a DHT.
        
       | outside1234 wrote:
       | There is a good talk about this (and more) from KubeCon:
       | 
       | https://www.youtube.com/watch?v=KY5qujcujfI
        
       | davewritescode wrote:
       | From a resource perspective this makes sense but from a security
       | perspective this drives me a little bit crazy. Sidecars aren't
       | just for managing traffic, they're also a good way to automate
       | managing the security context of the pod itself.
       | 
       | The current security model in Istio delivers a pod specific
       | SPIFFE cert to only that pod and pod identity is conveyed via
       | that certificate.
       | 
       | That feels like a whole bunch of eggs in 1 basket.
        
         | tgraf wrote:
         | What the proposed architecture allows is to continue using
         | SPIFFE or another certificate management solution to generate
         | and distribute the certificates but use either a per-node proxy
         | or an eBPF implementation to enforce it. Even if the
          | authentication handshake remains in a proxy, moving the data
          | encryption to the kernel is a massive benefit from an overhead
          | perspective. This already exists and is called kTLS.
        
       | xmodem wrote:
       | Doing this with eBPF is definitely an improvement, but when I
       | look at some of the sidecars we run in production, I often wonder
       | why we can't just... integrate them into the application.
        
         | mixedCase wrote:
         | There are good reasons more often than not.
         | 
         | Being able to pick up something generic rather than something
         | language-specific.
         | 
         | Not having to do process supervision (which includes handling
         | monitoring and logs) within your application.
         | 
         | Not making the application lifecycle subservient to needs such
          | as log shipping and request rerouting. People get signal traps
          | wrong surprisingly often.
        
           | taeric wrote:
            | My gut is that using sidecars doesn't really solve these
            | problems straight up; it just moves them to the
            | orchestrator.
           | 
           | Which is not bad. But that area is also often misconfigured
           | for supervision. And trapping signals remains mostly broken
           | in all sidecars.
        
         | cfors wrote:
          | You can! There are downsides, though, for any sufficiently
          | polyglot organization: maintaining all the different client
          | SDKs that need to implement that functionality.
         | 
         | Sidecars are often useful for platform-centric teams that would
         | like to have access to help manage something like secrets,
         | mTLS, or traffic shaping in the case of Envoy. The team that's
         | responsible for that just needs to maintain a single sidecar
          | rather than all of the potential SDKs for teams.
         | 
         | Especially if you have specific sidecars that only work on a
         | specific infrastructure, for example if you have a Vault
         | sidecar that deals with secrets for your service over EKS IAM
         | permissions, you suddenly can't start your service without a
          | decent amount of mocking and feature flags. It's nice to not
         | have to burden your client code with all of that.
         | 
         | Also, there is a decent amount of work being done on gRPC to
          | speak xDS, which also removes the need for the sidecar [0].
         | 
         | [0] https://istio.io/latest/blog/2021/proxyless-grpc/
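          | 
          | A rough Go sketch of that proxyless mode (the target name is
          | made up, and it assumes an xDS bootstrap file is supplied via
          | GRPC_XDS_BOOTSTRAP, which the control plane normally injects):
          | 
          |   package main
          | 
          |   import (
          |       "context"
          |       "log"
          |       "time"
          | 
          |       "google.golang.org/grpc"
          |       "google.golang.org/grpc/credentials/insecure"
          |       // Blank import registers the xds:/// resolver.
          |       _ "google.golang.org/grpc/xds"
          |   )
          | 
          |   func main() {
          |       ctx, cancel := context.WithTimeout(
          |           context.Background(), 10*time.Second)
          |       defer cancel()
          | 
          |       // The xDS control plane decides endpoints and load
          |       // balancing; no Envoy sits in the data path.
          |       creds := insecure.NewCredentials()
          |       conn, err := grpc.DialContext(ctx,
          |           "xds:///my-service:8080",
          |           grpc.WithTransportCredentials(creds),
          |           grpc.WithBlock())
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       defer conn.Close()
          |       log.Println("connected:", conn.Target())
          |   }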
        
           | xemdetia wrote:
            | Another thing is that your main application artifact can
            | stay static while your sidecar reacts to configuration
            | changes/patches/vulns/updates. Depending on your architecture
            | it can make some components last for years without a change
            | even though the sidecar/surrounding configuration is doing
            | all sorts of stuff. Back when more people ran Java
            | environments, there were all sorts of settings you could
            | change at the JVM level for how JCE worked, without the
            | bytecode moving, which was extraordinarily helpful.
           | 
           | It depends on your environment and architecture combined with
           | how fast you can move especially with third party components.
           | Having the microservice be 'dumb' can save everything.
        
           | pjmlp wrote:
           | For a moment I thought you're talking about POSIX directory
           | services.
        
           | darkwater wrote:
           | > Especially if you have specific sidecars that only work on
           | a specific infrastructure, for example if you have a Vault
           | sidecar that deals with secrets for your service over EKS IAM
           | permissions, you suddenly can't start your service without a
            | decent amount of mocking and feature flags. It's nice to not
           | have to burden your client code with all of that.
           | 
           | Could you please elaborate on this? I don't fully understand
            | what you mean. Especially, I don't understand if "It's nice
            | to not have to burden your client code with all of that"
            | applies to a setup with or without sidecars.
        
             | cfors wrote:
             | Take vault for example. Rather than have to toggle a flag
             | in your service to get a secret, you could have the vault
             | sidecar inject the secret automatically into your
             | container, as opposed to having to pass a configuration
             | flag `USE_VAULT` to your application, which will
             | conditionally have a baked in vault client that fetches
             | your secret for you.
             | 
             | Your service doesn't really care where the secret comes
              | from, as long as it can use that secret to connect to some
             | database, API or whatever. So IMO it makes your application
             | code a bit cleaner knowing that it doesn't have to worry
             | about where to fetch a secret from.
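              | 
              | A tiny Go sketch of the application side under that
              | model (the path follows the Vault agent injector's usual
              | /vault/secrets/ convention, but treat it as an
              | assumption):
              | 
              |   package main
              | 
              |   import (
              |       "fmt"
              |       "log"
              |       "os"
              |       "strings"
              |   )
              | 
              |   func main() {
              |       // The sidecar (or any other mechanism) renders
              |       // the secret to a file before the app starts;
              |       // the app doesn't care where it came from.
              |       b, err := os.ReadFile("/vault/secrets/db-password")
              |       if err != nil {
              |           log.Fatalf("secret not available: %v", err)
              |       }
              |       password := strings.TrimSpace(string(b))
              |       fmt.Println("password length:", len(password))
              |   }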
        
               | darkwater wrote:
               | Ok, so you are indeed advocating for the sidecar approach
               | (and on this I fully agree, especially this Vault
               | example)
        
         | miduil wrote:
          | The author of Linkerd argues that splitting this
          | responsibility will improve stability, as you'll have a
          | homogeneous interface (the sidecar proxy) over a heterogeneous
          | group of pods. Updating a sidecar container (or using the same
          | one across all applications) is possible, whereas if it's
          | integrated into the application you'll encounter many more
          | barriers and need much wider coordination.
        
         | dboreham wrote:
         | Like it or not the socket has become the demarcation mechanism
         | we use. Therefore all software ends up deployed as a thing that
         | talks on sockets. Therefore you can't/shouldn't put
         | functionality that belongs on the other end of the socket
         | inside that thing. If you do that it's no longer the kind of
         | thing you wanted (a discrete unit of software that does
         | something). It's now a larger kind of component (software that
         | does something, plus elements of the environment that software
         | runs within). You probably don't want that.
        
           | pjmlp wrote:
           | The irony is arguing for monolithic kernels with a pile of
           | such layers on top.
        
       | Matthias247 wrote:
        | I understand how BPF works for transparently steering TCP
        | connections. But the article mentions gRPC - which means HTTP/2.
        | How can the BPF module be a replacement for a proxy here? My
        | understanding is it would need to understand HTTP/2 framing and
        | maintain buffers - which all sound like capabilities that
        | require more than BPF.
        | 
        | Are they implementing an HTTP/2-capable proxy in native kernel C
        | code and making APIs to that accessible via BPF?
        
         | tgraf wrote:
         | The model I'm describing contains two pieces: 1) Moving away
         | from sidecars to per-node proxies that can be better integrated
         | into the Linux kernel concept of namespacing instead of
         | artificially injecting them with complicated iptables
         | redirection logic at the network level. 2) Providing the HTTP
         | awareness directly with eBPF using eBPF-based protocol parsers.
         | The parser itself is written in eBPF which has a ton of
         | security benefits because it runs in a sandboxed environment.
         | 
         | We are doing both. Aspect 2) is currently done for HTTP
         | visibility and we will be working on connection splicing and
         | HTTP header mutation going forward.
        
           | tptacek wrote:
           | What does an HTTP parser written in BPF look like? Bounded
           | loops only --- meaning no string libraries --- seems like a
           | hell of a constraint there.
        
             | tgraf wrote:
             | It looks not too different from the majority of HTTP
              | parsers out there written in C. Here is an example from
              | Node.js [0].
             | 
             | [0] https://github.com/nodejs/http-
             | parser/blob/main/http_parser....
        
               | tptacek wrote:
               | Node's HTTP parser doesn't have to placate the BPF
               | verifier, is why I'm asking.
        
       | ko27 wrote:
        | Not convinced that this is a better solution than just
        | implementing these features as part of the protocol. For
        | example, most languages have libraries that support gRPC load
        | balancing.
       | 
       | https://github.com/grpc/proposal/blob/master/A27-xds-global-...
        
       ___________________________________________________________________
       (page generated 2021-12-09 23:00 UTC)