[HN Gopher] Cryptographers solve decades-old privacy problem ___________________________________________________________________ Cryptographers solve decades-old privacy problem Author : Brajeshwar Score : 226 points Date : 2023-11-18 15:37 UTC (7 hours ago) (HTM) web link (nautil.us) (TXT) w3m dump (nautil.us) | j2kun wrote: | It's an exciting time to be working in homomorphic encryption! | noman-land wrote: | Homomorphic encryption and zero-knowledge proofs are the most | exciting technologies for me for the past bunch of years | (assuming they work (I'm not qualified enough to know)). | | Having third parties compute on encrypted, private data, and | return results without being able to know the inputs or outputs | is pretty amazing. | drakenot wrote: | Can you give an example of a useful computation someone would | do against encrypted data? | | Trying to understand where this would come in handy. | hedora wrote: | 100% of the things you use Google for (drive, docs, search, | personalization, maps, etc). | fidotron wrote: | In an extreme case could Google use a mechanism like this | to deny themselves direct access to the data they collect | while still bidding in ad auctions based on that | information? | janandonly wrote: | Proving you have enough money without divulging the exact | amount? | daveguy wrote: | It would come in handy by not requiring any trust of the | cloud that is running your servers. For protected health | information this could potentially be a major breakthrough | in patient privacy. | NortySpock wrote: | The general concept is "I want to keep this data private | and confidential, but I want to be able to rent elastic | compute for it". Previously, at the end of the day, the | processor could only do useful work on unencrypted data. 
| | But something like "analyze this list of customer records | for fraud" or "analyze this proprietary data for trends" | has previously either required a lot of trust that your | cloud provider isn't going to siphon your data, or just | required on-prem hardware that you cannot scale as easily. | | If "math on encrypted data" works, we could keep our data | confidential while still sending off batches of it to be | number-crunched at a cloud provider. | | Or even start talking about distributed / peer-to-peer | computing, where a swarm of users (or, say, businesses that | wish to cooperate) can send compute jobs to each other | without having to trust that the other members weren't | going to go snooping through the data that was sent. | alwa wrote: | And for that matter if I were a provider of some kind of | computational service for hire, I (and my insurers) might | feel a great deal of relief at the idea that we're no | longer having to sit on a big attractive ransomable pile | of our clients' data. | Obscurity4340 wrote: | Has any world government implemented this in a | similar/relevant fashion that illustrates this in action? | fragmede wrote: | No because this is the very bleeding edge of technology, | so there are only very limited (but very cool) tech demos | of the technology. Nothing at scale yet. | | https://spiralwiki.com/ | calamari4065 wrote: | Wait, so they're doing computation on encrypted data and | inferring some information about the relationships | between data in the encrypted set? | | How is that even possible? That seems to defeat the | purpose of encryption, or at least show that our | encryption is severely flawed. | | This is the first time I've heard of this and it's kind | of blowing my mind | fiddlerwoaroof wrote: | The idea is that the result is encrypted too, so that when | the person who holds the key gets it back they can | decrypt it and see the result. 
So you can run an | inference algorithm on data without ever having to | decrypt it and see what the data actually is. | | It seems to me that this intrinsically is vulnerable to | side-channel attacks, but it will be interesting to see | if we can avoid those with constant-time algorithms or | something. | l33t7332273 wrote: | What about this seems vulnerable to side channels? | fiddlerwoaroof wrote: | Because, if you have a mechanism to run arbitrary | computations on encrypted data, I'd be a bit concerned | about an attacker running carefully crafted computations | on the data and deducing information about the encrypted | data based on how the ciphertext changes or the | time/memory usage of the computation. | | This isn't really a particularly well-informed suspicion: | it's partly based on a sense that FHE is a "have your | cake and eat it too" sort of technology that's too good | to be true and partly based on the recent discoveries | that fairly well-understood things like speculative | execution in CPUs were vulnerable to serious side-channel | attacks. | l33t7332273 wrote: | A homomorphism is a function such that f(ab) = f(a)f(b). | So you can imagine the cloud customer wants to compute | ab, but they use a homomorphic encryption function, f, to | ask the cloud provider to compute f(ab) given f(a) and | f(b). | | Then the customer decrypts f(ab). This doesn't imply any | weakness in encryption. | | FHE is a bit stronger than what I've described, but | that's the idea. | trompetenaccoun wrote: | In addition to the other answers, any situation where you | want to prove identity but want to preserve privacy at the | same time. For example you could prove you're an adult | citizen with ID without revealing your ID number, picture, | or any private information whatsoever. | azeemba wrote: | I had written an intro-level blog post exploring image | analysis using HME: https://azeemba.com/posts/homomorphic-encryption-with-images... 
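The f(ab) = f(a)f(b) property described above can be seen concretely in textbook (unpadded) RSA, which is multiplicatively homomorphic. A minimal sketch, with toy parameters chosen purely for illustration (real RSA uses large primes and padding, which destroys this property):

```python
# Textbook (unpadded) RSA is multiplicatively homomorphic:
# E(a) * E(b) mod n = (a^e)(b^e) mod n = (a*b)^e mod n = E(a*b).
# Toy, insecure parameters for illustration only.
p, q, e = 1009, 1013, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

# The "cloud provider" multiplies ciphertexts without learning 6, 7, or 42.
product_ct = (encrypt(6) * encrypt(7)) % n
assert decrypt(product_ct) == 42
```

This matches the comment exactly: the provider computes f(a)·f(b) and hands back f(ab); only the key holder can recover ab.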
| | One paper I used showed how X-rays can be analyzed without | exposing the X-ray data itself. | | This can be generalized more to any situation where one | party owns a proprietary algorithm and another party owns | sensitive input data but they don't trust each other. | | Modern LLMs are actually the perfect example. You want to | use ChatGPT but they don't want to give you their model and | you don't want them to see your input. If HME were more | efficient, you could use it so that the person executing | the model never sees your real input. | koolba wrote: | > One paper I used showed how X-rays can be analyzed | without exposing the X-ray data itself. | | Is there any hope of getting the performance to the point | where something like that would be feasible? I'd imagine | the raw data itself would be so big that the performance | for anything non-trivial would be unworkable. | wanderingbort wrote: | If it's just that the parties don't trust each other then | the cost of HME has to be compared to the current "state | of the art" which is contracts and enforcement thereof. | | In practice, I don't think those costs are that high | because the rate of incidents is low and the average | damage is also low. | | Yes there are outlier instances of large breaches but | these seem like high-profile aircraft crashes considering | how many entities have sensitive data. | brendoelfrendo wrote: | Data incidents cause more problems than can easily be | resolved with a contract lawsuit. Perhaps the data was | siphoned by a 3rd party that hacked your vendor, or a | malicious insider at your vendor sold it to a competitor. | Sure, you can recoup some losses by suing your vendor for | breach of contract, but once the data is leaked, it's | never secret again. | | And then there's the example of businesses that work with | lots of confidential customer data, like banks or | doctors. 
Again, you can sue your vendor for breach of | contract if they behave irresponsibly with your data, but | your customers may not care; you're going to suffer a hit | to your reputation regardless of whether or not the | breach was your fault. | wanderingbort wrote: | You can say it's insufficient but it is what it costs | them today. | | I guess the better comparison is that cost in a financial | statement plus some expected increase in revenue due to a | "better" product. | | Again, I think you are correct in your analysis of the | improvements but that contributes little to the revenue | as explaining the benefit to most customers requires | framing your existing product as potentially harmful to | them. Educating them will be hard and it may result in an | offsetting realization that they were unsafe before and | as a result were paying too much. | alwa wrote: | I feel like trust is a spectrum, and the promise of these | techniques is that they reduce the need for trust in the | first place. | | We should consider what kinds of computational tasks | today's responsible parties (or their regulators, or | their insurers) think of as too risky to casually trust | to third parties under the status quo. For example with | my block storage provably unintelligible if you don't | have the HSM I keep safely in my corporate dungeon, I'm | comfortable not caring whose racks the encrypted blocks | sit on. I'd have to vet those vendors a lot harder if | they could read all my super secret diaries or whatever. | | And, for that matter, it's on the service provider side | too, right? Even the contractual, spit-and-handshake | pinky-swear-based mode of enforcement comes with | significant compliance costs for service providers, | especially ones operating in regulated industries. | Perhaps it's not too much to hope that effective and | efficient HME techniques might reduce those service | providers' compliance costs, and lower the barrier to | entry for new competitors. 
| | I'm reminded how even non-tech people in my life became | much more willing to trust their credit card details to | online retailers once they felt like a little green lock | icon made it "safe". Of course a LOT changed over that | same period, but still: the underlying contractual | boundaries didn't substantially change--in the US the | customer, then as now, has only ever been responsible for | a certain amount of fraud/theft loss--but people's risk | attitudes updated when the security context changed, and | it opened up vast new efficiencies and lines of business. | wanderingbort wrote: | It's not too much to hope that HME reduces those | compliance costs. However, I believe it is too much to | assume there will be any material adoption before it can | demonstrate that reduction. | | Reduction of trust is not a value add, it is a cost | reduction. Maybe that cost is blocking a valuable | product/service but either that product/service's value | is less than the current cost of trust OR trust has to be | far more costly in the context of the new | product/service. | | It's only the latter that I find interesting, which is why | I tend to be pretty hard on suggestions that this will do | much for anything that currently exists. At best, it will | improve profits marginally for those incumbents. | | What is something where the price of trust is so | catastrophically high in modern society AND HME can | reduce that cost by orders of magnitude? Let's talk about | that rather than HME. | noman-land wrote: | There are a few that I've thought about but the first one | that comes to mind is proving you're 21+ to an alcohol | store cashier without handing over your ID with all your | personal info. You could just give them a ZK proof that | returns a true/false and they could check it against | encrypted data (signed by the DMV). | wanderingbort wrote: | Why is that preferable to a message attesting "over 21" | signed by the DMV? 
| | The hard parts here are retrofitting society to use a | digital ID and how to prove that the human in front of | you is attached to that digital ID. | | The solutions there all seem like dystopias where now | instead of a bouncer looking at your ID for a few | seconds, technology is taking pictures of you everywhere | and can log that with location and time trivially. | noman-land wrote: | It doesn't have to be a digital ID, it can just be | encrypted data encoded on a regular ID in a QR code. | | Age depends on timestamp. The encrypted data is stored on | the ID and signed by the DMV, with a function that can be | run by the bouncer's scanning machine that plugs in a | now() timestamp, and receives a boolean in return. The | DMV doesn't even need to be involved after the issuance | of the ID and no network access is needed for this | calculation. | | No one's location was tracked and no one's picture was | taken, and now a bouncer who fancies you can't turn up at | your house after glancing at your ID. | captn3m0 wrote: | Age verification (without leaking other PII) was the | illustrative scenario for W3C Verifiable Credentials (it | lets you use a validating authority to sign a specific | subset of your schema). | | There are lots of other ways to solve the problem for | verification/signing use cases tbh. Homomorphic | encryption shines best when you are looking at more | complex calculations than just a Boolean result - such as | tax calculations. | | Can you submit your financial information and have your | taxes calculated without revealing the amounts involved? | Can you apply filters to an image without having the | image readable by the server? It essentially allows us to | "trust a remote server" in scenarios where one wouldn't | usually. | wanderingbort wrote: | How do you know that the bouncer's scanning machine didn't | log the interaction? | | The whole value prop is built on not trusting that | bouncer and by extension their hardware. 
| | Everything would have to be encrypted leading to the | bouncer also needing to establish that this opaque | identifier actually belongs to you. This is where some | picture or biometric comes into play and since the | bouncer cannot evaluate it with their own wetware you are | surrendering more data to a device you cannot trust. | | They also cannot trust your device. So, I don't see a | scenario where you can prove ownership of the ID to a | person without their device convincing them of it. | adastra22 wrote: | It's not. And since it is also 10000x slower than merely | checking a signed number, nobody is interested in doing | this. | dimastopel wrote: | Imagine a search engine that gives you results without | knowing your search query. | westurner wrote: | Grading students' notebooks on their own computers without | giving the answers away. | | https://news.ycombinator.com/item?id=37981190 : | | > _How can they be sure what 's using their CPU?_ | | Firefox, Chrome: <Shift>+<Escape> to open about:processes | | Chromebook: <Search>+<Escape> to open Task Manager | OJFord wrote: | All sorts - anything where you today give your data to some | SaaS (and they encrypt it at rest, sure), and they then | provide you some insight based on it or allow you to do | something with it (by decrypting it, crunching numbers, | spitting out the answer, and then encrypting the source | data again) could potentially be done without the | decryption (or the keys to make it possible). | | Concretely (and simplistically) - I give you two numbers, 2 | & 2, but they're encrypted so you don't know what they are, | and you add them together for me, but that's also encrypted | so you don't even know that the sum of the inputs I gave | was 4. It's 'computation on encrypted data', basically. | SilasX wrote: | It lets you have someone add up your numbers, and give you | the sum, without knowing the input numbers or their sum. 
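The "encrypted 2 & 2" scenario above maps directly onto an additively homomorphic scheme such as Paillier, where multiplying ciphertexts adds the hidden plaintexts. A toy sketch with illustrative parameters (the primes are far too small for real security):

```python
import math
import random

# Toy Paillier cryptosystem: additively homomorphic encryption.
# Demo-sized primes only -- real deployments use primes of ~1024+ bits.
p, q = 1000003, 1000033
n, n2 = p * q, (p * q) ** 2
g = n + 1                          # standard simplification g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(u: int) -> int:              # Paillier's L function: L(u) = (u - 1) / n
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m: int) -> int:
    # r should be coprime to n; with these parameters that is overwhelmingly likely
    r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return (L(pow(c, lam, n2)) * mu) % n

# The server "adds" the two hidden 2s by multiplying ciphertexts mod n^2,
# never seeing the inputs or the sum.
sum_ct = (encrypt(2) * encrypt(2)) % n2
assert decrypt(sum_ct) == 4
```

Because encryption is randomized, the server cannot even tell that the two ciphertexts encrypt the same number.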
| Basically, any time you want someone else to host the data, | while you can also do queries/computations on it. | | That now generalizes to any computation (not just | addition), because there is a logically complete set of | primitive operations for which fully homomorphic encryption | (FHE) can be done. | | Caveats: | | 1) You can't actually do e.g. while loops with unknown run | time. All such FHE computations are done by generating a | fixed-size boolean circuit for a given input size. It's | "Turing-complete" in the sense that you can size up the | circuit to any input size, but it wouldn't directly | implement an unbounded while loop -- you have to generate a | different "program" (circuit) for each input size. | | 2) All such computations must have a fixed output size -- | else it leaks information about the data inside. So | allowable computations would be like, "give me the first 30 | bytes of the result of querying for names in the set that | begin with A". If there are no results, the output still | has to be 30 bytes. | | 3) For similar reasons, any FHE computation must look at | all bits of the input (otherwise, it leaks info about | what's in them). "See if this value is in the set" can't | just jump to the relevant section. | | 4) The untrusted third party knows what computation you're | doing (at least at a low level), just not the data it's | being performed on or the result. | | 5) As you might have expected from 3), there's a massive | blowup in resources to do it. There can be improvements, | but some blowups are inherent to FHE. | podnami wrote: | There are examples in cryptocurrency, where ZK proofs are | all the rage. One use case is validating ownership of some | amount of currency without revealing the actual amount, or | verifying that a smart contract was executed correctly | without revealing the specifics of the contract's internal | state. Some blockchains use this in production, such as | Polygon's zkEVM. 
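Caveat 3 above can be illustrated even in plaintext: a data-oblivious membership test does identical work on every element, with no early exit, which is the access pattern an FHE circuit forces. A hypothetical sketch:

```python
def oblivious_contains(items: list, target: int) -> bool:
    # Touch every element and perform the same operations regardless of
    # where (or whether) the target appears. There is no early return that
    # could leak the match position through timing or access patterns --
    # the analogue of an FHE circuit having to read all input bits.
    found = 0
    for x in items:
        found |= int(x == target)
    return bool(found)

assert oblivious_contains([5, 9, 42], 42) is True
assert oblivious_contains([5, 9, 42], 7) is False
```

A plaintext hash lookup would jump straight to the candidate slot; the whole-scan shape here is exactly the cost caveat 3 describes.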
| littlestymaar wrote: | Functional encryption is even cooler than FHE (but even more | prohibitively expensive) | noman-land wrote: | Got any cool links to read about it? | j2kun wrote: | It's also not complete: only a handful of functions are | known to have functional encryption schemes. | collsni wrote: | Ok | ForkMeOnTinder wrote: | It's a published paper about homomorphic encryption, so... | unlikely. | InCityDreams wrote: | Yes, or no? | ChrisArchitect wrote: | The paper: | | _Doubly Efficient Private Information Retrieval and Fully | Homomorphic RAM Computation from Ring LWE_ | | https://eprint.iacr.org/2022/1703 | kelsey9876543 wrote: | Word salad with no cryptographic proof, why is this even on HN | l33t7332273 wrote: | Well, for starters it won the best paper award at the ACM | Symposium on Theory of Computing (STOC). | | Can you elaborate on which particular part you consider to be | "word salad" as opposed to just unfamiliar? | ChrisArchitect wrote: | Original article submitted a few times, weeks ago: | | https://news.ycombinator.com/item?id=38164732 | desdenova wrote: | Good to see some people are actually solving this problem, unlike | all those startups using it as a marketing buzzword. | hanniabu wrote: | I assume you're referring to the blockchain industry, but | they've advanced cryptography a lot, specifically with | ZK proofs. | lucb1e wrote: | I know of the concept of zero-knowledge proofs, but didn't | know that the blockchain industry advanced cryptography a lot | in that area. What are the practical applications of those | new things? Or which new things are there to begin with? The | Wikipedia article on zero-knowledge proof doesn't seem to say | dumbfounder wrote: | With ZK you can prove you own something or sign | transactions without people tracking your entire history. | It is also used to help scale chains by rolling up multiple | transactions into one proof. 
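The "prove you own something without revealing it" idea above can be sketched with a toy non-interactive Schnorr proof (via the Fiat-Shamir heuristic): the prover demonstrates knowledge of a secret x satisfying y = g^x mod p without revealing x. All parameters here are illustrative, not production-safe:

```python
import hashlib
import random

p = 2**127 - 1   # a Mersenne prime; fine for a demo, far too structured for real use
g = 3

def challenge(t: int, y: int) -> int:
    # Fiat-Shamir: the challenge is a hash of the transcript, so no
    # interactive verifier is needed.
    digest = hashlib.sha256(f"{g}|{y}|{t}".encode()).digest()
    return int.from_bytes(digest, "big") % (p - 1)

def prove(x: int, y: int):
    k = random.randrange(2, p - 1)            # fresh one-time nonce
    t = pow(g, k, p)                          # commitment
    s = (k + challenge(t, y) * x) % (p - 1)   # response; k masks x
    return t, s

def verify(y: int, t: int, s: int) -> bool:
    # Accept iff g^s == t * y^c (mod p), which holds exactly when s = k + c*x.
    return pow(g, s, p) == (t * pow(y, challenge(t, y), p)) % p

x = random.randrange(2, p - 1)   # the prover's secret
y = pow(g, x, p)                 # public value everyone may see
assert verify(y, *prove(x, y))
```

The verifier learns that the prover knows x, but the transcript (t, s) reveals nothing usable about x itself; production ZK systems (zk-SNARKs etc.) generalize this idea to arbitrary statements.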
| hugodutka wrote: | One of the applications is ZK-Rollups [1], which allow | developers to move heavy computation off a blockchain. The | blockchain receives the results and only validates proofs | that they are valid. This is especially useful on Ethereum | because its computational throughput is pretty low. | | There's also Zcash [2], which is a cryptocurrency that lets | you make untraceable transactions. This is in stark | contrast to Bitcoin or Ethereum, where transaction | information is available publicly to everyone. They have a | series of blog posts [3] on the math that actually makes it | work under the hood. | | [1] https://ethereum.org/en/developers/docs/scaling/zk-rollups/ | | [2] https://z.cash | | [3] https://electriccoin.co/blog/snark-explain/ | captn3m0 wrote: | The first time I came across PIR was via the 2014 blog post from | Signal on how Private Contact Discovery was a hard problem and | how it required PIR to be solved. | https://signal.org/blog/contact-discovery/ | | Maybe this will help Signal get to a viable solution in a few | years. | lucb1e wrote: | Note that Signal decided not to use that: | | > Unfortunately, for a reasonably large user base, this | strategy doesn't work because the bloom filters themselves | are too large to transmit to mobile clients. For 10 million | TextSecure users, a bloom filter like this would be ~40MB, | requested from the server 116 times a second if every client | refreshes it once a day. | | They decided to run computations inside a 'secure' hardware | environment instead (SGX specifically) so that they can't get | access to the computation themselves but it also doesn't need | to be run client side. I assume you meant the former thing, | but the approach they actually use is fundamentally different | from homomorphic encryption / PIR. | nly wrote: | The problem with all these privacy-preserving cryptographic | breakthroughs is they are never deployed in practice. | | Just look at cryptocurrency. 
We've known how to create a privacy-preserving, truly distributed, cryptographic replacement for cash | for decades, and what we end up with instead is Bitcoin and the | like, which is pseudonymous only and ends up being | centralised anyway to interact with the fiat world. | | There's no demand for this tech in current society. | | _Sigh_ | persnickety wrote: | > a privacy-preserving, truly distributed, cryptographic | replacement for cash | | I didn't realize this was known. Could you explain or provide | an example? | Aerbil313 wrote: | Monero. Disagree with the GP though, these things certainly | weren't around or known decades ago. | ric2b wrote: | Monero is the common example, since it solves the | pseudo-privacy issues that Bitcoin has while otherwise being very | similar. | raincom wrote: | David Chaum [1], a famous cryptographer, founded the International | Association for Cryptologic Research (IACR). He published | many articles on digital cash, anonymous cash, etc. He had | patents on them; he even founded a company on that concept. | However, that company failed. | | [1] https://en.wikipedia.org/wiki/David_Chaum | j2kun wrote: | South Korea used it for their COVID contact tracing system. | thewanderer1983 wrote: | That's not true. Sometimes these technologies get pulled up | high side or are only developed there so the public doesn't | hear about them. You should read the IETF paper on the crypto | wars. Crypto and ZKPs are among the technologies subject to | known attempts to keep them out of public hands. | krater23 wrote: | Turns out: Paper reveals how they removed Google from the | planet... | llwj wrote: | How efficient is it now? The last time I checked, FHE required | minutes of computation and gigabytes of memory to store tiny | amounts of data, and since it does not satisfy IND-CCA, I could not | find any use cases. | sangel wrote: | Very inefficient. Like wildly so. 
Specifically if you have a | very small database and you preprocess it with their | techniques, the resulting database is petabytes in size. But | the results are very beautiful. | | There are no obvious ways to improve on this work, so it is not | a matter of engineering. We really do need another breakthrough | result to get us closer to practicality. | rhindi wrote: | FHE in general is efficient enough for many applications now. | You can see some benchmarks here: https://docs.zama.ai/tfhe-rs/getting-started/benchmarks | jquery wrote: | 2 seconds to perform a single division on a 16-bit int? | Am I reading that chart correctly? | catilinam wrote: | Isn't FHE by definition not IND-CCA? Weak objection imo | jengels_ wrote: | Pretty efficient! E.g. a recent paper describes a system to do | fully private search over the Common Crawl (360 million web | pages) with an end-to-end latency of 2.7 seconds: | https://dl.acm.org/doi/10.1145/3600006.3613134 | vlovich123 wrote: | Yeah, but that's 1 request using 100% capacity of a 45-machine cluster. Relatively efficient but cost-prohibitive. | avmich wrote: | How does it compare to the search done without any | encryption? | seeknotfind wrote: | Many, many orders of magnitude. It really depends on the | search algorithm. If you're looking up a single term, | there are many standard ways to index this, but for | instance using a trie or hash table, the lookup cost | would be a number of operations proportional to the | search term length (e.g. for "foo", it's of length 3, | times some small number of operations for each letter). | Plus then if you want to combine the results and sort by | frequency, to estimate the cost of this, add up the | number of pages from each term, p_1 + p_2 + ... = P. Then | the number of operations might be P*log(P). This | dominates the cost. So if your search terms appear in | 1,000,000 of these pages, you're talking about on the | order of ~6,000,000 small operations. 
An implementation | would be able to execute this many operations on a single | CPU in 10s or 100s of milliseconds maybe, for an | elementary unoptimized algorithm. | godelski wrote: | Not an answer, but a question that I hope someone can answer. | Is the lack of speed because of a lack of optimization or | compute? Is this something that could be fixed by an | accelerator? | | It's often hard to compare things in an accurate way. I mean | many current methods might already have hardware-specific | acceleration and researchers aren't always good at optimization | (why should they be?). So is the problem with FHE more a lack | of people (specialists) putting in time to optimize it, or is | it so computationally intensive that we need hardware to get | better before it becomes practical? I mean look | at llama.cpp and how much faster it is, or stable diffusion? | Sure, both kinda slow (diffusion's no GAN and SD's not straight | diffusion) but very usable for many applications. | asasidh wrote: | Would a solution like mixing address the issue with acceptable | tradeoffs? https://arxiv.org/abs/1905.12264 | I_am_tiberius wrote: | As most likely lots of cryptographers read this, I have a | question. What's an efficient way to store encrypted data in a | database (only specific table columns) and decrypt it on the | fly while querying the data? Using Postgres with | pgp_sym_encrypt(data, key) seems very slow. | fyokdrigd wrote: | So: not private, nor efficient, nor a solution for homomorphic data. | | Basically a polynomial-factor hash of the index keys... you | will need the entire db plus the larger hashed keys. Doesn't | help at all with implementing any of the outlandish claims. | | Guess the merit is in proving the polynomial factor they build on is | secure. But that is not a game changer, as there are better | alternatives and nobody has solved the claimed problems with them. 
| wolverine876 wrote: | I understand it's inefficient, but could it be used by a | well-resourced organization where confidentiality is extremely | high-value and the database is small, such as intelligence, military | plans, Apple's product plans, etc.? | narinxas wrote: | there are much cheaper ways, chief amongst them is to use paper | and pencils | godelski wrote: | How do people with pen and paper operate on 10k pieces of | data? Which is honestly a rather small number. There's a | reason we use computers, and why statistics accelerated after | their invention. | jdefr89 wrote: | This is super misleading and sort of ambiguous. Are they claiming | they eliminated any side-channel information that can be used? I | am confused here. The problem is anyone can collect various | side-channel information and use it as metadata... Fully Homomorphic | Encryption only helps after you've established secure links, and | to do that you will inevitably have to use a less secure link or | system, which can in and of itself be used to make inferences | about queries... The real issue is we don't know if the "secure" | systems we use that we don't control actually give us the privacy | they claim to... | parentheses wrote: | ++ it's unclear what data this breakthrough is helping to keep | private and how. | l33t7332273 wrote: | You can use differential privacy to protect from most (timing, | memory, etc.) side channels in distributed analytics. ___________________________________________________________________ (page generated 2023-11-18 23:00 UTC)