[HN Gopher] The relevance of IP addresses in the tracking ecosystem [pdf]
___________________________________________________________________
Author : lesterpig
Score  : 89 points
Date   : 2020-04-15 16:09 UTC
web link (hal.inria.fr)
kube-system wrote:
| The big conclusion to be drawn from this paper is that IP address tracking can be combined with client-side tracking techniques to perform reidentification.
api wrote:
| IP is just one more data point. There are already so many ways a browser can be fingerprinted that it doesn't make things that much worse.
|
| While you can limit your exposure a bit, I long ago reached the conclusion that strong privacy is impossible in the current client/server web model. There is too much surface area.
jedberg wrote:
| IPv6 improves this situation (now). At first, IPv6 was actually a lot worse, since the back 1/2 of your address was derived from your MAC address, allowing your device to be tracked around the internet no matter where it went.
|
| People quickly realized this flaw and updated the standard so that basically your client gets to pick the second 1/2 of your address now. And the nice thing is, most major platforms will actually run multiple addresses in parallel, allowing new connections to use a new address while old connections keep using the old one.
|
| So while IPv4 adds some protection by having multiple clients behind the firewall, IPv6 actually makes it better by looking like _even more_ clients behind the firewall.
|
| Combined with a browser that blocks fingerprinting, you get slightly better privacy with IPv6.
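The difference jedberg describes, between a lower half derived from the MAC address and one the client picks itself, can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, assuming the modified EUI-64 construction from RFC 4291 (Appendix A) for the old behaviour and a plain random 64-bit value as a simplified stand-in for RFC 4941 temporary addresses; real implementations also manage lifetimes, regeneration and duplicate address detection. The MAC address and the `2001:db8::/32` documentation prefixes are made-up example values, and the function names are hypothetical.

```python
import ipaddress
import secrets


def eui64_interface_id(mac: str) -> int:
    """Modified EUI-64 interface ID (RFC 4291, Appendix A) built from a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address like 00:1a:2b:3c:4d:5e")
    # Flip the universal/local bit of the first octet and insert 0xFFFE in the middle.
    eui = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    return int.from_bytes(eui, "big")


def random_interface_id() -> int:
    """Random 64-bit interface ID: a simplified stand-in for an RFC 4941 temporary address."""
    return secrets.randbits(64)


def slaac_address(prefix: str, interface_id: int) -> ipaddress.IPv6Address:
    """Combine a /64 prefix (from the ISP/router) with a 64-bit interface ID (from the host)."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("SLAAC addresses use a /64 prefix")
    return ipaddress.IPv6Address(int(net.network_address) | interface_id)


if __name__ == "__main__":
    mac = "00:1a:2b:3c:4d:5e"  # hypothetical device MAC
    home, cafe = "2001:db8:1:2::/64", "2001:db8:ff:7::/64"  # documentation prefixes

    # Old behaviour: the lower 64 bits are identical on every network the device visits.
    print(slaac_address(home, eui64_interface_id(mac)))  # 2001:db8:1:2:21a:2bff:fe3c:4d5e
    print(slaac_address(cafe, eui64_interface_id(mac)))  # 2001:db8:ff:7:21a:2bff:fe3c:4d5e

    # Privacy-extension style: a fresh random suffix each time, but the /64 prefix is unchanged.
    for _ in range(2):
        print(slaac_address(home, random_interface_id()))
```

The first two addresses share their lower 64 bits across two unrelated networks, which is the cross-network correlation problem. With random suffixes that problem goes away, but every address still sits inside the same household /64.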
saurik wrote:
| That seems backwards: with NAT you couldn't identify all of the individual computers on my Internet connection; but now, with IPv6, at worst you can (as every device has its own IP address that it reuses) or, at best (generating a new address for every single connection), you are just getting yourself back to where you were with NAT. I appreciate that for a while IPv6 was actively _much worse_, as it allowed address correlation across multiple networks (due to the MAC address being the same in the lower 64 bits), but fixing that doesn't make it _better_ than NAT: if you mostly care about privacy, it still seems to make the most sense to use NAT if at all possible.
ignoramous wrote:
| > _with NAT you couldn't identify all of the individual computers on my Internet connection..._
|
| Not if you're using WebRTC, which would promptly leak the private IP address: https://news.ycombinator.com/item?id=12528184
zrm wrote:
| There are also several other methods that often work, e.g. the X-Forwarded-For HTTP header if there is a local proxy, or a long list of non-HTTP protocols or tunnel-XYZ-inside-HTTP protocols that carry the client IP as part of the protocol.
|
| Temporary IPv6 addresses actually solve this, especially for P2P systems that do benefit from knowing the client IP in case there is a peer on the same LAN: the address they get is the same one the remote server sees (no additional information), and it is both a valid local address and hard to correlate with anything when every device can have hundreds of them at once.
kube-system wrote:
| Unless you have many thousands of devices on your network, I can't see it actually having any practical advantage. I have maybe a dozen devices on my network; that is going to require only a tiny amount of entropy to uniquely identify the devices, in conjunction with the first half of the address.
| You wouldn't even need an entire UA string in many cases. Resist fingerprinting != prevent fingerprinting.
|
| The resist-fingerprinting measures in browsers are intended to help you blend in with millions of other devices. You'll still likely stand out like a sore thumb if the sample size is your household.
samoa42 wrote:
| > you get slightly better privacy with ipv6
|
| It's a larger identifying token serving fewer users, so no.
jedberg wrote:
| You're thinking mathematically, not practically.
samoa42 wrote:
| Come again?
Faaak wrote:
| Absolutely not. Most ISPs will allocate you a fixed /64. You may well have privacy IPs in this /64, but the prefix will always be the same. A tough day for privacy activists.
jedberg wrote:
| Sure, just like with IPv4 they allocate you a fixed /32.
|
| But you get slightly more privacy by having the client able to randomize the other 1/2 of the address and use multiple addresses, which would confuse trackers. Or the trackers just look at the first /64 and ignore the rest, and you're no worse off than you were with your IPv4 /32.
saurik wrote:
| You are worse off, as now you have some introspection into computers on the other side of that firewall; with NAT you could have thousands of computers and they would all get melded together as one, but unless you very carefully generate a new IP address for every single connection you make (which is how you can get back to where you were with NAT), you now have the ability to somewhat differentiate users who before would have been mixed. So it is at best the same but probably worse, at which point why not use the thing that is always at least as good, if not much better?
Avamander wrote:
| Sounds like we need shorter IPv6 leases and more rotation between the prefixes, but that somehow goes a bit against what IPv6 should provide us: freedom to hold on to an address.
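Much of this disagreement comes down to what a tracker keys on. A minimal sketch, assuming a hypothetical tracker that collapses IPv6 addresses to their /64 (as jedberg suggests trackers might) and keeps IPv4 addresses whole; the function names and addresses are illustrative and not taken from the paper. The `bits_to_distinguish` helper puts a number on kube-system's point: singling out one of n equally likely devices needs only log2(n) bits, about 3.6 bits for a dozen.

```python
import ipaddress
import math
from collections import defaultdict


def tracking_key(addr: str) -> str:
    """Hypothetical tracker key: the /64 for IPv6, the full address for IPv4."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6:
        # Drop the host-chosen lower 64 bits; strict=False allows host bits to be set.
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(ip)


def group_by_key(addrs):
    """Map each tracker key to the set of distinct raw addresses seen under it."""
    groups = defaultdict(set)
    for addr in addrs:
        groups[tracking_key(addr)].add(addr)
    return dict(groups)


def bits_to_distinguish(n_devices: int) -> float:
    """Bits of entropy needed to single out one of n equally likely devices."""
    return math.log2(n_devices) if n_devices > 1 else 0.0


if __name__ == "__main__":
    seen = [
        "2001:db8:1:2:a1b2:c3d4:e5f6:1",  # hypothetical temporary address, device A
        "2001:db8:1:2:9f8e:7d6c:5b4a:2",  # hypothetical temporary address, device B
        "203.0.113.7",                    # hypothetical IPv4 household behind NAT
    ]
    for key, members in group_by_key(seen).items():
        print(key, len(members))
    print(round(bits_to_distinguish(12), 2))  # 3.58
```

Under this keying both temporary addresses collapse into one key, so the random suffixes buy nothing against the tracker, while the raw addresses still reveal that at least two devices sit behind the prefix, which is saurik's concern. Whether real trackers key exactly this way is an assumption, not something the thread establishes.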
zrm wrote:
| Some systems already have a solution for this, since devices can have multiple IPv6 addresses at once. They have one permanent IPv6 address, which is not used for outgoing connections but can be used for incoming connections, and then temporary IPv6 addresses used for outgoing connections, which can be rotated arbitrarily often. The first address is permanent but can only be used if you already know it.
orbital-decay wrote:
| _> just like with an IPv4 they allocate you a fixed /32_
|
| Many (most?) ISPs that give you unique IPv4 addresses also use dynamic pools. Just reconnect your router and blend into the pool. Or, if you're behind NAT, you're already indistinguishable from others.
samoa42 wrote:
| > Most ISPs will allocate you a fixed /64
|
| Not really. It is very common for someone in a household to power-cycle the router, get a new IPv6 prefix, and then have to power-cycle all networked devices too, because they have no way of knowing that the old prefix is dead.
swinglock wrote:
| With IPv6 one user is a /64, whereas with IPv4 one user is a /32. That's about equal from a privacy perspective.
|
| But then in reality one IPv4 /32 is often many users due to NAT. Doesn't that make privacy better with v4?
AgentME wrote:
| In my experience, in the same cases where IPv4 would have multiple users behind one /32, IPv6 would have multiple users behind one /64.
saurik wrote:
| Except now you are potentially leaking information about the individual computers in the lower 64 bits; if you generate those randomly for every single connection then you can mitigate that, but is that how this is actually being implemented, or are they just doing periodic cycling? With NAT you were just guaranteed that this would always be safe.
eh78ssxv2f wrote:
| > so that basically your client gets to pick the second 1/2 of your address now.
| > And the nice thing is, most major platforms will actually run multiple addresses in parallel, allowing new connections to use a new address while old connections keep using the old one.
|
| This is really interesting, and I was not aware of this. Are there any links where I can read more about this? I tried searching around but did not find anything.
AnonC wrote:
| I didn't get your description of IPv6 offering better privacy protection.
|
| With IPv6, your ISP could give you one address for each of your devices for life without any grudges or pain, assuming it doesn't mind losing out on the static-IP add-on pricing that some charge in the scarce IPv4 space. That would enable better long-term tracking without additional tokens (like cookies) and fingerprinting. We'll never run out of IPv6 addresses.
jedberg wrote:
| Your ISP only assigns you the first 1/2 of the address. You get to pick the second 1/2 yourself. Your client can (and in most cases will) constantly switch up the second 1/2 of the address.
annoyingnoob wrote:
| I suppose this only works if you combine IP with some other information, like username or browser fingerprint. Otherwise you could be tracking multiple 'users' at the same IP.
parhamn wrote:
| The whole multiple-users-on-one-IPv4 thing always feels like it distracts from what a tracker's goal really is. An IP is definitely sufficient identification, and more than is needed, for a tracker to start targeting you with ads and whatnot.
|
| In some ways, the fact that IP-only tracking is lossy but still good enough is something to be afraid of. It is easy to 'taint' an IP, and your household's behind-the-scenes profile, through a Google search by your guest using your wifi.
annoyingnoob wrote:
| Advertising that is relevant to my wife probably isn't relevant to me; using IP is not good enough. The title and focus here is on _user tracking_, which is the goal I'm commenting about.
| It's quite common to have multiple users behind the same IP, and those users and their actions may or may not be related (I have no idea what my co-workers might be doing, for example, but we share a static IP).
|
| I honestly didn't follow your Google search/wifi example.
pm_me_ur_fullz wrote:
| "It's listening to me!"
|
| and your spouse's typed-in search queries
|
| and your PlayStation
|
| and your smart coffee machine
|
| and Alexa actually is
|
| That's all without getting me started on the analytics networks in other apps sharing with the same data brokers between apps.
virgilp wrote:
| > using IP is not good enough
|
| Define "not good enough" (for whom is it not good enough? for advertisers?).
|
| I used to work on a project linking cookies together into anonymous profiles (for advertising). Initially the plan was to separate household/individual profiles using different heuristics, but I think eventually it turned out that nobody cares that much: the advertisers just wanted a rough cross-device profile, not perfect accuracy. I mean, sure, for marketing purposes (as well as engineering/"performance" reasons) you needed figures to brag about accuracy and whatnot. But it's really, really hard to get those figures right, and ultimately the thing that will convince advertisers is "using cross-device profiles improved conversion rate by 10%"; everything else is a detail in comparison.
annoyingnoob wrote:
| The title and focus here is on _user tracking_, which is the goal I'm commenting about.
icedchai wrote:
| It's "good enough" because it's still better than ads that are not targeted at all.
kube-system wrote:
| An IP address wouldn't be good enough to perform identification, but it would likely be good enough to perform reidentification. Most people on a network are not clearing their browser caches at the exact same time.
| The real power here is in using IP addresses in combination with other fingerprinting techniques.
annoyingnoob wrote:
| > The real power here is in using IP addresses in combination with other fingerprinting techniques.
|
| I didn't think that was anything new.
|
| It also sounds like a possible path for exploits, kind of like not requiring a password when you call voicemail from your own phone (one could spoof your number as the caller ID and access your voicemail without a password).
sixothree wrote:
| This sounds like circular FUD to me.
annoyingnoob wrote:
| How so?
parhamn wrote:
| > I honestly didn't follow your Google search/wifi example.
|
| Sorry, I wasn't clear there. I generally don't think the 'match' thresholds need to be that high for effective (in terms of dollars earned vs. spent) targeted ads.
|
| In the Google example: if a friend comes over and searches for designer watches, odds are the whole IP/home is a decent enough target for designer watches.
Zenst wrote:
| Most users will at the very least use two IP addresses: home broadband and mobile SIM broadband.
|
| Then you have wifi hotspots and friends' wifi. The average user uses many IPs, not limited to the range of one ISP.
|
| So whilst you can fingerprint devices and usage patterns, the IP address by itself will be useless for identifying such users; it may augment things a little, but it is no solution.
|
| But then, with IPv4, many IP addresses are shared across mobile and broadband users in various ways. Most users do not have a fixed IP, and even those that do don't have a fixed IP for their mobile data activities, unless they VPN into their home broadband. And if they use VPN service offerings, that adds another layer of IP ranges.
|
| So assertions based upon an IP and user usage may well trip up and prove false.
| Imagine using an ISP with dynamic IPs, where the next user of that IP uses it for crime: with some bad logging and aggressive association, you can mislabel somebody for a crime they did not commit.
|
| Roll on IPv6: mobile carriers would have been the obvious beneficiaries of that, yet I'm not aware of any of them progressing it in a timely manner; they chug along using a pool of IPv4 addresses and various tricks to make those cater for many users.
scared2 wrote:
| However, what the authors tried to show in this paper is its stability over time. So overall, their "findings" indicate that IP addresses should not be overlooked in privacy protection. As they stated:
|
| "... Over time, a same device communicates with our server using a set of distinct IP addresses, but we find that devices reuse some of their previous IP addresses for long periods of time. We call this IP address retention."
Zenst wrote:
| As somebody who tracks and logs the IPs given to them by their ISP etc., I can attest: not that distinct over time.
jentulman wrote:
| From the abstract:
|
| "In this paper, we study the stability of the public IP addresses a user device uses to communicate with our server. Over time, a same device communicates with our server using a set of distinct IP addresses, but we find that devices reuse some of their previous IP addresses for long periods of time. We call this IP address retention and, the duration for which an IP address is retained by a device, is named the IP address retention period. We present an analysis of 34,488 unique public IP addresses collected from 2,230 users over a period of 111 days and we show that IP addresses remain a prime vector for online tracking. 87 % of participants retain at least one IP address for more than a month and 45 % of ISPs in our dataset allow keeping the same IP address for more than 30 days."
AndyMcConachie wrote:
| I worked on a document a few years back on anonymizing IP addresses.
| If you find yourself in a situation where you need to balance anonymization of IP addresses with research needs, this paper may be useful.
|
| https://www.icann.org/en/system/files/files/rssac-040-07aug1...
___________________________________________________________________
(page generated 2020-04-15 23:00 UTC)