[HN Gopher] The relevance of IP addresses in the tracking ecosys...
       ___________________________________________________________________
        
       The relevance of IP addresses in the tracking ecosystem [pdf]
        
       Author : lesterpig
       Score  : 89 points
       Date   : 2020-04-15 16:09 UTC (6 hours ago)
        
 (HTM) web link (hal.inria.fr)
        
       | kube-system wrote:
       | The big conclusion to be drawn from this paper is that IP address
       | tracking can be combined with client-side tracking techniques to
       | perform reidentification.
        
       | api wrote:
       | IP is just one more data point. There are already so many ways a
       | browser can be fingerprinted, it doesn't make things that much
       | worse.
       | 
       | While you can limit your exposure a bit, I long ago reached the
       | conclusion that strong privacy is impossible in the current
       | client/server web model. There is too much surface area.
        
       | jedberg wrote:
       | IPv6 improves this situation (now). At first, ipv6 was actually a
       | lot worse, since the back 1/2 of your address was your MAC
       | address, allowing your device to be tracked around the internet
       | no matter where it went.
       | 
       | People quickly realized this flaw, and updated the standard so
       | that basically your client gets to pick the second 1/2 of your
       | address now. And the nice thing is, most major platforms will
       | actually run multiple addresses in parallel, allowing new
       | connections to use a new address while old connections keep using
       | the old one.
       | 
       | So while ipv4 adds some protection by having multiple clients
       | behind the firewall, ipv6 actually makes it better by looking
       | like _even more_ clients behind the firewall.
       | 
       | Combined with a browser that blocks fingerprinting you get
       | slightly better privacy with ipv6.
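
The MAC-derived lower half described in the comment above is the "modified EUI-64" interface identifier. What follows is a minimal sketch of that derivation, not any particular operating system's implementation. The function names, the example MAC and the 2001:db8::/32 documentation prefixes are all assumptions chosen for illustration.

```python
import ipaddress


def mac_to_interface_id(mac: str) -> str:
    """Derive the modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC such as 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe in the middle
    groups = [(eui64[i] << 8) | eui64[i + 1] for i in range(0, 8, 2)]
    return ":".join(f"{group:04x}" for group in groups)


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the MAC-derived interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    interface_id = int(mac_to_interface_id(mac).replace(":", ""), 16)
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Hypothetical device seen on two different networks.
home = slaac_address("2001:db8:1::/64", "00:11:22:33:44:55")
cafe = slaac_address("2001:db8:2::/64", "00:11:22:33:44:55")
```

Both addresses end in 0211:22ff:fe33:4455 even though the prefixes differ. That shared lower half is what allowed a device to be correlated across networks.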
        
         | saurik wrote:
         | That seems backwards: with NAT you couldn't identify all of the
         | individual computers on my Internet connection; but now, with
         | IPv6, you either can at worst (as every device has its own IP
         | address that it reuses) or, at best (generating a new address
         | for every single connection), are just getting yourself back to
         | where you were with NAT. I appreciate that for a while IPv6 was
         | actively _much worse_ as it allowed address correlation across
         | multiple networks (due to the MAC address being the same in the
         | lower 64 bits), but fixing that doesn't make it _better_ than
         | NAT: if you mostly care about privacy, it still seems to make
         | the most sense to use NAT if at all possible.
        
           | ignoramous wrote:
           | > _with NAT you couldn't identify all of the individual
           | computers on my Internet connection..._
           | 
           | Not if you're using WebRTC which would promptly leak the
           | private IP address:
           | https://news.ycombinator.com/item?id=12528184
        
             | zrm wrote:
             | There are also several other methods that often work, e.g.
             | the X-Forwarded-For HTTP header if there is a local proxy,
             | or a long list of non-HTTP protocols or tunnel-XYZ-inside-
             | HTTP protocols that have the client IP as part of the
             | protocol.
             | 
             | Temporary IPv6 addresses actually solve this, especially
             | for P2P systems that do benefit from knowing the client IP
             | in case there is a peer on the same LAN, because the
             | address they get is the same one the remote server sees (no
             | additional information) and then it's both a valid local
             | address but hard to correlate with anything when every
             | device can have hundreds of them at once.
        
         | kube-system wrote:
         | Unless you have many thousands of devices on your network, I
         | can't see it actually having any practical advantage. I have
         | maybe a dozen devices on my network, that is going to require
         | only a tiny amount of entropy to uniquely identify the devices,
         | in conjunction with the first half of the address. You wouldn't
         | even need an entire UA string in many cases. Resist
         | fingerprinting != prevent fingerprinting.
         | 
         | The resist fingerprinting measures in browsers are intended to
         | help you blend in with millions of other devices. You'll still
         | likely stand out like a sore thumb if the sample size is your
         | household.
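
The "tiny amount of entropy" in the comment above can be put into numbers. This is a rough sketch that assumes every device is equally likely, which real fingerprints are not. The device counts (a dozen, one million) are illustrative, and the helper name is hypothetical.

```python
import math


def bits_to_distinguish(device_count: int) -> float:
    """Bits of identifying information needed to single out one of N equally likely devices."""
    if device_count < 1:
        raise ValueError("device_count must be positive")
    return math.log2(device_count)


household_bits = bits_to_distinguish(12)          # about 3.58 bits
crowd_bits = bits_to_distinguish(1_000_000)       # about 19.93 bits
```

Once the /64 prefix has narrowed the candidates to one household, fewer than 4 bits of fingerprint are enough to tell a dozen devices apart. Standing out among a million devices takes about 20 bits.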
        
         | samoa42 wrote:
         | > you get slightly better privacy with ipv6
         | 
         | It's a larger identifying token serving fewer users, so no.
        
           | jedberg wrote:
           | You're thinking mathematically, not practically.
        
             | samoa42 wrote:
             | come again?
        
         | Faaak wrote:
         | Absolutely not. Most ISPs will allocate you a fixed /64. You
         | may well have privacy addresses within this /64, but the prefix
         | will always be the same. A tough day for privacy activists.
        
           | jedberg wrote:
           | Sure, just like with an IPv4 they allocate you a fixed /32.
           | 
           | But you get slightly more privacy by having the client able
           | to randomize the other 1/2 of the address and use multiple
           | addresses, which would confuse trackers. Or the trackers just
           | look at the first /64 and ignore the rest and you're no worse
           | off than you were with your IPv4 /32.
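
The "look at the first /64 and ignore the rest" strategy mentioned above can be sketched as a key function. This is an illustration of the idea under simple assumptions, not a description of how any real tracker works. The function name and the documentation-range addresses are hypothetical.

```python
import ipaddress


def tracking_key(address: str) -> str:
    """Collapse an IPv6 address to its /64 prefix; keep an IPv4 address whole."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        # strict=False allows host bits to be set and masks them off.
        return str(ipaddress.ip_network(f"{ip}/64", strict=False))
    return str(ip)


# Two rotating addresses from the same hypothetical household.
first = tracking_key("2001:db8:abcd:12:1234:5678:9abc:def0")
second = tracking_key("2001:db8:abcd:12:aaaa:bbbb:cccc:dddd")
legacy = tracking_key("203.0.113.7")
```

Both IPv6 addresses map to the key 2001:db8:abcd:12::/64, so rotating the lower 64 bits does nothing against a tracker that keys on the prefix.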
        
             | saurik wrote:
             | You are worse off as now you have some introspection into
             | computers on the other side of that firewall; with NAT you
             | could have thousands of computers and they would all get
             | melded together as one, but unless you very carefully
             | generate a new IP address for every single connection you
             | make (which is how you can get back to where you were with
             | NAT), you now have the ability to somewhat differentiate
             | users who before would have been mixed. So it is at best
             | the same but probably worse, at which point why not use the
             | thing that is always at least as good if not much
             | better?...
        
               | Avamander wrote:
               | Sounds like we need shorter IPv6 leases and more rotation
               | between the prefixes, but that somehow goes a bit against
               | what IPv6 should provide us - freedom to hold on to an
               | address.
        
               | zrm wrote:
               | Some systems already have a solution for this, since
               | devices can have multiple IPv6 addresses at once. They
               | have one permanent IPv6 address which is not used for
               | outgoing connections but can be used for incoming
               | connections, and then temporary IPv6 addresses used for
               | outgoing connections which can be rotated arbitrarily
               | often. The first address is permanent but can only be
               | used if you already know it.
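
The rotation of temporary outgoing addresses described above can be sketched as follows. This only draws a uniformly random lower half. Real implementations of temporary addresses follow additional rules (lifetimes, reserved identifiers, duplicate address detection) that are deliberately left out here. The function name and the example prefix are assumptions.

```python
import ipaddress
import secrets


def temporary_address(prefix: str) -> ipaddress.IPv6Address:
    """Pick a random 64-bit interface identifier inside the given /64 prefix."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    interface_id = secrets.randbits(64)  # cryptographically strong randomness
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Several outgoing addresses a single device might use in parallel.
prefix = "2001:db8:abcd:12::/64"
outgoing = [temporary_address(prefix) for _ in range(3)]
```

Every generated address still falls inside the same /64, which is why the objections about a fixed prefix elsewhere in the thread still apply.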
        
             | orbital-decay wrote:
             | _> just like with an IPv4 they allocate you a fixed /32_
             | 
             | Many (most?) ISPs that give you unique IPv4 addresses also
             | use dynamic pools. Just reconnect your router and blend
             | into the pool. Or, if you're behind NAT, you're already
             | indistinguishable from others.
        
           | samoa42 wrote:
           | > Most ISPs will allocate you a fixed /64
           | 
           | Not really. It is very common for someone in a household to
           | power-cycle the router, which then gets a new IPv6 address,
           | after which all networked devices have to be power-cycled too
           | because they have no way of knowing that the old prefix is
           | dead.
        
         | swinglock wrote:
         | With IPv6 one user is a /64 whereas with IPv4 one user is a
         | /32. That's about equal from a privacy perspective.
         | 
         | But then in reality one IPv4 /32 is often many users due to
         | NAT. Doesn't that make privacy better with v4?
        
           | AgentME wrote:
           | In my experience, in the same cases that IPv4 would have
           | multiple users behind one /32, IPv6 would have multiple users
           | behind one /64.
        
             | saurik wrote:
             | Except now you are potentially leaking information about
             | the individual computers in the lower 64 bits; if you
             | generate those randomly for every single connection then
             | you can mitigate that, but is that how this is actually
             | being implemented or are they just doing periodic cycling?
             | With NAT you were just guaranteed that this would always be
             | safe.
        
         | eh78ssxv2f wrote:
         | > so that basically your client gets to pick the second 1/2 of
         | your address now. And the nice thing is, most major platforms
         | will actually run multiple addresses in parallel, allowing new
         | connections to use a new address while old connections keep
         | using the old one.
         | 
         | This is really interesting and I was not aware of this. Are
         | there any links where I can read about this more? I tried
         | searching around, but did not find anything.
        
         | AnonC wrote:
         | I didn't get your description of IPv6 offering better privacy
         | protection.
         | 
         | With IPv6, your ISP could give you one address for each of your
         | devices for life without any grudges or pain, assuming it
         | doesn't mind losing out on the static IP add on pricing that
         | some charge in the scarce IPv4 space. That would enable better
         | long term tracking without additional tokens (like cookies) and
         | fingerprinting. We'll never run out of IPv6 addresses.
        
           | jedberg wrote:
           | Your ISP only assigned you the first 1/2 of the address. You
           | get to pick the second 1/2 yourself. Your client can (and in
           | most cases will) constantly switch up the second 1/2 of the
           | address.
        
       | annoyingnoob wrote:
       | I suppose this only works if you combine IP with some other
       | information, like username or browser fingerprint. Otherwise you
       | could be tracking multiple 'users' at the same IP.
        
         | parhamn wrote:
         | The whole multiple-users-on-one-ipv4 thing always feels like it
         | distracts from what a tracker's goal really is. It is
         | definitely beyond sufficient identification and more than
         | needed for a tracker to start targeting you with ads and what
         | not.
         | 
         | In some ways the fact that IP only tracking is lossy but still
         | good enough is something to be afraid of. It is easy to 'taint'
         | an IP and your household's behind-the-scenes profile through a
         | Google search by a guest using your wifi.
        
           | annoyingnoob wrote:
           | Advertising that is relevant to my wife probably isn't
           | relevant to me; using IP is not good enough. The title and
           | focus here is on _user tracking_, which is the goal I'm
           | commenting about. It's quite common to have multiple users
           | behind the same IP, those users and their actions may or may
           | not be related (I have no idea what my co-workers might be
           | doing for example but we share a static IP).
           | 
           | I honestly didn't follow your Google search/wifi example.
        
             | pm_me_ur_fullz wrote:
             | "It's listening to me!"
             | 
             | and your spouse's typed in search queries
             | 
             | and your playstation
             | 
             | and your smart coffee machine
             | 
             | and alexa actually is
             | 
             | that's all without getting me started on the analytics
             | networks in other apps sharing with the same data brokers
             | between apps
        
             | virgilp wrote:
             | > using IP is not good enough
             | 
             | Define "not good enough" (for whom is it not good enough? For
             | advertisers?).
             | 
             | I used to work on a project linking cookies together in
             | anonymous profiles (for advertising), initially the plan
             | was to separate household/individual profiles using
             | different heuristics, but I think eventually it turned out
             | that nobody cares that much - the advertisers just wanted a
             | rough cross-device profile, not perfect accuracy. I mean
             | sure, for marketing purposes (as well as engineering/
             | "performance" reasons) you needed figures to brag about
             | accuracy and whatnot. But it's really really hard to get
             | those figures right, and ultimately the thing that will
             | convince advertisers is "using cross-device profile
             | improved conversion rate by 10%"; everything else is a
             | detail in comparison.
        
               | annoyingnoob wrote:
               | The title and focus here is on _user tracking_, which is
               | the goal I'm commenting about.
        
             | icedchai wrote:
             | It's "good enough" because it's still better than ads that
             | are not-targeted at all.
        
             | kube-system wrote:
             | An IP address wouldn't be good enough to perform
             | identification, but it would likely be good enough to
             | perform reidentification. Most people on a network are not
             | clearing their browser caches at the same exact time. The
             | real power here is in using IP addresses in combination
             | with other fingerprinting techniques.
        
               | annoyingnoob wrote:
               | > The real power here is in using IP addresses in
               | combination with other fingerprinting techniques.
               | 
               | I didn't think that was anything new.
               | 
               | It also sounds like a possible path for exploit, kind of
               | like not requiring a password when you call voicemail
               | from your own phone (one could spoof your number as the
               | caller id and access your voicemail without a password).
        
               | sixothree wrote:
               | This sounds like circular FUD to me.
        
               | annoyingnoob wrote:
               | How so?
        
             | parhamn wrote:
             | > I honestly didn't follow your Google search/wifi example.
             | 
             | Sorry, wasn't clear there. I generally don't think the
             | 'match' thresholds need to be that high for effective (in
             | terms of dollars earned vs spent) targeted ads.
             | 
             | In the Google example: if a friend comes over and searches
             | for designer watches, odds are the whole IP/home is a decent
             | enough target for designer watches.
        
       | Zenst wrote:
       | Most users will at the very least use two IP addresses - home
       | broadband and mobile SIM broadband.
       | 
       | Then you have wifi hotspots, friends' wifi. The average user uses
       | many IPs, not limited to the range of one ISP.
       | 
       | So whilst you can fingerprint devices and usage patterns, the IP
       | address will by itself be useless to identify such users, it may
       | well augment a little but is no solution.
       | 
       | But then IPv4 shares many IP addresses across mobile and
       | broadband users in various ways. Most do not have a fixed IP and
       | even those that do, do not have a fixed IP upon their mobile data
       | activities - unless they VPN into their home broadband. Though if
       | they use service VPN offerings, then another layer of IP ranges.
       | 
       | So the potential towards false assertions based upon an IP and
       | user usage may well trip up and fail. Imagine using an ISP with
       | dynamic IP and the next user of that IP uses it for crime, well
       | with some bad logging and aggressive association, you can
       | mislabel somebody for a crime they did not commit.
       | 
       | Roll on IPv6. Mobile carriers would have been the obvious
       | beneficiaries of it, yet I'm not aware of any progressing it in
       | a timely manner; they chug along using a pool of IPv4 addresses
       | and various tricks to make those cater for many.
        
         | scared2 wrote:
         | However, what the authors tried to show in this paper is its
         | stability over time. So overall their "findings" indicate IP
         | addresses should not be overlooked in privacy protection. As
         | they stated:
         | 
         | "... Over time, a same device communicates with our server
         | using a set of distinct IP addresses, but we find that devices
         | reuse some of their previous IP addresses for long periods of
         | time. We call this IP address retention."
        
           | Zenst wrote:
           | As somebody who tracks and logs their IPs given via ISP etc.,
           | I can attest, not that distinct over time.
        
       | jentulman wrote:
       | from the abstract....
       | 
       | "In this paper, we study the stability of the public IP addresses
       | a user device uses to communicate with our server. Over time, a
       | same device communicates with our server using a set of distinct
       | IP addresses, but we find that devices reuse some of their
       | previous IP addresses for long periods of time. We call this IP
       | address retention and, the duration for which an IP address is
       | retained by a device, is named the IP address retention period.
       | We present an analysis of 34,488 unique public IP addresses
       | collected from 2,230 users over a period of 111 days and we show
       | that IP addresses remain a prime vector for online tracking. 87 %
       | of participants retain at least one IP address for more than a
       | month and 45 % of ISPs in our dataset allow keeping the same IP
       | address for more than 30 days."
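
One way to read the "IP address retention period" from the abstract is the time between the first and last sighting of a given (device, IP) pair. The sketch below uses that reading. It is an assumption, and the paper's exact definition and methodology may differ. The record layout, function names and sample data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def retention_periods(observations):
    """Map (device, ip) to the time between its first and last sighting."""
    first, last = {}, {}
    for device, ip, seen_at in observations:
        key = (device, ip)
        first[key] = min(first.get(key, seen_at), seen_at)
        last[key] = max(last.get(key, seen_at), seen_at)
    return {key: last[key] - first[key] for key in first}


def share_retaining_longer_than(observations, threshold):
    """Fraction of devices that kept at least one IP for longer than threshold."""
    longest = defaultdict(timedelta)  # a missing device starts at zero duration
    for (device, _ip), period in retention_periods(observations).items():
        longest[device] = max(longest[device], period)
    if not longest:
        return 0.0
    return sum(period > threshold for period in longest.values()) / len(longest)


# Made-up sightings: (device id, public IP, timestamp).
sample = [
    ("dev-a", "198.51.100.4", datetime(2020, 1, 1)),
    ("dev-a", "203.0.113.9", datetime(2020, 1, 20)),
    ("dev-a", "198.51.100.4", datetime(2020, 2, 15)),
    ("dev-b", "192.0.2.33", datetime(2020, 1, 1)),
    ("dev-b", "192.0.2.33", datetime(2020, 1, 10)),
]
share = share_retaining_longer_than(sample, timedelta(days=30))
```

In this made-up sample, dev-a returns to 198.51.100.4 after 45 days while dev-b is only seen on its address for 9 days, so half of the devices retain an address for more than 30 days.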
        
       | AndyMcConachie wrote:
       | I worked on a document a few years back on anonymizing IP
       | addresses. If you find yourself in a situation where you need to
       | balance anonymization of IP addresses with research needs, this
       | paper may be useful.
       | 
       | https://www.icann.org/en/system/files/files/rssac-040-07aug1...
        
       ___________________________________________________________________
       (page generated 2020-04-15 23:00 UTC)