FLoC privacy analysis
In a previous article, I talked about a new set of “Privacy Preserving Advertising” technologies, which are intended to enable advertising without compromising privacy. This article deals with one of these proposals, Federated Learning of Cohorts (FLoC), which Chrome is currently testing. The idea behind FLoC is to make it possible to target advertisements based on users’ interests without revealing their browsing history to advertisers. We have conducted a detailed privacy analysis of FLoC. This article presents a summary of our findings.
In the current web, trackers (and therefore advertisers) associate a cookie with each user. Each time a user visits a website with an embedded tracker, the tracker receives the cookie and can thus compile a list of the sites that user visits. Advertisers can use the information obtained from tracking browsing history to target advertisements that are potentially relevant to a particular user’s interests. The obvious problem here is that advertisers learn everywhere you go.
FLoC replaces this cookie with a new “cohort” identifier that represents not a single user but a group of users with similar interests. Advertisers can then build a list of the sites that all users in a cohort visit, but not an individual user’s history. If the interests of users in a cohort are truly similar, this cohort ID can be used for ad targeting. Google ran an experiment with FLoC and, based on it, stated that FLoC provides 95% of the per-dollar conversion rate of interest-based ad targeting using tracking cookies.
Our analysis shows several privacy issues that we believe need to be addressed:
Cohort IDs can be used for tracking
While a given cohort is relatively large (the exact size is still under discussion, but these groups will likely include thousands of users), that doesn’t mean cohort IDs can’t be used for tracking. Since only a few thousand people will share a given cohort ID, trackers that have a significant amount of additional information can narrow down the set of candidate users very quickly. This can happen in several ways:
Browser fingerprints
Not all browsers are the same. For example, some people use Chrome and others Firefox; some people are on Windows and some on Mac; some people speak English and others speak French. Each piece of user-specific variation can be used to distinguish between users. When combined with a FLoC cohort that has only a few thousand users, a relatively small amount of information is needed to identify an individual person, or at least to reduce the FLoC cohort to a few people. Let’s give an example using plausible numbers. Imagine you have a fingerprinting technique that divides people into about 8,000 groups (each group here is a little bigger than a ZIP code). This is not enough to identify people individually, but if it is combined with FLoC using cohort sizes of around 10,000, then the number of people in each fingerprint group/FLoC cohort pair will be very small, potentially as small as one. While there may be larger groups that cannot be identified in this way, that is not the same as having a system that is free of individual targeting.
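To make the arithmetic in this example concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that fingerprint groups are equally likely and independent of cohort membership, which real fingerprints are not, so the numbers are only indicative. The function names are ours, for illustration only.

```python
def expected_users_per_pair(cohort_size: int, fingerprint_groups: int) -> float:
    """Average number of users sharing one (fingerprint group, cohort) pair,
    assuming fingerprints are uniform and independent of the cohort."""
    return cohort_size / fingerprint_groups


def probability_user_is_unique(cohort_size: int, fingerprint_groups: int) -> float:
    """Chance that none of the other cohort members falls into the same
    fingerprint group as a given user (same uniformity assumption)."""
    return (1 - 1 / fingerprint_groups) ** (cohort_size - 1)


# Numbers taken from the example in the text.
avg = expected_users_per_pair(cohort_size=10_000, fingerprint_groups=8_000)
p_unique = probability_user_is_unique(cohort_size=10_000, fingerprint_groups=8_000)

print(f"{avg:.2f} users per pair on average")           # 1.25
print(f"{p_unique:.0%} of users are alone in their pair")  # 29%
```

Under these simplified assumptions, an average pair holds barely more than one person, and more than a quarter of the cohort is singled out completely.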
Multiple visits
People’s interests are not constant and neither are their FLoC IDs. Currently, FLoC IDs appear to be recalculated every week or so. This means that if a tracker is able to use other information to link user visits over time, it can use the combination of FLoC IDs in week 1, week 2, etc. to distinguish individual users. This is of particular concern as it even works with modern anti-tracking mechanisms such as Firefox Total Cookie Protection (TCP). TCP is intended to prevent trackers from correlating visits between sites, but not multiple visits to the same site. FLoC restores cross-site tracking even if users have TCP enabled.
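The following toy simulation illustrates how a sequence of cohort IDs becomes more identifying over time. It is only a sketch: the user count, the number of cohorts, and the weekly probability of changing cohort are made-up parameters, and real cohort changes are driven by browsing behavior rather than by chance.

```python
import random
from collections import Counter


def unique_fraction_by_week(num_users, num_cohorts, weeks, change_prob, seed=0):
    """Return, for each week, the fraction of users whose cohort history
    so far is shared with nobody else. All parameters are hypothetical."""
    rng = random.Random(seed)
    # Week 1: every user lands in a random cohort.
    current = [rng.randrange(num_cohorts) for _ in range(num_users)]
    histories = [(c,) for c in current]
    fractions = []
    for week in range(weeks):
        if week > 0:
            for u in range(num_users):
                # With probability change_prob the user is reassigned.
                if rng.random() < change_prob:
                    current[u] = rng.randrange(num_cohorts)
                histories[u] = histories[u] + (current[u],)
        counts = Counter(histories)
        unique = sum(1 for h in histories if counts[h] == 1)
        fractions.append(unique / num_users)
    return fractions


fractions = unique_fraction_by_week(
    num_users=20_000, num_cohorts=20, weeks=6, change_prob=0.5
)
for week, frac in enumerate(fractions, start=1):
    print(f"after week {week}: {frac:.1%} of users have a unique cohort history")
```

Because each history only grows, a user who is unique after some week stays unique afterwards, so the fraction of identifiable users can only increase as a tracker observes more weeks.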
FLoC discloses more information than you want
With cookie-based tracking, the amount of information a tracker obtains is determined by the number of sites on which it is embedded. In addition, a site that wants to learn about user interests must itself participate in tracking users across a large number of sites, work with a reasonably large tracker, or work with other trackers. Under a permissive cookie policy, this type of tracking is simple using third-party cookies and cookie syncing. However, when third-party cookies are blocked (or isolated by site, as in TCP), it is much more difficult for trackers to collect and share information about a user’s interests across sites.
FLoC undermines these more restrictive cookie policies: because FLoC IDs are the same across all sites, they become a shared key that trackers can associate with data from external sources. For example, a tracker with a significant amount of first-party interest data could operate a service that simply answers questions about the interests of a given FLoC ID, such as “Do people with this cohort ID like cars?” All a site has to do is call the FLoC APIs to get the cohort ID, and then use it to look up information in the service. Additionally, the ID can be combined with fingerprinting data to ask “Do people who live in France, have Macs, use Firefox, and have this ID like cars?” The end result is that any site will be able to learn a lot about you with much less effort than it needs today.
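A minimal sketch of such a lookup service might look like the following. Everything here is hypothetical: the records, the field names, and the `share_liking_cars` function are invented for illustration and do not describe any real tracker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    cohort_id: int
    country: str
    os: str
    browser: str
    likes_cars: bool


# Hypothetical first-party records held by a large tracker.
RECORDS = [
    Profile(1234, "FR", "Mac", "Firefox", True),
    Profile(1234, "FR", "Mac", "Firefox", True),
    Profile(1234, "US", "Windows", "Chrome", False),
    Profile(1234, "US", "Windows", "Chrome", False),
    Profile(5678, "FR", "Mac", "Firefox", False),
]


def share_liking_cars(cohort_id, **fingerprint):
    """Fraction of known users with this cohort ID (and, optionally, these
    fingerprint attributes) who like cars; None if nobody matches."""
    matches = [
        r for r in RECORDS
        if r.cohort_id == cohort_id
        and all(getattr(r, key) == value for key, value in fingerprint.items())
    ]
    if not matches:
        return None
    return sum(r.likes_cars for r in matches) / len(matches)


# "Do people with this cohort ID like cars?"
print(share_liking_cars(1234))  # 0.5
# The same question, narrowed with fingerprint data.
print(share_liking_cars(1234, country="FR", os="Mac", browser="Firefox"))  # 1.0
```

The point of the sketch is that the querying site needs no tracking infrastructure of its own: the cohort ID alone is enough to key into someone else’s data.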
FLoC countermeasures are insufficient
Google has proposed several mechanisms to resolve these issues.
First, sites have the option of whether or not to participate in FLoC. In the current Chrome experiment, sites are included in the FLoC calculation if they do ad-type things, either “load ad-related resources” or call the FLoC APIs. The eventual inclusion criteria are unclear, but it seems likely that any site that includes advertising will be included in the calculation by default. Sites can also disable FLoC entirely using the HTTP Permissions-Policy header, but it seems likely that many sites will not.
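As an illustration of the site-level opt-out, here is a small WSGI middleware sketch in Python that adds the header to every response. The `interest-cohort=()` value is the directive used during the Chrome trial; the application and helper names are our own and purely illustrative.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A trivial example application."""
    body = b"hello"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def opt_out_of_floc(wrapped_app):
    """Wrap a WSGI app so every response asks to be left out of FLoC."""
    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            headers = list(headers) + [("Permissions-Policy", "interest-cohort=()")]
            return start_response(status, headers, exc_info)
        return wrapped_app(environ, patched_start_response)
    return middleware


# Exercise the wrapped app without starting a server.
captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)
body = b"".join(opt_out_of_floc(app)(environ, fake_start_response))
print(captured["headers"]["Permissions-Policy"])  # interest-cohort=()
```

The opt-out is cheap to deploy, but it only helps on sites whose operators know about it and choose to send it.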
Second, Google itself will remove FLoC cohorts that it considers too closely correlated with “sensitive” topics. Google provides details in this white paper, but the basic idea is that they will check whether users in a given cohort are significantly more likely to visit a set of sites associated with sensitive categories, and if so, they will simply return an empty cohort ID for that cohort. Likewise, they say they will exclude sites they consider sensitive from the FLoC computation. These defenses seem very difficult to implement in practice for several reasons: (1) the list of sensitive categories may be incomplete, or people may not agree on which categories are sensitive, (2) there may be other sites that correlate with sensitive sites but are not themselves sensitive, and (3) clever trackers may be able to learn sensitive information despite these controls. For example, English-speaking users with FLoC ID X may be no more likely than average to visit a sensitive type A site, while French-speaking users with the same ID are.
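The language example can be sketched with invented numbers. The simple “more than twice the baseline” rule below is a stand-in we made up for illustration; it is not the test described in Google’s white paper, and the user counts are hypothetical.

```python
# Hypothetical cohort X: (language, visits sensitive site type A)
cohort_x = (
    [("en", True)] * 45 + [("en", False)] * 855    # 900 English speakers
    + [("fr", True)] * 20 + [("fr", False)] * 80   # 100 French speakers
)

BASELINE_RATE = 0.05   # assumed share of all users visiting type A sites
THRESHOLD_FACTOR = 2   # assumed: flag a group at more than 2x the baseline


def visit_rate(users):
    """Share of the given users who visit sensitive site type A."""
    return sum(visits for _, visits in users) / len(users)


def looks_sensitive(users):
    """Stand-in sensitivity check: is the rate well above the baseline?"""
    return visit_rate(users) > THRESHOLD_FACTOR * BASELINE_RATE


english = [u for u in cohort_x if u[0] == "en"]
french = [u for u in cohort_x if u[0] == "fr"]

print(f"whole cohort: {visit_rate(cohort_x):.1%}, flagged: {looks_sensitive(cohort_x)}")
print(f"English only: {visit_rate(english):.1%}, flagged: {looks_sensitive(english)}")
print(f"French only:  {visit_rate(french):.1%}, flagged: {looks_sensitive(french)}")
```

In this made-up data the cohort as a whole passes the check, yet a tracker that also knows the user’s language learns that French speakers in the cohort visit type A sites at four times the baseline rate.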
While these mitigation measures seem useful, they mostly appear to be marginal improvements and do not address the basic issues described above, which we believe require further study by the community.
Summary
FLoC is based on a compelling idea: enable ad targeting without putting users at risk. But the current design has a number of privacy properties that could create significant risks if it were to be widely deployed in its current form. It is possible that these properties can be corrected or mitigated, and we suggest a number of potential avenues in our analysis. Further work on FLoC should focus on resolving these issues.
For more on this:
Build a more privacy-friendly advertising ecosystem
The future of advertising and privacy