So far, I have published articles on how web trackers exfiltrate information from web pages, browser password managers, and form inputs.
Today, I continue the series with my recent findings on how third-party scripts collect private data. Specifically, these scripts extract personal identifiers from websites and track users through Facebook Login and similar social login APIs. I found two types of vulnerabilities:
1. Seven third parties access and abuse Facebook user data through the websites they are embedded in;
2. One third party uses its own Facebook "application" to track users around the web.
Vulnerability 1: Third parties take advantage of the Facebook access that users grant to a website.
Strong passwords are hard to remember, so Facebook Login and other social login systems simplify account creation by reducing the number of passwords a user has to keep track of. However, social login also brings risks. For example, Cambridge Analytica, which has been in the news recently, was found to have collected user data through a personality test application that used Facebook's login function. I found other risks as well: when users grant a website access to their social media profile, they are trusting not only that website but also every third party embedded in it.
So far, I have found seven scripts that use the first party's Facebook access to collect Facebook user data. These scripts are embedded on 434 of the top 1 million sites. Appendix 1 explains in detail how I found them. Most of these scripts collect the user ID, and some also collect additional profile information such as email address and user name. I cannot yet say whether the first parties are aware of this data access.
The user ID collected through the Facebook API is specific to the website (or "application", in Facebook's terms) that the user granted permission to. This limits the potential for cross-site tracking, but these app-scoped user IDs can still be used to retrieve the global Facebook ID, the user's profile photo, and other public profile information, which in turn can be used to identify and track users across websites and devices.
The figure above also notes abuse of browser autofill to collect users' email addresses. In addition, the script loaded by the OpenTag tag manager includes a code snippet that accesses the Facebook API and sends the user ID to an endpoint of the form https://c.lytics.io/c/1299?fbstatus=[...]&fbuid=[...]&[...]. This snippet appears to be a modified version of an example snippet published on lytics.github.io. The original snippet captures Facebook events and also seems to give first parties instructions for collecting the Facebook user ID and login status.
Although I observed these scripts querying the Facebook API and saving the user's Facebook ID, the obfuscation of their code and the limits of my measurement method mean that I cannot verify whether the IDs are sent to their servers.
Third-party scripts can obtain data directly from the Facebook API. The code fragment above comes from the OpenTag script, which leaks the user's Facebook ID to Lytics, a personalized marketing and customer data platform. For ease of explanation, I simplified the code and added comments. The original script is available here. The script repeatedly checks whether the Facebook API is present (it is exposed as window.FB). Once the user has logged in with Facebook, the tracking script can silently query the user's login status. The response to that query contains the user's Facebook ID, which the script parses out and sends to the remote server.
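The flow described above can be modeled in a few lines. The sketch below is written in Python for readability; the real code runs as JavaScript in the browser. It assumes the login-status response has the shape documented for the Facebook JavaScript SDK's FB.getLoginStatus (a status field plus an authResponse object containing userID). The function names extract_user_id and build_beacon_url are hypothetical, and the endpoint and parameter names are the ones observed in the request described earlier.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint observed in the OpenTag snippet; the query string is built below.
BEACON_ENDPOINT = "https://c.lytics.io/c/1299"


def extract_user_id(login_status: dict) -> Optional[str]:
    """Return the user ID from a login-status response, or None.

    Assumed response shape: {"status": ..., "authResponse": {"userID": ...}}.
    """
    if login_status.get("status") != "connected":
        return None  # the user has not authorized the site's Facebook app
    auth = login_status.get("authResponse") or {}
    return auth.get("userID")


def build_beacon_url(login_status: dict) -> Optional[str]:
    """Model the request a tracker could send once it holds the ID."""
    user_id = extract_user_id(login_status)
    if user_id is None:
        return None
    query = urlencode({"fbstatus": login_status["status"], "fbuid": user_id})
    return f"{BEACON_ENDPOINT}?{query}"
```

The point of the model is that the tracker never needs its own permission prompt: everything it sends is derived from a response that the first party's Facebook access already makes available to any script on the page.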
Although I cannot say how all of these trackers use the information they collect, their marketing materials give an indication. OnAudience, Tealium AudienceStream, Lytics, and ProPS all offer some form of customer data platform, which collects data and helps publishers monetize it. Forter provides identity-based fraud prevention for e-commerce websites, and Augur provides cross-device tracking and consumer identification services. I have not been able to determine which company owns the domain ntvk1.ru.
Vulnerability 2: Tracking users around the web through the Facebook Login service
Some third parties use Facebook Login to authenticate users across many websites. Disqus, a third-party commenting system that hosts comments for website owners, is one example. However, hidden third-party trackers can also use Facebook Login to deanonymize users for targeted advertising. This is a privacy violation, because it happens without the user's knowledge or consent. But how does a hidden tracker get users to log in with Facebook in the first place? It can do so when the tracker is also a first party that users visit directly. This is what I found Bandsintown doing. Worse, it did so in a way that allowed any malicious website to embed Bandsintown's iframe and identify its own visitors.
Bandsintown offers a comprehensive service. Many artists use it to manage performance dates and events. It can automatically publish performance information on an artist's Facebook page, which simplifies communication between artists and fans, and it suggests when to promote concerts.
To follow artists, fans log in with Facebook and grant the Bandsintown Facebook application access to their profile, city, preferences, email address, and music activity. From that point on, Bandsintown holds the authentication tokens needed to access the user's Facebook account information.
In addition, Bandsintown provides an advertising service called "Amplified", which appears on many top music-related websites, including lyrics.com, songlyrics.com, and lyricsmania.com. When a Bandsintown user visits a website that embeds the Amplified advertising service, the advertising script inserts an invisible iframe, which connects to Bandsintown's Facebook application using the previously established authentication token and obtains the user's Facebook ID. The iframe then passes the user ID back to the embedding script.
I found that the iframe injected by Bandsintown passed the user's information to the embedding script indiscriminately, so any malicious website could have used the iframe to identify its visitors. I reported this vulnerability to Bandsintown and have confirmed that it is now fixed.
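I do not know how Bandsintown implemented its fix. One common mitigation for this class of problem is for the iframe to release the identifier only when the embedding page's origin is on an allowlist. The sketch below models that check in Python. The allowlist entries and the function names origin_of and may_share_id are assumptions for illustration only.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of embedding origins permitted to receive the ID.
ALLOWED_PARENT_ORIGINS = frozenset({
    "https://music.example",
    "https://lyrics.example",
})


def origin_of(url: str) -> str:
    """Reduce a URL to its origin (scheme://host[:port]), lowercased."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def may_share_id(parent_url: str, allowed=ALLOWED_PARENT_ORIGINS) -> bool:
    """True only if the embedding page's origin exactly matches an allowed one.

    Exact matching matters: prefix or substring checks would accept
    look-alike hosts such as music.example.evil.example.
    """
    return origin_of(parent_url) in allowed
```

In a browser, the equivalent safeguard would be to pass a specific target origin, rather than a wildcard, when the iframe sends a message to its parent with window.postMessage.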
The unintended exposure of Facebook data to third parties is not caused by a flaw in Facebook Login itself. Rather, it results from the lack of a security boundary between first-party and third-party scripts on today's web. Even so, Facebook and other social login providers can take steps to prevent this abuse, such as strictly auditing API usage and how social login data is accessed. Facebook could also stop app-scoped user IDs from being resolved to profile pictures and global Facebook IDs. In fact, Facebook announced an anonymous login feature four years ago.
Previously, I listed three sites (fiverr.com, bhphotovideo.com, and mongodb.com) as embedding scripts that match the URL patterns given above. However, a third-party script served from the same or a similar URL can contain different content depending on the site that loads it. On further analysis, the Forter scripts embedded on fiverr.com and bhphotovideo.com do not include the code that accesses Facebook data. On mongodb.com, I saw only the presence of an Augur script. I have published an updated list of websites on which I have confirmed that the embedded scripts access Facebook data.
In addition, I previously listed the Lytics script (https://c.lytics.io/static/io.min.js) as responsible for the Facebook API access. Although this script is used to send the Facebook user ID to Lytics (c.lytics.io), the code that accesses the Facebook API is located in the OpenTag script, as described above. The code fragment in the OpenTag script responsible for accessing Facebook user data was probably configured by the first party, so in this case I think the first party is likely aware of the data access.
In this article, I use the term "vulnerability" to refer to problems caused by unsafe design practices on today's web, not to vulnerabilities in the usual computer security sense.
Although I take Facebook Login as the example in this article, the vulnerabilities I describe probably exist for most social login providers and on mobile devices. For example, scripts can also obtain user identifiers from the Google Plus API and from the Russian social media website VK.
To better understand how closely a third party is integrated with the first party, I classified the scripts by their use of the first party's Facebook application ID (or AppId), which a site provides to Facebook during login initialization to identify itself. If a third-party library includes the site's application ID and initialization code, the integration is closer: the first party probably had to configure the third-party script to access the Facebook SDK on its behalf. Although application IDs are not secret, I take the absence of an application ID to indicate loose integration, in which the first party may not know about the access. In fact, all of the scripts described above behave the same way when embedded on a simple test page with which they have no prior business relationship.
The following signs may indicate that the first party is aware of the Facebook data access:
1) The third party actively initiates the Facebook login process instead of passively waiting for the login to occur;
2) The third party includes the unique App ID of the website it is embedded on.
The seven scripts listed above neither start the login process nor contain the application ID of the website.
However, it is difficult to determine the exact relationship between the first party and the third party.
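The two signals above can be expressed as a simple check on a script's source text. This is a minimal sketch, not the analysis pipeline I used: it assumes the script source is available as a string, treats a call to the SDK's FB.login as "actively initiating login", and uses a hypothetical function name, classify_integration. Plain text matching like this will miss obfuscated code.

```python
import re

# FB.login(...) is the SDK call that actively starts a login flow.
LOGIN_CALL = re.compile(r"\bFB\s*\.\s*login\s*\(")


def classify_integration(script_source: str, first_party_app_id: str) -> dict:
    """Apply the two awareness signals to a third-party script's source."""
    initiates_login = bool(LOGIN_CALL.search(script_source))
    includes_app_id = first_party_app_id in script_source
    return {
        "initiates_login": initiates_login,
        "includes_app_id": includes_app_id,
        # Neither signal present: the first party may be unaware of the access.
        "loose_integration": not (initiates_login or includes_app_id),
    }
```

A script that only waits for window.FB and then calls FB.getLoginStatus would be reported as loosely integrated, which matches the behavior of the seven scripts discussed here.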
An app-scoped ID can be resolved to real user profile information by querying Facebook's Graph API or by retrieving the user's profile photo, even without authentication. When security researchers reported this to Facebook, Facebook replied as follows:
"This is intentional behavior in our product. We do not consider it a security vulnerability, but we do have controls in place to monitor and mitigate abuse."
According to reports, a Facebook endpoint protected by similar controls was used to harvest the public profile data of 2 billion Facebook users. Note that although the endpoint the researchers discovered no longer works, the following endpoint still redirects to the user's profile page: https://www.facebook.com/[app_scoped_id].
Appendix 2: Research and Analysis Methods
To study the abuse of social login APIs, I extended OpenWPM to simulate a user who has authenticated with Facebook and granted full permissions to the Facebook Login SDK on every website. I added instrumentation to monitor use of the Facebook SDK interface (exposed as window.FB). Because I did not inject the user's identity into the page in any other way, any leaked personal data must have been queried from my spoofed API.
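Because the spoofed API is the only source of identity on the page, leak detection reduces to searching outgoing requests for the spoofed values. The sketch below illustrates the idea. It is not the OpenWPM code I used: the function names, the request representation, and the particular set of encodings checked are all assumptions made for illustration.

```python
import base64
import hashlib
from urllib.parse import quote


def candidate_encodings(identifier: str) -> set:
    """Plain, URL-encoded, base64, MD5, and SHA-256 forms of an identifier."""
    raw = identifier.encode("utf-8")
    return {
        identifier,
        quote(identifier, safe=""),
        base64.b64encode(raw).decode("ascii"),
        hashlib.md5(raw).hexdigest(),
        hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(requests, spoofed_values):
    """Yield (url, field_name) pairs where a spoofed value appears in a request.

    `requests` is an iterable of (url, body) pairs; body may be None.
    `spoofed_values` maps a field name to the value the fake API returned.
    """
    for url, body in requests:
        haystack = url + "\n" + (body or "")
        for field, value in spoofed_values.items():
            if any(enc in haystack for enc in candidate_encodings(value)):
                yield url, field
```

Checking a few common encodings catches scripts that hash or encode an identifier before sending it, but it cannot see through arbitrary obfuscation or encryption.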
I crawled 50,000 websites from the Alexa top 1 million in June 2017, using the following sampling strategy: all of the top 15,000 websites, 15,000 websites randomly selected from Alexa ranks 15,000 to 100,000, and 20,000 websites randomly selected from Alexa ranks 100,000 to 1,000,000. This combination makes it possible to observe the behavior on both high-traffic and low-traffic websites. On each of these 50,000 websites, I visited 6 pages: the home page and 5 other pages randomly selected from the internal links on the home page.
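The sampling strategy can be written down compactly. The following is a minimal sketch rather than the crawl code itself: the function names are hypothetical, and the exact handling of the rank boundaries (for example, whether rank 15,000 belongs to the first or second stratum) is an assumption.

```python
import random


def sample_ranks(rng: random.Random) -> list:
    """Return the 50,000 Alexa ranks to crawl under the strategy above."""
    top = list(range(1, 15_001))                           # all of the top 15,000
    middle = rng.sample(range(15_001, 100_001), 15_000)    # ranks 15,001 to 100,000
    tail = rng.sample(range(100_001, 1_000_001), 20_000)   # ranks 100,001 to 1,000,000
    return top + middle + tail


def pages_to_visit(home_url: str, internal_links: list, rng: random.Random) -> list:
    """Home page plus up to 5 distinct internal links chosen at random."""
    candidates = sorted(set(internal_links) - {home_url})
    return [home_url] + rng.sample(candidates, min(5, len(candidates)))
```

Passing an explicitly seeded random.Random instance makes the sample reproducible, which is useful when a crawl has to be repeated or compared with a later one.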