A cybersecurity incident at analytics provider Mixpanel, announced just hours before the US Thanksgiving holiday weekend, could set a new standard for how not to report a data breach.
To recap: In a bare-bones blog post last Wednesday, Mixpanel CEO Jen Taylor announced that the company had detected an unspecified security incident on November 8 that affected some of its customers. The post did not say how customers were affected or how many, only that Mixpanel had taken a series of security actions to “eliminate unauthorized access.”
Taylor did not respond to multiple emails from TechCrunch, which included more than a dozen questions about the company’s data breach. We asked Taylor whether the company had received any communication from the hackers, such as a demand for money, along with other specific questions about the breach, including whether Mixpanel employee accounts were protected by multi-factor authentication.
One of its affected customers is OpenAI, which published its own blog post two days later, confirming what Mixpanel had not explicitly stated in its own post: that customer data had been taken from Mixpanel’s systems.
OpenAI said it was affected by the breach because it relied on software provided by Mixpanel to help understand how OpenAI users interact with certain parts of its website, such as its developer documentation.
OpenAI users affected by the Mixpanel breach are likely to be developers whose own apps or websites rely on OpenAI products to function. OpenAI said the stolen data included users’ provided names, email addresses, their approximate location (such as city and state) based on their IP address, and some identifiable device data, such as operating system and browser version. Some of this information is the same kind of data Mixpanel collects from people’s devices as they use apps and browse websites.
OpenAI spokesperson Niko Felix told TechCrunch that the compromised data taken from Mixpanel “did not contain identifiers such as the Android Advertising ID or Apple’s IDFA,” which might have made it easier to personally identify specific OpenAI users or combine their OpenAI activity with usage from other apps and websites.
OpenAI said in its blog post that the incident did not directly affect ChatGPT users, and that it has stopped using Mixpanel as a result of the breach.
While details of the breach remain limited, the incident is drawing fresh scrutiny to the data analytics industry, which profits from collecting reams of information about how people use websites and apps.
How Mixpanel tracks your taps, clicks and screen activity
Mixpanel is one of the biggest web and mobile analytics companies that you may never have heard of unless you work in app development or marketing. According to its website, Mixpanel has 8,000 enterprise customers (one fewer now, after OpenAI’s early exit).
Since each Mixpanel customer potentially has millions of its own users, the number of ordinary people whose data was taken in the breach could be significant. The type of data taken is likely to differ for each Mixpanel customer, depending on how each one configured its data collection and how much user data it collected.
Companies like Mixpanel are part of a thriving industry that provides tracking technologies that allow companies to understand how their customers and users interact with their apps and websites. As such, analytics companies can collect and store vast amounts of information, including billions of data points, about regular consumers.
For example, an app maker or website developer can embed a piece of code from an analytics company like Mixpanel into their app or website to gain this visibility. For the app user or website visitor, it’s like having someone watch over your shoulder without your knowledge as you browse a website or use an app, constantly sharing every click, tap, swipe and link press with the company that develops the app or website.
In the case of Mixpanel, it’s easy to see the types of data it collects from the apps and websites its code is embedded in. Using traffic-inspection tools like Burp Suite, TechCrunch analyzed the network traffic flowing in and out of several apps with Mixpanel code inside, including Imgur, Lingvano and Mobile Neon. In our tests, we saw varying amounts of information about our device and in-app activity uploaded to Mixpanel while using the apps.
This data can include a person’s activity, such as opening the app, tapping a link, swiping through a page, or signing in with their username and password. This event log data is then attached to information about the user and their device, including the device type (such as iPhone or Android), the screen width and height, whether the user is on a cell network or Wi-Fi, the user’s cell carrier, the user’s unique identifier for that service (which can be linked back to the person using the app), and the exact time of the event.
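To make the shape of such a record concrete, here is a minimal Python sketch of how a single in-app event might be bundled with device details before it is uploaded. The field names, the `build_event` helper and the sample values are all illustrative assumptions. This is not Mixpanel’s actual schema or SDK code.

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DeviceInfo:
    # Hypothetical device properties, mirroring the kinds of fields described above.
    device_type: str
    screen_width: int
    screen_height: int
    on_wifi: bool
    carrier: str


def build_event(name: str, user_id: str, device: DeviceInfo,
                timestamp: Optional[float] = None) -> dict:
    """Attach the user identifier, device details and event time to one action."""
    return {
        "event": name,
        "properties": {
            "user_id": user_id,  # per-service identifier (hypothetical field name)
            "time": timestamp if timestamp is not None else time.time(),
            **asdict(device),
        },
    }


# Example: one "link tapped" event from a made-up user and device.
device = DeviceInfo("iPhone", 390, 844, True, "ExampleCarrier")
event = build_event("link_tapped", "user-8f3a", device, timestamp=1700000000.0)
print(json.dumps(event, indent=2))
```

Even in this toy version, each event carries enough context to tie a single tap to a specific account, device and moment in time.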
The data collected may sometimes include information that should be off limits. Mixpanel admitted in 2018 that its analytics code was inadvertently collecting user passwords.
Data collected by analytics companies is meant to be pseudonymized, essentially scrambled in a way that does not include identifiable elements, such as an individual’s name. Instead, the information collected is attributed to a unique but seemingly random identifier used in place of a person’s name, a seemingly more privacy-protective way of storing data. But pseudonymized data can be reversed and used to determine people’s real identities. Also, data collected about a person’s device can be used to uniquely identify that device, known as “fingerprinting,” which can also be used to track that user’s activity across different apps and across the internet.
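A rough Python sketch shows why a random-looking identifier is weaker protection than it appears. It assumes, purely for illustration, that the identifier is an unsalted hash of the user’s email address and that a fingerprint is a hash of a few device attributes. Real analytics providers may generate identifiers quite differently, and the function names here are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


def pseudonymize(email: str) -> str:
    """Swap an email for a random-looking ID (assumed here: unsalted SHA-256)."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def reidentify(pseudonym: str, known_emails: List[str]) -> Optional[str]:
    """Reverse a pseudonym by hashing candidate emails until one matches."""
    for email in known_emails:
        if pseudonymize(email) == pseudonym:
            return email
    return None


def device_fingerprint(attributes: Dict[str, str]) -> str:
    """Derive a stable ID from device attributes, independent of key order."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Anyone holding a list of candidate emails (for example, from another leak)
# can recompute the hashes and link the "anonymous" ID back to a person.
stored_id = pseudonymize("Alice@example.com")
match = reidentify(stored_id, ["bob@example.com", "alice@example.com"])

# The same device attributes yield the same fingerprint in any app.
fp_app_one = device_fingerprint({"os": "iOS 17", "screen": "390x844", "carrier": "ExampleCarrier"})
fp_app_two = device_fingerprint({"carrier": "ExampleCarrier", "screen": "390x844", "os": "iOS 17"})
```

In this sketch `match` recovers the email address, and the two fingerprints are identical, which is what allows activity in separate apps to be tied to one device.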
By tracking what you do on your device across various apps, analytics companies make it easy for their customers to build profiles of users and their activity.
Mixpanel also allows its customers to collect “session replays,” which visually reconstruct how the company’s users interact with an app or website, so the developer can identify bugs and problems. Session replays are intended to exclude personally identifiable or sensitive information, such as passwords and credit card numbers, from any user session that is collected, but even this process is not perfect.
By Mixpanel’s own admission, session replays can sometimes include sensitive information that should not have been recorded but was collected by mistake. Apple cracked down on apps that use screen recording code after TechCrunch exposed the practice in 2019.
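One way to see how masking can miss things is a simplified Python sketch of name-based redaction. It assumes a hypothetical deny list of field names, which is only one possible approach and is not a description of how Mixpanel’s session replay works.

```python
from typing import Dict

# Hypothetical deny list: only fields with these names are masked.
SENSITIVE_FIELD_NAMES = {"password", "card_number", "cvv"}


def redact_inputs(captured_inputs: Dict[str, str]) -> Dict[str, str]:
    """Mask values whose field name is on the deny list; pass the rest through."""
    return {
        field: "****" if field.lower() in SENSITIVE_FIELD_NAMES else value
        for field, value in captured_inputs.items()
    }


captured = {
    "username": "alice",
    "password": "hunter2",
    # Sensitive data typed into a free-text field the deny list does not cover.
    "notes": "my card is 4111 1111 1111 1111",
}
redacted = redact_inputs(captured)
print(redacted)
```

The password is masked as intended, but the card number typed into the free-text field passes through untouched, the kind of gap that lets sensitive data end up in a recording by mistake.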
To say that Mixpanel has questions to answer about its breach is probably an understatement. Without knowing the specific types of data involved, it’s unclear how large this breach is or how many people may be affected. Maybe Mixpanel doesn’t know yet.
What is clear is that companies like Mixpanel store huge banks of information about people and how they use their apps, which makes them a prime target for malicious hackers.
Know more about the Mixpanel data breach? Do you work for Mixpanel or a company affected by the breach? We would love to hear from you. To contact this reporter securely, you can reach out on Signal via the username zackwhittaker.1337.
