The resurgence of interest in AI has been fueled by breakthroughs in hardware and machine learning, specifically deep learning and reinforcement learning. Both are notoriously data-hungry techniques, and the importance of generating or collecting (labeled) data to train these algorithms cannot be overstated.
This is an era when data privacy has become a central issue for both users and regulators. Users are demanding more transparency and control over how data is collected, stored, used, and shared. Regulators in many jurisdictions have introduced landmark data privacy regulations: for example, Europe (GDPR) and California (the California Consumer Privacy Act) have placed concepts such as transparency, “user control,” and “privacy-by-design” at the forefront for companies wanting to deploy data products.
A typical organization uses data to drive two primary activities: improving decision making (through business intelligence) and enabling automation (using machine learning and AI). An emerging set of privacy-preserving methods and tools makes it possible to build systems that rely on both business intelligence and machine learning.
In many settings, business intelligence relies on querying a database. A collaboration between Uber and UC Berkeley’s RISE Lab has produced an open source tool that lets analysts submit queries and receive results that adhere to state-of-the-art differential privacy, a formal guarantee that provides robust privacy assurances. Their open source tool paves the way for privacy-preserving business intelligence within many organizations. More impressively, differential privacy can scale to millions of devices that generate data in real time: Apple, Microsoft, and Google have built privacy-preserving business analytics for services that support mobile phones and smart meters.
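To make the guarantee concrete, here is a minimal Python sketch of the two deployment models described above. It illustrates the underlying mechanisms only; it is not the code used by any of these companies, and all function names and parameters are assumptions for illustration. The first function answers a counting query in the central model using the Laplace mechanism; the second implements randomized response, the classic local-model technique that lets each device add its own noise before reporting anything.

```python
import numpy as np

# Central model: the database curator adds noise to query results.
def laplace_count(rows, predicate, epsilon):
    """Answer "how many rows satisfy predicate?" with epsilon-differential
    privacy. Adding or removing one row changes a count by at most 1
    (sensitivity 1), so Laplace noise of scale 1/epsilon suffices."""
    true_count = sum(1 for r in rows if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Local model: each device randomizes its own answer before reporting.
def randomized_response(bit, epsilon):
    """Report a private bit truthfully with probability
    e^eps / (1 + e^eps); otherwise report its opposite."""
    p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return bit if np.random.random() < p_truth else 1 - bit

def estimate_proportion(reports, epsilon):
    """Debias the noisy device reports to estimate the true fraction of 1s."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

# Hypothetical example: estimate how many of 100,000 devices have a
# feature enabled, without collecting any raw answers.
true_bits = np.random.binomial(1, 0.3, size=100_000)
reports = [randomized_response(b, epsilon=1.0) for b in true_bits]
print(estimate_proportion(reports, epsilon=1.0))  # close to 0.3
```

The smaller the epsilon, the stronger the privacy guarantee and the noisier the answers. Production systems must additionally bound the sensitivity of more complex SQL queries and manage a cumulative privacy budget across many analysts and queries.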
The machine learning community has long acknowledged that simple data anonymization techniques can put users’ privacy at risk; an early example is the de-anonymization attacks on the Netflix Prize dataset. In response, researchers and entrepreneurs are actively building privacy-preserving methods and tools for machine learning and AI, one of which is sketched below.
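One widely cited technique, in the spirit of differentially private SGD (Abadi et al., 2016), trains a model while bounding how much any single training example can influence the result. The sketch below shows the core update step under stated assumptions: the function name and hyperparameters are illustrative, and a real implementation would also track the cumulative privacy budget across all training steps.

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.1):
    """One update in the style of differentially private SGD:
    clip each example's gradient to bound its influence (sensitivity),
    average the clipped gradients, then add Gaussian noise calibrated
    to that bound before applying the step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    noisy_grad = mean_grad + np.random.normal(0.0, sigma, size=mean_grad.shape)
    return weights - lr * noisy_grad
```

Clipping turns an unbounded per-example contribution into a fixed sensitivity, and the Gaussian noise converts that bound into a formal privacy guarantee for the trained model.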
As users and regulators stress the importance of data privacy, the data community is rallying to build privacy-preserving tools for the AI systems of the near future.