The Interplay Between Privacy, Machine Learning and Artificial Intelligence

Interview with Ben Lorica, Chief Data Scientist, O’Reilly Media

The report identifies current trends in AI research. What do you see driving innovation at the moment?

The resurgence in interest in AI has been fueled by breakthroughs in hardware and machine learning, specifically deep learning and reinforcement learning. Both are notoriously data-hungry techniques, so the importance of generating or collecting (labeled) data to train these algorithms cannot be overstated.

This is an era when data privacy has become a central issue for both users and regulators. Users are demanding more transparency and control over how data is collected, stored, used and shared. Regulators in many localities have introduced landmark data privacy regulations: for example, Europe's GDPR and the California Consumer Privacy Act have placed concepts such as transparency, “user control” and “privacy-by-design” at the forefront for companies wanting to deploy data products.

How do organizations build analytics into services in an age when data privacy has become critical?

A typical organization uses data to drive two primary activities: improving decision making (through business intelligence) and enabling automation (using machine learning and AI). It turns out that there is an emerging set of privacy-preserving methods and tools for building systems that rely on business intelligence and machine learning.

In many settings business intelligence relies on a database. A collaboration between Uber and UC Berkeley’s RISE Lab has resulted in an open source tool that lets analysts submit queries and get results that adhere to state-of-the-art differential privacy (a formal guarantee that provides robust privacy assurances). Their open source tool paves the way for privacy-preserving business intelligence within many organizations. More impressively, differential privacy can scale to millions of devices that generate data in real time. Apple, Microsoft and Google have built privacy-preserving business analytics for services that support mobile phones and smart meters. 
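
To make the idea of a differentially private query concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a counting query. It is not the API of the Uber and RISE Lab tool, nor of any vendor's system; the function names (`laplace_noise`, `private_count`), the sample rows and the choice of epsilon are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng=None):
    """Return a noisy count of the rows matching `predicate`.

    Hypothetical helper: a counting query changes by at most 1 when one
    person's row is added or removed (sensitivity 1), so Laplace noise
    with scale 1/epsilon gives epsilon-differential privacy for this
    single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Illustrative data only; not taken from any real service.
trips = [
    {"city": "SF"},
    {"city": "SF"},
    {"city": "NYC"},
    {"city": "SF"},
]

noisy = private_count(trips, lambda row: row["city"] == "SF", epsilon=0.5)
print(f"noisy count of SF trips: {noisy:.2f}")
```

A smaller epsilon adds more noise and gives stronger privacy; a larger one gives more accurate answers. A production system would also need to track the total privacy budget spent across queries, which this sketch leaves out.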

Researchers and entrepreneurs are actively building privacy-preserving methods and tools for AI. The machine learning community has long acknowledged that simple data anonymization techniques can place users' privacy at risk (an early example is the de-anonymization attacks on the Netflix Prize). Here are some recent privacy-preserving techniques in machine learning: 

As users and regulators stress the importance of data privacy, the data community is rallying to build privacy-preserving tools for the AI systems of the near future.

What related writings are there on this subject?