The Interplay Between Privacy, Machine Learning and Artificial Intelligence

Interview with Ben Lorica, Chief Data Scientist, O’Reilly Media

The report identifies current trends in AI research. What do you see driving innovation at the moment?

The resurgence of interest in AI has been fueled by breakthroughs in hardware and machine learning, specifically deep learning and reinforcement learning. Both techniques are notoriously data-hungry, and the importance of generating or collecting (labeled) data to train these algorithms cannot be overstated.

This is an era when data privacy has become a central issue for both users and regulators. Users are demanding more transparency and control over how data is collected, stored, used and shared. Regulators in many localities have introduced landmark data privacy regulations: for example, Europe (GDPR) and California (Consumer Privacy Act) have placed concepts such as transparency, “user control” and “privacy-by-design” at the forefront for companies wanting to deploy data products.

How do organizations build analytics into services in an age when data privacy has become critical?

A typical organization uses data to drive two primary activities: improving decision making (through business intelligence) and enabling automation (using machine learning and AI). There is an emerging set of privacy-preserving methods and tools for building systems that rely on business intelligence and machine learning.

In many settings business intelligence relies on a database. A collaboration between Uber and UC Berkeley’s RISE Lab has resulted in an open source tool that lets analysts submit queries and get results that adhere to state-of-the-art differential privacy (a formal guarantee that provides robust privacy assurances). Their open source tool paves the way for privacy-preserving business intelligence within many organizations. More impressively, differential privacy can scale to millions of devices that generate data in real time. Apple, Microsoft and Google have built privacy-preserving business analytics for services that support mobile phones and smart meters.
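To make the idea of a differentially private query concrete, here is a minimal sketch of the Laplace mechanism applied to a count query. It is not the implementation used by the Uber and RISE Lab tool or by any of the companies mentioned; the names `laplace_noise` and `private_count`, the sample data and the choice of `epsilon` are all assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) as the difference of two
    independent exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(rows, predicate, epsilon, rng=None):
    """Return a noisy count of the rows matching `predicate`.

    Adding or removing one row changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: trip distances in kilometres.
trips = [1.2, 8.5, 3.3, 12.0, 0.7, 9.9, 15.4]
noisy = private_count(trips, lambda km: km > 5.0, epsilon=0.5)
print(f"noisy count of long trips: {noisy:.2f}")
```

A smaller `epsilon` means more noise and stronger privacy. Answering many queries consumes a privacy budget, which this sketch does not track.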

Researchers and entrepreneurs are actively building privacy-preserving methods and tools for AI. The machine learning community has long acknowledged that simple data anonymization techniques can place users’ privacy at risk (an early example is the de-anonymization attacks on the Netflix Prize). Here are some recent privacy-preserving techniques in machine learning:

Federated learning: introduced by Google, it allows for training a centralized machine learning model without sharing data, and thus fits nicely into services on mobile devices.
Differential privacy: the interplay between differential privacy and machine learning continues to be an active research area, and researchers are beginning to examine deep learning models that adhere to differential privacy.
Homomorphic encryption: this is a nascent area whose goal is to develop a class of tools that allow computation of complex models over encrypted data. There has been preliminary work in computer vision and speech technologies.
Decentralization: this is an area driven mainly by startups who are looking to use blockchains, distributed ledgers and incentive structures that use cryptocurrencies. For example, Computable Labs is building open source, decentralized infrastructure that will allow companies to securely share data and models. They want to “make blockchain networks compatible with machine learning computations”.
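As an illustration of the federated learning item above, the following is a toy sketch of federated averaging on a one-parameter linear model. It is a simplified assumption-laden example, not Google's implementation: the functions `local_update` and `federated_average`, the learning rate, and the client datasets are all hypothetical. The point it shows is that only model weights travel between clients and the server, while each client's raw data stays where it is.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """One client's training step: gradient descent on its own
    (x, y) pairs for the model y ~ w * x. The data never leaves
    this function; only the updated weight is returned."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(global_w, clients, rounds=20):
    """Server loop: send the current weight to every client, then
    average the returned weights, weighted by client dataset size."""
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        local_ws = [local_update(global_w, data) for data in clients]
        global_w = sum(w * len(data) for w, data in zip(local_ws, clients)) / total
    return global_w

# Hypothetical clients, each holding private samples of y = 3 * x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
]
w = federated_average(0.0, clients)
print(f"learned weight: {w:.4f}")
```

Real deployments add pieces this sketch omits, such as client sampling, secure aggregation and, in some cases, differential privacy on the shared updates.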

As users and regulators stress the importance of data privacy, the data community is rallying to build privacy-preserving tools for the AI systems of the near future.

What related writings are there on this subject?

Data collection and data markets in the age of privacy and machine learning
Building tools for the AI applications of tomorrow: we’re currently laying the foundation for future generations of AI applications, but we aren’t there yet
What machine learning means for software development