Yes, Data Privacy and Artificial Intelligence are Compatible – insideBIGDATA
In this special guest feature, Leif-Nissen Lundbaek, CEO of XAIN, makes a compelling case that, with all the fear surrounding hacks and data breaches, data privacy and artificial intelligence are in fact compatible, through Federated Learning. Academically trained as a mathematician, Leif-Nissen Lundbaek works mainly on algorithms and applications for privacy-preserving artificial intelligence. In 2017, Leif developed the eXpandable AI Network (XAIN) as a cyber-security protocol that combines AI with privacy paradigms. Since then, his company has won multiple awards, such as Porsche's first-ever Innovation Contest, and worked successfully with many blue-chip companies. Leif-Nissen received his M.Sc. in Software Engineering at The University of Oxford with distinction, as well as an M.Sc. in Mathematics at Heidelberg University.
Artificial intelligence is the way of the future. In its ultimate (and most benevolent) form, AI will be able to fight climate change, scour the galaxy, fight disease, and even drastically prolong human life. In addition to the vast opportunities those realities can offer if they come to pass, they will present humanity with all kinds of hurdles and moral dilemmas that were previously merely fodder for philosophy professors. But artificial intelligence is the way of the present, too.
Though it has not yet peaked, AI today is embedded in GPS systems and smartphone apps; it helps people edit photos, make music, and write novels. Banks deploy AI bots to serve customers online. Already, artificial intelligence is thriving, and already, it is causing decision-makers in government and industry a major headache stemming in no small part from the data privacy challenges it introduces.
The adoption of AI applications in enterprise, in particular, is a hot topic, largely because many companies are sitting on loads of data with the potential to provide valuable insights that they have been unable to milk due to privacy concerns. AI promises to help them turn that data into actionable insights that they can use to their advantage. The challenge comes with doing so in a manner that doesn’t compromise data privacy.
AI applications use machine learning to train on data, producing an AI model that can then make or support decisions. A trained AI model can evaluate a new data input (say, a purchase order) and make a prediction (say, whether the purchase order is fraudulent). AI can also rank inputs or generate entirely new ones, such as an image of a human face belonging to no one who has ever lived.
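The train-then-predict workflow described above can be sketched in a few lines of Python. This is a deliberately simplistic, purely illustrative model, and all of the order amounts, labels, and function names are made up for this example, not drawn from any real fraud-detection system:

```python
# Toy sketch: "training" computes a decision boundary from labeled
# purchase orders (amounts in dollars); the resulting "model" (a single
# threshold) then scores new orders. All data here is hypothetical.

def train(amounts, labels):
    """Fit a one-feature threshold model: the midpoint between the mean
    amount of legitimate orders (label 0) and fraudulent ones (label 1)."""
    legit = [a for a, y in zip(amounts, labels) if y == 0]
    fraud = [a for a, y in zip(amounts, labels) if y == 1]
    return (sum(legit) / len(legit) + sum(fraud) / len(fraud)) / 2

def predict(model, amount):
    """Flag a new order as fraudulent (1) if it exceeds the threshold."""
    return 1 if amount > model else 0

amounts = [20, 35, 50, 40, 900, 1200, 1500]   # hypothetical training data
labels  = [0, 0, 0, 0, 1, 1, 1]               # 0 = legitimate, 1 = fraud
threshold = train(amounts, labels)

predict(threshold, 25)    # a small order looks legitimate -> 0
predict(threshold, 1100)  # a large order is flagged as fraud -> 1
```

Real systems would of course use many features and far richer model classes, but the shape is the same: historical data in, a model out, and predictions on new inputs.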
A look into the mechanics of how AI models are trained on data, though, reveals serious data privacy challenges that businesses must overcome before they can benefit from the technology. Simply put, most of today's machine learning methods cannot train AI models accurately enough while simultaneously preserving data privacy. In Europe, ignoring privacy concerns while training AI models is actually illegal. Laws vary from state to state across the U.S., but the consensus is that data privacy issues will have to be addressed more thoroughly in the years to come.
Considering how AI models have traditionally been trained, these concerns make complete sense.
Companies often house data sets that contain personal information pertaining to different clients. In Europe, per the EU General Data Protection Regulation (GDPR), the only legal way to learn insights from these data sets is to train a separate AI model on each one. The problem with that method is that it prevents the company from learning across the data sets, since they remain separate throughout the process. Ideally, the company would train one AI model jointly on all the data sets, but that would mean local, personal data from each set would be visible to others or merged with other personal data without consent, which is why it's illegal in the EU. Most U.S. states have data privacy regulations of their own. Without getting too far into the weeds, anonymizing local data sets is possible, but still risky and costly.
So what is a company to do when it wants to use AI in innovative ways? That’s where Federated Learning technology comes to the rescue. Federated Learning uses complex mathematics to sidestep the issue of data privacy.
For simplicity's sake, though, let's set the mathematics aside and explain how Federated Learning works with an example.
Suppose a company within Europe has 100 local data sets, each containing the details of one customer. The company wants to use AI to determine the average height of its customers, but it can’t combine the data sets because doing so would violate GDPR. They could anonymize the data sets before combining them, but it would be too easy to identify most of the customers (few customers have exactly the same height) based on their distinct heights, so they wouldn’t really be “anonymizing” anything. Federated Learning addresses this problem by providing a box into which the various heights can be submitted as private inputs. The box determines the average of the heights without revealing anything about the heights, individually, to any party.
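One common building block for the "box" described above is additive secret sharing: each participant splits its value into random shares that individually reveal nothing, and only the sum of all shares can be reconstructed. The sketch below is a minimal illustration of that idea under simplified assumptions (honest participants, no network layer); it is not XAIN's actual protocol, and the modulus and function names are chosen purely for this example:

```python
import random

PRIME = 2**61 - 1  # field modulus for the shares (illustrative choice)

def share(value, n):
    """Split an integer into n additive shares that sum to value mod PRIME.
    Any n-1 shares together are uniformly random and reveal nothing."""
    shares = [random.randrange(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_average(values):
    """Compute the average of the inputs so that no party ever sees an
    individual value, only shares of it and the aggregate sum."""
    n = len(values)
    all_shares = [share(v, n) for v in values]  # row i: shares of value i
    # Participant j receives the j-th share of every value and publishes
    # only its partial sum, which leaks nothing about any single value.
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    total = sum(partial_sums) % PRIME  # reconstructs the true sum
    return total / n

# Five hypothetical customer heights in centimeters:
heights_cm = [158, 172, 181, 165, 190]
avg = secure_average(heights_cm)  # the average emerges; no height is revealed
```

Federated Learning applies the same principle not to heights but to locally trained model updates: each data set trains locally, and only a privately aggregated combination of the updates ever leaves the participants.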
Through Federated Learning, companies will be able to learn from their data sets collectively without compromising any customer's data. Essentially, that translates into artificial intelligence that inherently protects data privacy.
As artificial intelligence comes to the forefront of industry, data privacy, too, will require heightened attention. AI is on track to quite literally revolutionize the world, for better or for worse. It’s up to AI pioneers to proceed in a prudent fashion, and governments to regulate intelligently, to ensure the world reaps the benefits of AI while avoiding its perils. The first hurdle for AI adoption comes in the form of data privacy. Federated Learning proves we can train AI models while keeping data private. Hopefully, it’s a sign we can master AI just as we have mastered the technologies that came before it – while also preserving societal values such as privacy.