The UK government has enlisted Palantir, a US big data firm co-founded by Peter Thiel, and Faculty, a startup that specializes in professional data science strategy, software, and skills training, to combat the spread of COVID-19. While this may raise concerns about privacy, collecting big data, including private health data, from the general population is necessary for governments to make informed decisions on how to halt the spread of COVID-19, discern which members of society are most vulnerable, and learn which treatment options are the most effective.
Palantir understands that there is cause for concern about user privacy, which is why it released an outline of its best practices for using data during a crisis. Palantir stated: “Knowing how to competently apply data science to the right set of problems will serve as a critical asset for augmenting and enhancing comprehensive strategies to battle this public health crisis.” This is undeniably true.
The company also stated the following, which acknowledges the risk society takes on when it allows this type of large-scale data sharing: “Rich data sources often inspire unanticipated — even rogue — analyses. Establish and enforce collective ground rules on how the data should be used and who should have what levels of access to, and use of, data. Misuse of data can result in public mistrust in institutions. Even the most well-meaning of problem solvers sometimes are blinded to the risks of the solutions they create.”
What type of data is the UK government collecting? Currently, it is collecting the data needed to tackle the COVID-19 challenge. As reported by The Guardian, the current anonymized data includes gender, protected health information, COVID-19 test results, the contents of people’s calls to 111, the National Health Service (NHS) health advice line, and clinical information about those in intensive care.
While the data should be anonymized so that it can never be traced to a specific individual, we need this data for machine learning systems to analyze. Deep learning systems use this type of big data to identify patterns and data points that humans overlook. Something as seemingly trivial as gender may reveal important insights. For example, the data could show that diabetic men are a more vulnerable segment of the population than diabetic women. Certain treatment options may also work better for different age groups, genders, genetic backgrounds, etc.
Instead of vilifying the UK government, we should give it the benefit of the doubt. This type of data collection and data sharing across all facets of the health care system should be maintained for the long term. It could serve us both in fighting future pandemics and in addressing regular health issues such as cancer and other physiological ailments.
Currently, the project uses a “pseudo NHS number” to cross-match large datasets, including a master patient index, an existing NHS resource that uses “social marketing data” to segment the British population into different “types” at the household level. While it remains to be seen if this is the most effective data distribution method, we do have concerns with some aspects of the data collection process.
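To make the idea of cross-matching on a pseudonymous identifier concrete, here is a minimal sketch. It is not the NHS's or Palantir's actual method, which the reporting does not describe. It assumes the pseudonym is derived with a keyed hash (HMAC-SHA256), and the key, field names, and records are all made up for illustration.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data controller (an assumption, not from the article).
SECRET_KEY = b"example-key-held-by-data-controller"


def pseudo_id(nhs_number: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash."""
    normalized = nhs_number.replace(" ", "")
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened only to keep the example readable


def cross_match(left: list, right: list) -> list:
    """Join two pseudonymized datasets on their shared 'pid' field."""
    index = {row["pid"]: row for row in right}
    return [{**row, **index[row["pid"]]} for row in left if row["pid"] in index]


# Made-up records: the raw identifiers never appear in the datasets themselves.
test_results = [
    {"pid": pseudo_id("000 000 0001"), "test_result": "positive"},
    {"pid": pseudo_id("000 000 0002"), "test_result": "negative"},
]
household_types = [
    {"pid": pseudo_id("0000000001"), "household_type": "A"},
]

matched = cross_match(test_results, household_types)
print(matched)
```

Because the same input and key always yield the same pseudonym, analysts can link records across datasets without seeing the real number. Anyone who holds the key can recompute the pseudonyms, however, so this is pseudonymization, not full anonymization.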
Currently, phone location data is being collected. While narrowing down the data to a postal code may be appropriate, it is unnecessary to pinpoint the exact source of the phone call, as this information cannot be anonymized or randomized. This could cause sick individuals to fear using the phone line, which may result in unnecessary deaths among those who most need help.
British citizens should be alarmed about the phone location data point, which is unnecessary for effectively training a deep learning algorithm but can be used directly to track an individual.
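The coarsening suggested above, keeping a postal area while discarding the exact location, can be sketched as follows. This is only an illustration under stated assumptions: the record layout and field names (`postcode`, `latitude`, `longitude`) are hypothetical, and the function relies on the standard UK postcode layout in which the final three characters form the inward code.

```python
# Hypothetical field names for precise location data; not taken from any real NHS schema.
PRECISE_FIELDS = ("postcode", "latitude", "longitude")


def outward_code(postcode: str) -> str:
    """Return only the outward part of a UK postcode, e.g. 'SW1A 1AA' -> 'SW1A'."""
    compact = postcode.replace(" ", "").upper()
    if not 5 <= len(compact) <= 7:
        raise ValueError(f"unexpected postcode length: {postcode!r}")
    return compact[:-3]  # drop the three-character inward code


def coarsen_call_record(record: dict) -> dict:
    """Copy a call record, replacing precise location fields with a postal district."""
    coarse = {k: v for k, v in record.items() if k not in PRECISE_FIELDS}
    coarse["postcode_district"] = outward_code(record["postcode"])
    return coarse


# Made-up call record for illustration.
call = {"call_id": 1, "postcode": "sw1a 1aa", "latitude": 51.501, "longitude": -0.142}
print(coarsen_call_record(call))  # {'call_id': 1, 'postcode_district': 'SW1A'}
```

A record reduced in this way still supports regional analysis, but no longer carries the coordinates or full postcode that could point to a single household.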
Should the UK government continue on its path of collecting this type of big data and fix the issues outlined above, as well as other privacy and user rights issues of which we are unaware, it may be appropriate for the UK to enlist the assistance of the European Union to gather similar data points from the populations of its member states. After all, deep learning generally becomes more effective as more data is collected. This would go a long way toward bridging nations after the closure of international borders.
We urge careful analysis and the assistance of a non-profit entity to ensure that the UK government does not abuse the information it gathers. Nonetheless, it should be acknowledged that this is an important step toward fighting COVID-19.