Keeping up with emerging cybersecurity trends can get quite tedious, since there’s a lot of news to follow. These days, however, the situation has changed dramatically: the cybersecurity realm seems to revolve around two words, deep learning.
Although we were initially taken aback by the massive coverage that deep learning was receiving, it quickly became apparent that the buzz was well earned. Loosely inspired by the human brain, deep learning enables an AI model to achieve highly accurate results by learning to perform tasks directly from text, images, and audio.
Until recently, it was widely believed that deep learning relies on a huge set of data, similar in magnitude to the data housed by Silicon Valley giants such as Google and Facebook, to solve the most complicated problems within an organization. Contrary to popular belief, however, enterprises can harness the power of deep learning even with access to a limited data pool.
To equip our readers with the knowledge they need to bring deep learning into their organization, we’ve compiled an article that dives deep (no pun intended) into some of the ways in which enterprises can reap the benefits of deep learning in spite of having access to limited, or ‘small’, data.
But before we get into the meat of the article, we’d like to make a small but essential suggestion: start simple. Before you start formulating neural networks complex enough to feature in a sci-fi movie, experiment with a few simple and conventional models (e.g. random forest) to get the hang of the software.
With that out of the way, let’s get straight into some of the ways in which enterprises can adopt deep learning while having access to limited data.
#1- Fine-tuning the baseline model:
Once they’ve formulated a simple baseline deep learning model, the next step for enterprises is to fine-tune it for the particular problem at hand.
Fine-tuning a baseline model sounds much more difficult on paper than it actually is. The fundamental idea is simple: you take a model that has been trained on a large data set bearing some resemblance to the domain you function in, and then fine-tune the details of that model with your limited data.
As far as obtaining the large data set is concerned, enterprises can rely on ImageNet, which also offers an easy starting point for problems of image classification. ImageNet gives organizations access to millions of images divided across many classes, including, but certainly not limited to, images of animals, which can be useful to enterprises from a wide variety of domains.
If the process of fine-tuning a pre-trained model to suit the specific needs of your organization still seems like too much work, we’d recommend getting help from the internet, since a simple Google search will provide you with hundreds of tutorials on how to fine-tune a pre-trained model.
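To make the idea concrete, here is a minimal sketch in plain Python. It is not a real ImageNet workflow: the "pre-trained" weights, the tiny data set, and every function name below are hypothetical stand-ins we made up for illustration. The point is only the shape of the technique: the pre-trained layer stays frozen, and only a small new output layer is trained on the limited data.

```python
import math

# Hypothetical frozen weights, standing in for layers that were already
# trained on a large data set. They are never updated below.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]

def extract_features(x):
    """Apply the frozen layer: a linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune_head(data, labels, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on the frozen features."""
    head = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            features = extract_features(x)
            p = sigmoid(sum(h * f for h, f in zip(head, features)) + bias)
            error = p - y  # gradient of the log loss w.r.t. the logit
            head = [h - lr * error * f for h, f in zip(head, features)]
            bias -= lr * error
    return head, bias

def predict(x, head, bias):
    features = extract_features(x)
    score = sum(h * f for h, f in zip(head, features)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

# The "small data": four labeled examples (invented for this sketch).
small_data = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
small_labels = [1, 1, 0, 0]

head, bias = fine_tune_head(small_data, small_labels)
predictions = [predict(x, head, bias) for x in small_data]
print(predictions)
```

In a real project the frozen part would be a network downloaded from a deep learning library, and you might later unfreeze some of its layers as well; the sketch only shows why a handful of examples can be enough when most of the model has already been learned elsewhere.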
#2- Collect more data:
Although the second point on our list might seem redundant to some of our more cynical readers, the fact of the matter remains: when it comes to deep learning, the larger your data set is, the more likely you are to achieve accurate results.
Although the very essence of this article lies in helping enterprises with a limited data set, we’ve often had the displeasure of encountering too many “higher-ups” who treat investing in the collection of data as equivalent to committing a cardinal sin.
All too often, businesses overlook the benefits offered by deep learning simply because they are reluctant to invest time and effort in gathering data. If your enterprise is unsure about the amount of data that needs to be collected, we’d suggest plotting learning curves as additional data is integrated into the model, and observing the change in model performance.
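A learning curve can be sketched in a few lines. The example below is a rough illustration under our own assumptions: the data is synthetic, the "model" is a simple nearest-centroid classifier rather than a neural network, and all names are hypothetical. It trains on progressively larger slices of the training data and records the validation accuracy for each size.

```python
import random

def make_dataset(n, rng):
    """Synthetic two-class data: class 1 centered at (1, 1), class 0 at (-1, -1)."""
    samples = []
    for i in range(n):
        label = i % 2
        center = 1.0 if label == 1 else -1.0
        point = [rng.gauss(center, 1.5), rng.gauss(center, 1.5)]
        samples.append((point, label))
    return samples

def fit_centroids(samples):
    """Baseline model: the mean point of each class."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in samples if y == label]
        centroids[label] = [sum(p[i] for p in points) / len(points)
                            for i in range(2)]
    return centroids

def predict(centroids, x):
    """Return the label of the closest class centroid."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def accuracy(centroids, samples):
    hits = sum(1 for x, y in samples if predict(centroids, x) == y)
    return hits / len(samples)

rng = random.Random(42)
train = make_dataset(256, rng)
validation = make_dataset(200, rng)

# Train on growing slices of the data and record validation accuracy.
learning_curve = []
for n in (4, 16, 64, 256):
    model = fit_centroids(train[:n])
    learning_curve.append((n, accuracy(model, validation)))

for n, acc in learning_curve:
    print(f"{n:4d} training samples -> validation accuracy {acc:.2f}")
```

If the accuracy is still climbing at the largest size, collecting more data is likely to pay off; if it has flattened out, the money is probably better spent elsewhere. The same numbers can be handed to any plotting library to draw the actual curve.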
Contrary to the popular belief held by most CSOs and CISOs, sometimes the best way to solve problems is through the collection of more relevant data. The role of the CSO and CISO is extremely important in this case because there is always a threat of cyber-attacks. In 2019, total global spending on cybersecurity reportedly reached $103.1 billion, and the number continues to rise. To put this into perspective, let’s consider a simple example: imagine that you are trying to classify rare diamonds but have access to a very limited data set. The most obvious solution is also the right one: instead of having a field day with the baseline model, just collect more data!
#3- Data Augmentation:
Although the first two points we’ve discussed above can both provide an easy solution to most problems surrounding the implementation of deep learning in enterprises with a small data set, they rely heavily on a certain level of luck to get the job done.
If you’re unable to have any success with fine-tuning a pre-trained model either, we’d recommend trying data augmentation. The way data augmentation works is simple: the input data is altered, or augmented, in such a way that it yields a new training example without actually changing the label value.
To put the idea of data augmentation into perspective for our readers, let’s consider a picture of a dog. When the image is rotated, the viewer will still be able to tell that it’s an image of a dog. This is exactly what good data augmentation hopes to achieve. Compare that with a rotated image of a road, where the rotation changes the angle of elevation and leaves plenty of room for the deep learning algorithm to come to an incorrect conclusion, which defeats the purpose of implementing deep learning in the first place.
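The dog example can be sketched in plain Python. We assume, purely for illustration, that an image is a small grid of pixel values stored as a list of rows; the function names are our own. Each transformation produces a new training example while the label stays the same.

```python
def rotate_90(image):
    """Rotate a 2D image (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def augment(image, label):
    """Return the original plus four altered copies, all with the same label."""
    quarter = rotate_90(image)
    half = rotate_90(quarter)
    three_quarters = rotate_90(half)
    variants = [image, quarter, half, three_quarters, flip_horizontal(image)]
    return [(variant, label) for variant in variants]

# A toy 2x2 "image" standing in for a photo of a dog.
dog_image = [[1, 2],
             [3, 4]]

augmented = augment(dog_image, "dog")
for variant, label in augmented:
    print(variant, label)
```

One labeled picture has become five. Whether a given transformation is safe depends on the domain: rotation is harmless for the dog, but, as with the road example above, it can change the meaning of other images, so each augmentation should be checked against what the label actually depends on.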
When it comes to solving problems related to image classification, data augmentation is a key player, offering a variety of techniques that help the deep learning model gain an in-depth understanding of the different classes of images.
Moreover, when it comes to augmenting data, the possibilities are virtually endless. Enterprises can implement data augmentation in a variety of ways, including augmenting text for NLP and experimenting with GANs, which enable the algorithm to generate new data.
#4- Implementing an ensemble effect:
Deep learning networks are built from multiple layers. However, contrary to a belief maintained by many, the layers need not be viewed only as an “ever-increasing” hierarchy of features; the final layer can also be seen as offering an ensemble mechanism.
The belief that enterprises with access to a limited, or smaller, data set should opt to build their networks deep was also shared in a NIPS paper, which mirrored the belief we’ve expressed above. Enterprises with small data can easily use the ensemble effect to their advantage, simply by building their deep learning networks deep, through fine-tuning or some other alternative.
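For readers unfamiliar with the term, the sketch below shows the general ensemble idea in its simplest form: several imperfect models vote, and the combined answer is more reliable than any single one. This is an explicit majority vote over three made-up sets of predictions, not the internal mechanism of a deep network described above; the labels and model outputs are invented for illustration.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def accuracy(predicted, actual):
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

true_labels = [1, 0, 1, 1, 0, 0]

# Hypothetical outputs of three weak models, each wrong in different places.
model_a = [1, 0, 1, 0, 0, 0]
model_b = [1, 1, 1, 1, 0, 0]
model_c = [0, 0, 1, 1, 0, 1]

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in (("model_a", model_a), ("model_b", model_b),
                    ("model_c", model_c), ("ensemble", ensemble)):
    print(f"{name}: accuracy {accuracy(preds, true_labels):.2f}")
```

The vote only helps because the three models make their mistakes on different examples; models that all fail in the same places gain nothing from being combined.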
#5- Incorporating autoencoders:
Although the fifth point we’ve taken into consideration has seen only a relative level of success, we’re still on board with the use of autoencoders to pre-train a network and initialize it properly.
One of the biggest reasons, apart from cyber-attacks, why enterprises fail to get over the initial hurdles of integrating deep learning is bad initialization and its many pitfalls. A poorly initialized network often leads to poor, or incorrect, execution of the deep learning technology, which is where unsupervised pre-training with autoencoders can shine.
The fundamental notion behind an autoencoder is the creation of a neural network that learns to reconstruct the very data being input. If you are unsure of how to use an autoencoder, there are several tutorials online that give clear-cut instructions.
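As a rough sketch of the mechanics, the plain-Python example below trains the smallest autoencoder we could think of: two inputs squeezed into a single number and expanded back to two outputs. The data, learning rate, and starting weights are all our own assumptions, and a real autoencoder would have non-linear layers and far more units. No labels are used at any point, which is what makes this a pre-training step.

```python
def encode(x, enc):
    """Compress a 2-value input into a single code value."""
    return sum(w * xi for w, xi in zip(enc, x))

def decode(code, dec):
    """Expand the code back into a 2-value reconstruction."""
    return [v * code for v in dec]

def reconstruction_error(data, enc, dec):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, enc), dec)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def train_autoencoder(data, epochs=300, lr=0.02):
    """Fit encoder and decoder weights by stochastic gradient descent."""
    enc = [0.1, 0.1]
    dec = [0.1, 0.1]
    for _ in range(epochs):
        for x in data:
            code = encode(x, enc)
            diff = [a - b for a, b in zip(decode(code, dec), x)]
            # Gradients of the squared error for this one example.
            grad_dec = [2.0 * d * code for d in diff]
            back = 2.0 * sum(d * v for d, v in zip(diff, dec))
            grad_enc = [back * xi for xi in x]
            dec = [v - lr * g for v, g in zip(dec, grad_dec)]
            enc = [w - lr * g for w, g in zip(enc, grad_enc)]
    return enc, dec

# Unlabeled data lying along a line, so one code value can describe it.
unlabeled = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.5, 1.0)]

initial_error = reconstruction_error(unlabeled, [0.1, 0.1], [0.1, 0.1])
encoder, decoder = train_autoencoder(unlabeled)
final_error = reconstruction_error(unlabeled, encoder, decoder)

# The learned encoder weights could now initialize the first layer
# of a supervised network instead of random values.
initial_first_layer = list(encoder)
print(f"error before: {initial_error:.3f}, after: {final_error:.6f}")
```

Once the reconstruction error is low, the encoder has captured the structure of the unlabeled data, and reusing its weights gives the supervised network a better starting point than a random one.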
At the end of the article, we’d like to reiterate what we’ve said throughout, with one addition: incorporate domain-specific knowledge into the learning process! Not only does incorporating valuable insight speed up the learning process, but it also allows the deep learning technology to produce better, more accurate results.
AI Powered State Surveillance On Rise, COVID-19 Used as Scapegoat
As governments around the world come to terms with the impact of COVID-19, citizens are facing new draconian restrictions, which include travel bans and forced quarantine. These are the types of restrictions normally associated with totalitarian and/or communist states. The efficacy of these efforts deserves debate, but there is no debate to be had. Rules are in place, and citizens must obey.
What is more concerning than this complete, temporary loss of freedom are the new laws and privacy-blocking regulations which are being implemented. We are talking about a complete loss of privacy and an erosion of basic human rights.
Previously, countrywide monitoring using facial recognition technology via a camera network was used to identify and track the movements of people on terrorist watchlists. That same technology is now being applied to anyone who is deemed infected or has traveled within the past two weeks.
Below we highlight some of the governments that have taken advantage of the current pandemic to implement systems designed for long-term surveillance.
Russia has taken advantage of COVID-19 to fast-forward its plans to blanket the country with a massive facial recognition system. The system was rolled out earlier this year amid major public backlash. Privacy advocates were filing lawsuits to attempt to reduce the amount of potential government surveillance.
With fear at his side, President Vladimir Putin was able to curb this backlash. Russia argued that public safety was the number one concern and that facial recognition does not invade people’s privacy. These proceedings took place on the pretext of the COVID-19 outbreak.
Today the surveillance network has been activated. The 170,000-camera system is now programmed to track the movements of anyone who leaves quarantine or self-isolation. Now that the system is in place, the odds of rolling back the technology are next to none.
With this technology in place, Russia, which has a proven history of tracking and killing uncooperative journalists, has expanded its arsenal for monitoring the movements of anyone who makes a disparaging comment regarding President Vladimir Putin or the state.
Russia has solidified its totalitarian control, and other countries are following in its footsteps.
Less than two weeks ago the Israeli government approved tracking the movements of people who are ‘suspected’ of being infected with COVID-19. Tracking would be done via the data on your mobile phone. Most people carry their mobile phones at all times, which enables governments to be informed of your current location.
What is especially concerning is that this bill was passed in an overnight sitting of the cabinet, bypassing parliamentary approval.
This enables the full-time tracking of all Israeli citizens. The Association for Civil Rights in Israel labeled this a “dangerous precedent and a slippery slope”. These powers were previously enabled only for counter-terrorism operations.
While the law is currently implemented to track ‘suspected’ COVID-19 patients, future implementations of the technology are up to interpretation now that it is in place.
What if you were forced to wear a wristband to continuously update the government on your whereabouts? Previously this was used on criminals. On March 19th, Hong Kong enacted regulations to use it on non-criminals, specifically to track passengers and to place them in forced quarantine.
The wristbands are connected to a smartphone app and will be used to make sure people actually stay at home. Refusing to wear the wristband or leaving the confines of your home can result in a six-month prison sentence.
The long-term precedent of a government entity being able to control the movement and track the locations of its citizens should be of concern to anyone who believes in privacy. This is especially true in the case of Hong Kong, which has long fought China to retain its independence.
The United States was slow to wake up to COVID-19, due to the initial outrageous claim by the Trump administration that the outbreak was a liberal hoax. The tide has since shifted, with the federal government proposing and then backing away from a quarantine of New York state, among other measures to fight the outbreak.
A recent development which should concern us is the United States communicating with tech giants such as Facebook, Google, Twitter, Uber, Apple, and IBM about sharing data on all of their users. Currently the data is to remain anonymous, but once location tracking is in place, it takes only several lines of code to disable user privacy and anonymity.
The current use case is using machine learning to predict the location of future hotspots in order to better prepare healthcare workers. Monitoring of this big data should be handled by a non-profit entity in order to ensure that the data is used specifically to track outbreaks.
This conglomerate of companies has the potential to enable unfiltered access to every single facet of a person’s life, from all social media communication to geo-location tracking. It remains to be seen whether democratic values will hold as government surveillance requests increase. In the meantime there is cause for concern.
Human Genome Sequencing and Deep Learning Could Lead to a Coronavirus Vaccine – Opinion
The AI community must collaborate with geneticists to find a treatment for those deemed most at risk from coronavirus. A potential treatment could involve removing a person’s cells, editing the DNA, and then injecting the cells back in, now hopefully armed with a successful immune response. This is currently being worked on for some other vaccines.
The first step would be sequencing the entire human genome from a sizeable segment of the human population.
Sequencing Human Genomes
Sequencing the first human genome cost $2.7 billion and took nearly 15 years to complete. The current cost of sequencing an entire human genome has dropped dramatically. As recently as 2015 the cost was $4000; now it is less than $1000 per person. This cost could drop a few percentage points more when economies of scale are taken into consideration.
We need to sequence the genome of two different types of patients:
- Infected with Coronavirus; but healthy
- Infected with Coronavirus; but poor immune response
It is impossible to predict which data point will be most valuable, but each sequenced genome would provide a dataset. The more data the more options there are to locate DNA variations which increase a body’s resistance to the disease vector.
Nations are currently losing trillions of dollars to this outbreak; the cost of $1000 per human genome is minor in comparison. A minimum of 1,000 volunteers for each of the two segments of the population would arm researchers with significant volumes of data. Should the trial increase in size by one order of magnitude, the AI would have even more training data, which should meaningfully improve the odds of success. The more data the better, which is why a target of 10,000 volunteers should be aimed for.
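The arithmetic behind that comparison is easy to check. The short sketch below uses the figures quoted above ($1000 per genome, two patient segments) and assumes, since the text leaves it open, that the 10,000-volunteer target applies to each segment; the function name is our own.

```python
COST_PER_GENOME_USD = 1_000   # upper bound quoted in the text
SEGMENTS = 2                  # healthy vs. poor immune response

def sequencing_cost(volunteers_per_segment):
    """Total sequencing cost in US dollars for both patient segments."""
    return volunteers_per_segment * SEGMENTS * COST_PER_GENOME_USD

minimum_trial = sequencing_cost(1_000)
target_trial = sequencing_cost(10_000)   # assumption: 10,000 per segment

print(f"minimum trial: ${minimum_trial:,}")
print(f"target trial:  ${target_trial:,}")
```

Even the larger trial comes to tens of millions of dollars in sequencing costs, which is tiny next to losses measured in trillions.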
While multiple machine learning techniques would be involved, deep learning would be used to find patterns in the data. For instance, there might be an observation that certain DNA variants correspond to high immunity, while others correspond to high mortality. At a minimum we would learn which segments of the human population are more susceptible and should be quarantined.
To decipher this data, an Artificial Neural Network (ANN) would be hosted in the cloud, and sequenced human genomes from around the world would be uploaded to it. With time being of the essence, parallel computing would reduce the time required for the ANN to work its magic.
We could even take it one step further and feed the output data sorted by the ANN into a separate system, such as a Recurrent Neural Network (RNN) trained with reinforcement learning, to identify which gene selected by the initial ANN is most successful in a simulated environment. The reinforcement learning agent would gamify the entire process within a simulated setting, testing which DNA changes are more effective.
A simulated environment is like a virtual game environment, something many AI companies are well positioned to take advantage of based on their previous success in designing AI algorithms to win at esports. This includes companies such as DeepMind and OpenAI.
These companies can use their underlying architecture, optimized for mastering video games, to create a simulated environment, test gene edits, and learn which edits lead to specific desired changes.
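To give a feel for what "learning which edits work in a simulated environment" means, here is a deliberately tiny reinforcement learning sketch. It is an epsilon-greedy bandit, one of the simplest such methods, and nothing like the systems built by the companies named above. The candidate edits, their success probabilities, and the simulator are entirely hypothetical numbers invented for illustration, not biological data.

```python
import random

# Hypothetical success rates that a simulated environment might assign
# to three candidate gene edits. Invented purely for this sketch.
SIMULATED_SUCCESS = {"edit_a": 0.2, "edit_b": 0.5, "edit_c": 0.8}

def simulate(edit, rng):
    """Stand-in simulator: reward 1.0 on success, 0.0 otherwise."""
    return 1.0 if rng.random() < SIMULATED_SUCCESS[edit] else 0.0

def epsilon_greedy(trials=5000, epsilon=0.1, seed=0):
    """Mostly try the best-looking edit, occasionally explore the others."""
    rng = random.Random(seed)
    edits = list(SIMULATED_SUCCESS)
    counts = {e: 0 for e in edits}
    values = {e: 0.0 for e in edits}   # running average reward per edit
    for _ in range(trials):
        if rng.random() < epsilon:
            choice = rng.choice(edits)
        else:
            choice = max(edits, key=lambda e: values[e])
        reward = simulate(choice, rng)
        counts[choice] += 1
        values[choice] += (reward - values[choice]) / counts[choice]
    return values, counts

estimates, pulls = epsilon_greedy()
best_edit = max(estimates, key=estimates.get)
print("estimated success rates:", estimates)
print("most promising edit:", best_edit)
```

The agent is never told the true success rates; it discovers them by trial and error, spending most of its trials on the edit that looks best so far. A realistic system would face a vastly larger space of edits and a far more complex simulator, but the feedback loop is the same.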
Once a gene is identified, another technology is used to make the edits.
Recently, the first ever study using CRISPR to edit DNA inside the human body was approved. It aims to treat a rare type of genetic disorder that affects one of every 100,000 newborns. The condition can be caused by mutations in as many as 14 genes that play a role in the growth and operation of the retina. In this case, CRISPR sets out to carefully target DNA and to cause slight temporary damage to the DNA strand, causing the cell to repair itself. It is this restorative healing process which has the potential to restore eyesight.
While we are still waiting for results on whether this treatment will work, the precedent of having CRISPR approved for trials in the human body is transformational. Potential applications include improving a body’s immune response to specific disease vectors.
Potentially, we can manipulate the body’s natural genetic resistance to a specific disease. The diseases that could potentially be targeted are diverse, but the community should be focusing on the treatment of the new global epidemic, coronavirus, a threat that, if unchecked, could be a death sentence for a large percentage of our population.
While there are many potential paths to success, all of them will require geneticists, epidemiologists, and machine learning specialists to unify. A potential treatment option may be as described above, or may turn out to be unimaginably different; the opportunity lies in the genome sequencing of a large segment of the population.
Deep learning is the best analysis tool that humans have ever created; we need to at a minimum attempt to use it to create a vaccine.
When we take into consideration what is at risk with the current epidemic, these three scientific communities need to come together to work on a cure.
What a Business AI Ethics Code Looks Like
By now, it’s safe to say that artificial intelligence (AI) has established itself in the mainstream, especially in the world of business. From customer service and marketing, to fraud detection and automation, this particular technology has helped streamline operations in recent years.
Unfortunately, our dependence on AI also means that it holds so much of our personal information – whether it’s our family history, the things we buy, places we go to, or even our favourite songs. Essentially, we’re giving technology free access to our lives. As AI continues to develop (and ask for even more data), it’s raising a lot of serious concerns.
For instance, when the South Wales Police rolled out its facial recognition systems, they were immediately questioned for being too “intrusive.” Of course, there’s the issue of safety and where all that data really goes.
On top of this, AI is also facing other hurdles, such as public distrust born from the fear of robots driving people into mass unemployment. Case in point, across the Atlantic, HP reports that 72% of Americans are worried about a future where robots and computers can do human jobs. While the latter may be a bit farfetched, especially since AI is still far from working or thinking like a human, you can’t deny that the rapidly growing AI industry must be controlled better than it is now. According to Stanford professor Emma Brunskill, if we truly want “AI [to value] its human users and [justify] the trust we place in autonomous systems,” then regulations have to be put in place. For that, businesses need to have an AI code of ethics.
AI Code of Ethics
The AI code of ethics isn’t meant for the AI itself, but for the people who develop and use said technology. Last year, the UK government published a report that aims to inform the public about the ethical use of AI. All in all, the report can be summarised into five principles:
1. AI must be created and used for the benefit of all. AIs must be designed to help everyone and not just one faction. All involved parties – the government, businesses, and stockholders, for example – must be present during its creation to make sure that everyone’s interests are properly represented.
2. AI should not be used to diminish the data rights or privacy of individuals, families, and communities. AI can collect large amounts of consumer data that could prove dangerous if it gets into the wrong hands. Measures should be made to protect citizens and consumer privacy.
3. AI must operate within parameters understood by the human mind. To implement the necessary restrictions on AI’s programming, the machine has to be designed in a way that humans can still understand. This is also necessary to educate other people on the ins and outs of the machine.
4. Everybody has the right to be educated on the nuances of AI. Knowledge of AI should be available to everyone, even those outside of the business world. Fortunately, there are plenty of online resources available to aid anyone who wants to learn, from online videos to extensive courses. These topics can range from machine learning and Python, to R programming and Pandas – all of which are used in the development and implementation of AI. The commonality of such content proves just how accessible AI knowledge has become – and rightly so, given how ingrained it is in today’s society.
5. Humans must be able to flourish mentally, emotionally, and economically alongside AI. There is no doubt that AI has hugely influenced employment and our workforce. Whether it’s for the best or not is debatable.
According to an employment survey published on Quartz, almost half of existing jobs are at high risk of being automated in the coming decade. If AI is to remain ethical, businesses need to start creating new jobs to replace the ones threatened by AI.
New technologies such as AI are often a topic of concern, no matter what the benefits are. After all, it’s not enough to enjoy the convenience of technology without being critical of the possible repercussions. If all businesses implement these ethical principles, then the public might be more accepting of AI. This additional support may be what tech companies need to push the development of AI even further.