Keeping up with emerging cybersecurity trends can get quite tedious, since there’s a lot of news to follow. These days, however, the situation has changed dramatically, since the cybersecurity realm seems to revolve around two words: deep learning.
Although we were initially taken aback by the massive coverage that deep learning was receiving, it quickly became apparent that the buzz was well-earned. Loosely inspired by the human brain, deep learning enables an AI model to achieve highly accurate results by learning to perform tasks directly from text, images, and audio.
Until recently, it was widely believed that deep learning relies on a huge set of data, on the scale of the data housed by Silicon Valley giants such as Google and Facebook, to solve the most complicated problems within an organization. Contrary to popular belief, however, enterprises can harness the power of deep learning even with access to a limited data pool.
To equip our readers with the knowledge they need to bring deep learning into their organization, we’ve compiled an article that dives deep (no pun intended) into some of the ways in which enterprises can reap the benefits of deep learning in spite of having access to limited, or ‘small’, data.
But before we get into the meat of the article, we’d like to make a small but essential suggestion: start simple. Before you start formulating neural networks complex enough to feature in a sci-fi movie, experiment with a few simple, conventional models (e.g. random forest) to get the hang of the software and establish a baseline.
With that out of the way, let’s get straight into some of the ways in which enterprises can adopt deep learning while having access to limited data.
#1- Fine-tuning the baseline model:
As we’ve already mentioned above, the first step that enterprises need to take after they’ve formulated a simple baseline deep learning model is to fine-tune it for the particular problem at hand.
However, fine-tuning a baseline model sounds much more difficult on paper than it actually is. The fundamental idea is simple: you take a model trained on a large data set that bears some resemblance to the domain you function in, and then fine-tune that model’s parameters with your limited data.
As far as obtaining the large data set is concerned, enterprise owners can rely on ImageNet, which also provides an easy starting point for many image classification problems. ImageNet gives organizations access to millions of images divided across many classes, such as animals and everyday objects, which can be useful to enterprises from a wide variety of domains.
If the process of fine-tuning a pre-trained model to suit the specific needs of your organization still seems like too much work, we’d recommend getting help from the internet, since a simple Google search will turn up hundreds of tutorials on how to fine-tune a pre-trained model.
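As a rough illustration of the idea, the sketch below keeps a “pre-trained” layer frozen and trains only a small new output layer on twenty examples. Everything in it is an assumption made for illustration: the numbers in PRETRAINED_WEIGHTS are made up and merely stand in for a model trained on a large data set, and the helper names (extract_features, fine_tune_head) and the toy data are hypothetical. Training only a new final layer is just one common way to fine-tune, and a real project would normally use a deep learning framework and a genuinely pre-trained network.

```python
import math
import random

# Hypothetical "pre-trained" layer. In practice these weights would come from
# a model trained on a large data set; here they are fixed, made-up numbers.
PRETRAINED_WEIGHTS = [[0.9, -0.4], [-0.3, 0.8], [0.5, 0.5]]


def extract_features(x):
    """Frozen feature extractor: its weights are never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def predict(head, bias, x):
    """Probability of class 1 from the new output layer (logistic unit)."""
    z = sum(h * f for h, f in zip(head, extract_features(x))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(small_dataset, epochs=200, lr=0.5):
    """Train only the new output layer on the small data set."""
    head = [0.0] * len(PRETRAINED_WEIGHTS)
    bias = 0.0
    for _ in range(epochs):
        for x, label in small_dataset:
            features = extract_features(x)
            error = predict(head, bias, x) - label  # log-loss gradient w.r.t. z
            head = [h - lr * error * f for h, f in zip(head, features)]
            bias -= lr * error
    return head, bias


# Toy "limited data": 20 labelled points from two noisy clusters.
rng = random.Random(0)
small_dataset = []
for _ in range(20):
    label = rng.randint(0, 1)
    center = 1.0 if label else -1.0
    small_dataset.append(([rng.gauss(center, 0.5), rng.gauss(center, 0.5)], label))

head, bias = fine_tune_head(small_dataset)
train_accuracy = sum(
    (predict(head, bias, x) > 0.5) == bool(label) for x, label in small_dataset
) / len(small_dataset)
print(f"training accuracy after fine-tuning the head: {train_accuracy:.2f}")
```

The point of the design is that the small data set only has to fit a handful of new parameters, while the frozen layer carries over whatever was learned from the larger data set.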
#2- Collect more data:
Although the second point on our list might seem redundant to some of our more cynical readers, the fact of the matter remains: when it comes to deep learning, the larger your data set is, the more likely you are to achieve accurate results.
Although the very essence of this article lies in helping enterprises that have a limited data set, we’ve all too often had the displeasure of encountering “higher-ups” who treat investing in the collection of data as equivalent to committing a cardinal sin.
All too often, businesses overlook the benefits offered by deep learning simply because they are reluctant to invest time and effort in gathering data. If your enterprise is unsure about the amount of data that needs to be collected, we’d suggest plotting learning curves: as additional data is integrated into the model, observe how model performance changes.
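The learning-curve suggestion can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the data is synthetic, the “model” is a simple nearest-centroid classifier rather than a deep network, and all function names (make_data, fit_centroids, learning_curve) are hypothetical. It trains on progressively larger slices of the training pool and reports accuracy on a fixed validation set; plotting those pairs gives the learning curve.

```python
import random


def make_data(n, rng):
    """Hypothetical two-class data: each class is a noisy 2-D cluster."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append(((rng.gauss(center, 1.5), rng.gauss(center, 1.5)), label))
    return data


def fit_centroids(train):
    """Stand-in model: the mean point of each class."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (x, y), label in train:
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    # max(n, 1) avoids dividing by zero if a class is missing from a tiny slice
    return {label: (sx / max(n, 1), sy / max(n, 1))
            for label, (sx, sy, n) in sums.items()}


def accuracy(centroids, data):
    """Fraction of points whose nearest centroid matches their label."""
    correct = 0
    for (x, y), label in data:
        predicted = min(
            centroids,
            key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
        )
        correct += predicted == label
    return correct / len(data)


def learning_curve(train_pool, validation, sizes):
    """Validation accuracy after training on the first n examples, for each n."""
    return [(n, accuracy(fit_centroids(train_pool[:n]), validation)) for n in sizes]


rng = random.Random(0)
train_pool = make_data(400, rng)
validation = make_data(200, rng)
curve = learning_curve(train_pool, validation, [5, 10, 25, 50, 100, 200, 400])
for n, acc in curve:
    print(f"{n:>4} training examples -> validation accuracy {acc:.2f}")
```

If the curve is still climbing at the largest size, collecting more data is likely to pay off; if it has flattened, the effort is probably better spent elsewhere.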
Contrary to the popular belief held by many CSOs and CISOs, sometimes the best way to solve a problem is to collect more relevant data. The role of the CSO and CISO is extremely important here because there is always a threat of cyber-attacks. In 2019, total global spending on cybersecurity was reported to reach $103.1 billion, and the number continues to rise. To put this into perspective, let’s consider a simple example: imagine that you are trying to classify rare diamonds, but have access to a very limited data set. The most obvious solution applies: instead of having a field day with the baseline model, just collect more data!
#3- Data Augmentation:
Although the first two points we’ve discussed above are both highly efficient in providing an easy solution to most problems surrounding the implementation of deep learning into enterprises with a small data set, they rely heavily on a certain level of luck to get the job done.
If you’re unable to have any success with fine-tuning a pre-trained model either, we’d recommend trying data augmentation. The way that data augmentation works is simple: the input data is altered, or augmented, in such a way that it yields a new training example without actually changing the label value.
To put the idea of data augmentation into perspective for our readers, let’s consider a picture of a dog. When the picture is rotated, the viewer will still be able to tell that it’s an image of a dog. This is exactly what good data augmentation hopes to achieve. Compare this with a rotated image of a road: the rotation changes the angle of elevation and leaves plenty of room for the deep learning algorithm to come to an incorrect conclusion, which defeats the purpose of implementing deep learning in the first place.
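The dog example can be sketched as follows. This is a minimal illustration, assuming images are stored as plain nested lists of pixel values; the function names (flip_horizontal, rotate_90, augment) and the tiny 2x2 “image” are hypothetical. Each transformation produces a new input while the label is carried over unchanged.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def augment(dataset):
    """Return the original examples plus flipped and rotated copies.

    The label of every augmented copy is the label of its source image.
    """
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((flip_horizontal(image), label))
        augmented.append((rotate_90(image), label))
    return augmented


# A hypothetical 2x2 grayscale "image" labelled as a dog.
dataset = [([[1, 2], [3, 4]], "dog")]
for image, label in augment(dataset):
    print(label, image)
```

Whether a transformation is safe depends on the domain: as the road example shows, a rotation that is harmless for a dog photo can change the meaning of another image, so the set of transformations should be chosen per problem.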
When it comes to solving problems related to image classification, data augmentation is a key player, and it offers a variety of techniques that help the deep learning model gain an in-depth understanding of the different classes of images.
Moreover, when it comes to augmenting data, the possibilities are virtually endless. Enterprises can implement data augmentation in a variety of ways, which include applying it to NLP data and experimenting with GANs, which enable the algorithm to generate new data.
#4- Implementing an ensemble effect:
Deep learning networks are built from multiple layers. However, contrary to popular belief, rather than viewing the layers as an “ever-increasing” hierarchy of features, the final layer can be viewed as offering an ensemble mechanism.
The belief that enterprises with access to a limited, or smaller, data set should opt to build their networks deep was also shared in a NIPS paper, which mirrored the belief we’ve expressed above. Enterprises with small data can exploit the ensemble effect to their advantage simply by building their deep learning networks deep, through fine-tuning or some other alternative.
#5- Incorporating autoencoders:
Although the fifth point on our list has seen only relative success, we’re still on board with using autoencoders to pre-train a network and initialize it properly.
One of the biggest reasons, apart from cyber-attacks, why enterprises fail to get over the initial hurdles of integrating deep learning is bad initialization and its many pitfalls. Bad initialization often leads to poor or incorrect execution of the deep learning technology, which is where unsupervised pre-training with autoencoders can shine.
The fundamental notion behind an autoencoder is the creation of a neural network that predicts the very data being input, that is, one that learns to reconstruct its own input. If you are unsure of how to use an autoencoder, there are several tutorials online that give clear-cut instructions.
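To make the autoencoder idea concrete, here is a minimal sketch under stated assumptions: a tiny linear autoencoder written in plain Python, trained on synthetic unlabeled data, with hypothetical names (matvec, train_autoencoder). It squeezes 4-dimensional inputs through a 2-dimensional code and learns to reconstruct them; the learned encoder weights could then serve as the starting point for the first layer of a supervised network instead of a random initialization.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def train_autoencoder(data, hidden_size=2, epochs=200, lr=0.01, seed=0):
    """Train a linear autoencoder with plain SGD on squared reconstruction error."""
    rng = random.Random(seed)
    n_in = len(data[0])
    encoder = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(hidden_size)]
    decoder = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
               for _ in range(n_in)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            code = matvec(encoder, x)            # compress the input
            reconstruction = matvec(decoder, code)
            error = [r - xi for r, xi in zip(reconstruction, x)]
            total += sum(e * e for e in error)
            # Backpropagate the error to the code before updating the decoder.
            code_grad = [sum(decoder[i][j] * error[i] for i in range(n_in))
                         for j in range(hidden_size)]
            for i in range(n_in):
                for j in range(hidden_size):
                    decoder[i][j] -= lr * error[i] * code[j]
            for j in range(hidden_size):
                for k in range(n_in):
                    encoder[j][k] -= lr * code_grad[j] * x[k]
        losses.append(total / len(data))
    return encoder, decoder, losses


# Synthetic unlabeled data: 4 values per example that really depend on only 2.
rng = random.Random(1)
data = []
for _ in range(50):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a, b, a + b, a - b])

encoder, decoder, losses = train_autoencoder(data)
print(f"mean reconstruction error: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

No labels are used anywhere in the training loop, which is what makes this kind of pre-training attractive when labeled data is scarce but unlabeled data is available.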
At the end of the article, we’d like to reiterate what we’ve said throughout, with one addition: incorporate domain-specific knowledge into the learning process! Not only does the incorporation of valuable insight speed up the learning process, but it also allows the deep learning technology to produce better, more accurate results.
What a Business AI Ethics Code Looks Like
By now, it’s safe to say that artificial intelligence (AI) has established itself in the mainstream, especially in the world of business. From customer service and marketing, to fraud detection and automation, this particular technology has helped streamline operations in recent years.
Unfortunately, our dependence on AI also means that it holds so much of our personal information – whether it’s our family history, the things we buy, places we go to, or even our favourite songs. Essentially, we’re giving technology free access to our lives. As AI continues to develop (and ask for even more data), it’s raising a lot of serious concerns.
For instance, when the South Wales Police rolled out its facial recognition systems, the systems were immediately questioned for being too “intrusive.” On top of that, there’s the issue of safety and where all that data really goes.
On top of this, AI is also facing other hurdles, such as public distrust born from the fear of robots driving people into mass unemployment. Case in point: across the Atlantic, HP reports that 72% of Americans are worried about a future where robots and computers can do human jobs. While that fear may be a bit farfetched, especially since AI is still far from working or thinking like a human, you can’t deny that the rapidly growing AI industry must be controlled better than it is now. According to Stanford professor Emma Brunskill, if we truly want “AI [to value] its human users and [justify] the trust we place in autonomous systems,” then regulations have to be put in place. For that, businesses need to have an AI code of ethics.
AI Code of Ethics
The AI code of ethics isn’t meant for the AI itself, but for the people who develop and use said technology. Last year, the UK government published a report that aims to inform the public about the ethical use of AI. All in all, the report can be summarised into five principles:
1. AI must be created and used for the benefit of all. AIs must be designed to help everyone and not just one faction. All involved parties – the government, businesses, and stockholders, for example – must be present during its creation to make sure that everyone’s interests are properly represented.
2. AI should not be used to diminish the data rights or privacy of individuals, families, and communities. AI can collect large amounts of consumer data that could prove dangerous if it gets into the wrong hands. Measures should be made to protect citizens and consumer privacy.
3. AI must operate within parameters understood by the human mind. To implement the necessary restrictions on AI’s programming, the machine has to be designed in a way that humans can still understand. This is also necessary to educate other people on the ins and outs of the machine.
4. Everybody has the right to be educated on the nuances of AI. Knowledge of AI should be available to everyone, even those outside of the business world. Fortunately, there are plenty of online resources available to aid anyone who wants to learn, from online videos to extensive courses. These topics can range from machine learning and Python, to R programming and Pandas – all of which are used in the development and implementation of AI. The commonality of such content proves just how accessible AI knowledge has become – and rightly so, given how ingrained it is in today’s society.
5. Humans must be able to flourish mentally, emotionally, and economically alongside AI. There is no doubt that AI has hugely influenced employment and our workforce. Whether it’s for the best or not is debatable.
According to an employment survey published on Quartz, almost half of existing jobs are at high risk of being automated in the coming decade. If the use of AI is to remain ethical, businesses need to start creating new jobs to replace the ones threatened by AI.
New technologies such as AI are often a topic of concern, no matter what the benefits are. After all, it’s not enough to enjoy the convenience of technology without being critical of the possible repercussions. If all businesses implement these ethical principles, then the public might be more accepting of AI. This additional support may be what tech companies need to push the development of AI even further.
Expert Predictions For AI’s Trajectory In 2020
VentureBeat recently interviewed five of the most intelligent, expert minds in the AI field and asked them to make their predictions for where AI is heading over the course of the year to come. The individuals interviewed for their predictions were:
- Soumith Chintala, creator of PyTorch.
- Celeste Kidd, AI professor at the University of California.
- Jeff Dean, chief of Google AI.
- Anima Anandkumar, machine learning research director at Nvidia.
- Dario Gil, IBM Research director.
Chintala, the creator of PyTorch, which is arguably the most popular machine learning framework at the moment, predicted that 2020 will see a greater need for neural network hardware accelerators and methods of boosting model training speeds. Chintala expected that the next couple of years will see an increased focus on how to use GPUs optimally and how compiling can be done automatically for new hardware. Beyond this, Chintala expected that the AI community will begin pursuing other methods of quantifying AI performance more aggressively, placing less importance on pure accuracy. Factors for consideration include things like the amount of energy needed to train a model, how AI can be used to build the sort of society we want, and how the output of a network can be intuitively explained to human operators.
Celeste Kidd has spent much of her recent career advocating for more responsibility on the part of designers of algorithms, tech platforms, and content recommendation systems. Kidd has often argued that systems that are designed to maximize engagement can end up having serious impacts regarding how people create their opinions and beliefs. More and more attention is being paid to the ethical use of AI algorithms and systems, and Kidd predicted that in 2020 there will be an increased awareness of how tech tools and platforms are influencing people’s lives and decisions, as well as a rejection of the idea that tech tools can be genuinely neutral in design.
“We really need to, as a society and especially as the people that are working on these tools, directly appreciate the responsibility that that comes with,” Kidd said.
Jeff Dean, the current head of Google AI, predicted that in 2020 there will be progress in multimodal learning and multitask learning. Multimodal learning is when AI is trained with multiple types of media at one time, while multitask learning endeavors to allow AI to train on multiple tasks at one time. Dean also expected further progress to be made regarding natural language processing models based on the Transformer architecture, such as Google’s BERT and the other models that topped the GLUE leaderboards. Dean also mentioned he would like to see less desire to create the most advanced state-of-the-art models and more desire to create models that are more robust and flexible.
Anandkumar expected that the AI community will have to grapple with many challenges in 2020, especially the need for more diverse datasets and the need to ensure people’s privacy when training on data. Anandkumar explained that while face recognition often gets the most attention, there are many areas where people’s privacy can be violated and that these issues may come to the forefront of discussion during 2020.
Anandkumar also expected that further advancements will be made regarding Transformer based natural language processing models.
“We are still not at the stage of dialogue generation that’s interactive, that can keep track and have natural conversations. So I think there will be more serious attempts made in 2020 in that direction,” she said.
Finally, Anandkumar expected that the coming year will see more development of iterative algorithms and self-supervision. These training methods allow AI systems to self-train in some respects, and can potentially help create models that improve by self-training on unlabeled data.
Gil predicted that in 2020 there will be more progress towards creating AI in a more computationally efficient manner, as the way deep neural networks are currently trained is inefficient in many ways. Because of this, Gil expected that this year will see progress in terms of creating reduced-precision architectures and generally training more efficiently. Much like some of the other experts who were interviewed, Gil predicted that in 2020 researchers will start to focus more on metrics aside from accuracy. Gil expressed an interest in neural symbolic AI, as IBM is examining ways to create probabilistic programming models using neural symbolic approaches. Finally, Gil emphasized the importance of making AI more accessible to those interested in machine learning and getting rid of the perception that only geniuses can work with AI and do data science.
“If we leave it as some mythical realm, this field of AI, that’s only accessible to the select PhDs that work on this, it doesn’t really contribute to its adoption,” Gil said.