“Big Data” is one of the most commonly used buzzwords of our era, but what does it really mean?
Here’s a quick, simple definition of big data. Big data is data that is too large and complex to be handled by traditional data processing and storage methods. While that’s a quick definition you can use as a heuristic, it would be helpful to have a deeper, more complete understanding of big data. Let’s take a look at some of the concepts that underlie big data, like storage, structure, and processing.
How Big Is Big Data?
It isn’t as simple as saying “any data over size X is big data.” The environment in which the data is handled is an important factor in determining what qualifies as big data: the size that data must reach to count as big data depends on the context, or the task the data is being used for. Two datasets of vastly different sizes can each be considered “big data” in different contexts.
To be more concrete, if you try to send a 200-megabyte file as an email attachment, most email services will reject it, because attachments are typically capped at a few tens of megabytes. In this context, the 200-megabyte file could be considered big data. In contrast, copying the same 200-megabyte file to another device on the same LAN may take only a few seconds, and in that context it wouldn’t be regarded as big data.
Now suppose that 15 terabytes of video need to be pre-processed for use in training computer vision applications. The video files take up so much space that even a powerful computer would take a long time to process them all, so the processing would normally be distributed across multiple networked computers to decrease processing time. These 15 terabytes of video would definitely qualify as big data.
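The split-process-combine pattern behind distributing such a job can be sketched in Python. This is a minimal illustration under stated assumptions: word counting on small text shards stands in for video pre-processing, a thread pool stands in for separate machines, and the function names are hypothetical. Each shard is processed independently (the map step) and the partial results are then merged (the reduce step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_shard(shard: str) -> Counter:
    """Map step: process one shard independently (one machine's share of the work)."""
    return Counter(shard.lower().split())

def merge_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: combine two partial results."""
    return a + b

def distributed_word_count(shards: list[str], workers: int = 4) -> Counter:
    # Threads stand in for the separate computers of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_shard, shards))
    return reduce(merge_counts, partials, Counter())

shards = ["big data is big", "data moves fast", "big clusters help"]
print(distributed_word_count(shards).most_common(2))  # [('big', 3), ('data', 2)]
```

On a real cluster, a processing framework would ship each shard to a different node and handle failures and data movement, but the shape of the computation stays the same.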
Types Of Big Data Structures
Big data comes in three categories of structure: unstructured, semi-structured, and structured.
Unstructured data has no definable structure, meaning the data is essentially held in one large pool. A collection of unlabeled images is an example of unstructured data.
Semi-structured data is data that doesn’t have a formal structure, but does exist within a loose structure. For example, email data might count as semi-structured data, because you could refer to the data contained in individual emails, but formal data patterns have not been established.
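Python’s standard `email` module shows what “loosely structured” means in practice: an email has a few named header fields that can be addressed directly, while the body is free-form text. The message below is invented for illustration.

```python
from email import message_from_string

# An invented email: structured headers, then a blank line, then an unstructured body.
raw = """From: alice@example.com
To: bob@example.com
Subject: Quarterly numbers

Hi Bob, the figures are attached. Let me know what you think.
"""

msg = message_from_string(raw)
print(msg["Subject"])             # Quarterly numbers  (a named field we can query)
print(msg.get_payload().strip())  # the body is just free text with no schema
```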
Structured data has a formal structure, with data points categorized by defined features. One example of structured data is an Excel spreadsheet containing contact information such as names, emails, phone numbers, and websites.
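Structured data of this kind maps naturally onto rows and named columns. A minimal sketch, assuming the spreadsheet has been exported to CSV (the contacts shown are invented), using Python’s built-in `csv` module:

```python
import csv
import io

# Invented contact data, as it might look after exporting a spreadsheet to CSV.
data = """name,email,phone,website
Ada Lovelace,ada@example.com,555-0100,https://example.com/ada
Alan Turing,alan@example.com,555-0101,https://example.com/alan
"""

# Every row has the same named columns, so fields can be addressed directly.
rows = list(csv.DictReader(io.StringIO(data)))
emails = [row["email"] for row in rows]
print(emails)  # ['ada@example.com', 'alan@example.com']
```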
Metrics For Assessing Big Data
Big data can be analyzed in terms of three different metrics: volume, velocity, and variety.
Volume refers to the size of the data, and datasets keep growing over time. For example, the largest hard drive available in 2006 held 750 GB. In contrast, Facebook is thought to generate over 500 terabytes of data per day, and the largest consumer hard drive available at the time of writing holds 16 terabytes. What qualifies as big data in one era may not be big data in another. More data is generated today because more and more of the objects surrounding us are equipped with sensors, cameras, microphones, and other data collection devices.
Velocity refers to how fast data is moving, or, put another way, how much data is generated within a given period of time. Social media streams generate hundreds of thousands of posts and comments every minute, while your own email inbox probably sees far less activity. Big data streams often handle hundreds of thousands or millions of events in near real time. Examples include online gaming platforms and high-frequency stock trading algorithms.
Variety refers to the different types of data contained within a dataset. Data can come in many formats, such as audio, video, text, photos, or serial numbers. Traditional databases are generally designed to handle one or a few types of data; in other words, they are structured to hold data that is fairly homogeneous and has a consistent, predictable structure. As applications become more diverse, gain more features, and are used by more people, databases have had to evolve to store more types of data. Databases with flexible or no fixed schemas (often grouped under the label NoSQL) are well suited to big data, as they can hold multiple data types that aren’t related to each other.
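Velocity is usually measured as a rate, such as events per second over a recent window. The sketch below is a hypothetical helper, not any particular streaming product’s API: it keeps the timestamps of events seen within a sliding window and reports the current rate. It assumes timestamps arrive in non-decreasing order.

```python
from collections import deque

class EventRateMeter:
    """Hypothetical helper: events per second over the last `window` seconds."""

    def __init__(self, window: float = 60.0):
        self.window = window
        self.timestamps = deque()  # oldest event on the left

    def record(self, t: float) -> None:
        """Register one event at time t (assumed non-decreasing)."""
        self.timestamps.append(t)
        self._evict(t)

    def _evict(self, now: float) -> None:
        # Drop events that have slid out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()

    def rate(self, now: float) -> float:
        self._evict(now)
        return len(self.timestamps) / self.window

meter = EventRateMeter(window=10.0)
for t in range(20):          # one event per second for 20 seconds
    meter.record(float(t))
print(meter.rate(19.0))      # 1.0
```

A deque is used because events are appended at one end and expire from the other, so both operations stay cheap even at high event rates.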
Methods Of Handling Big Data
There are a number of platforms and tools designed to facilitate the analysis of big data. Big data pools need to be analyzed to extract meaningful patterns, a task that can prove quite challenging with traditional data analysis tools. In response, a variety of companies have created big data analysis tools, including Zoho Analytics, Cloudera, and Microsoft Power BI.
What Are Nanobots? Understanding Nanobot Structure, Operation, and Uses
As technology advances, things don’t only become bigger and better; they also become smaller. In fact, nanotechnology is one of the fastest-growing technological fields, worth over 1 trillion USD, and it’s forecast to grow by approximately 17% over the next five years. Nanobots are a major part of the nanotechnology field, but what are they exactly and how do they operate? Let’s take a closer look at nanobots to understand how this transformative technology works and what it’s used for.
What Are Nanobots?
The field of nanotechnology is concerned with the research and development of technology approximately 1 to 100 nanometers in scale. Nanorobotics, accordingly, is focused on the creation of robots around this size. In practice, it’s difficult to engineer anything as small as one nanometer, so the terms “nanorobotics” and “nanobot” are frequently applied to devices approximately 0.1–10 micrometers in size, which is still quite small.
It’s important to note that the term “nanorobot” is sometimes applied to devices which interact with objects at the nanoscale, manipulating nanoscale items. Therefore, even if the device itself is much larger, it may be considered a nanorobotic instrument. This article will focus on nanoscale robots themselves.
Much of the field of nanorobotics and nanobots is still in the theoretical phase, with research focused on solving the problems of construction at such a small scale. However, some prototype nanomachines and nanomotors have been designed and tested.
Most currently existing nanorobotic devices fall into one of four categories: switches, motors, shuttles, and cars.
Nanorobotic switches operate by being prompted to switch from an “off” state to an “on” state. Environmental factors are used to make the machine change shape, a process called conformational change. The environment is altered through chemical reactions, UV light, or changes in temperature, and the switches shift into different forms as a result, which lets them accomplish specific tasks.
Nanomotors are more complex than simple switches: they use the energy released by conformational changes to move around and affect the molecules in the surrounding environment.
Shuttles are nanorobots that are capable of transporting chemicals like drugs to specific, targeted regions. The goal is to combine shuttles with nanorobot motors so that the shuttles are capable of a greater degree of movement through an environment.
Nanorobotic “cars” are the most advanced nanodevices at the moment, capable of moving independently in response to chemical or electromagnetic stimuli. The nanomotors that drive these cars need to be controlled for the vehicle to be steered, and researchers are experimenting with various methods of nanorobotic control.
Nanorobotics researchers aim to synthesize these different components and technologies into nanomachines that can complete complex tasks, accomplished by swarms of nanobots working together.
How Are Nanobots Created?
The field of nanorobotics sits at the crossroads of many disciplines, and creating nanobots involves building sensors, actuators, and motors. Physical modeling must be done as well, and all of it must happen at the nanoscale. As mentioned above, nanomanipulation devices are used to assemble these nanoscale parts and to manipulate artificial or biological components, including cells and molecules.
Nanorobotics engineers must solve a multitude of problems, including sensing, control, power, communications, and interactions between inorganic and organic materials.
The size of a nanobot is roughly comparable to biological cells, and because of this fact future nanobots could be employed in disciplines like medicine and environmental preservation/remediation. Most “nanobots” that exist today are just specific molecules which have been manipulated to accomplish certain tasks.
How Do Nanobots Operate?
Given the still heavily theoretical nature of nanobots, questions about how nanobots operate are answered with predictions rather than statements of fact. It’s likely that the first major uses for nanobots will be in the medical field, moving through the human body and accomplishing tasks like diagnosing diseases, monitoring vitals, and dispensing treatments. These nanobots will need to be able to navigate their way around the human body and move through tissues like blood vessels.
Researchers and engineers are investigating a variety of techniques for nanobot navigation. One method uses ultrasonic signals for detection and deployment: a nanobot could emit ultrasonic signals that are traced to locate it, and the robot could then be guided to specific areas with a special tool that directs its motion. Magnetic Resonance Imaging (MRI) devices could also be used to track nanobots, and early experiments have demonstrated that MRI can be used to detect and even maneuver them. Other methods of detecting and maneuvering nanobots include X-rays, microwaves, and radio waves. At the moment, our control of these waves at the nanoscale is fairly limited, so new methods of using them would have to be invented.
The navigation and detection systems described above are external methods, relying on the use of tools to move the nanobots. With the addition of onboard sensors, the nanobots could be more autonomous. For instance, chemical sensors included onboard nanobots could allow the robot to scan the surrounding environment and follow certain chemical markers to a target region.
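Following a chemical marker amounts to gradient following: sample the concentration nearby and step toward wherever it is higher. The sketch below is a toy two-dimensional simulation under stated assumptions: the concentration field is an invented function peaking at a hypothetical target, the “sensor” reads it at small offsets (finite differences), and the robot moves a fixed step toward the estimated gradient. Real motion at this scale is dominated by fluid flow and random thermal jostling, which the sketch ignores.

```python
import math

TARGET = (5.0, 3.0)  # hypothetical location of the chemical source

def concentration(x: float, y: float) -> float:
    """Toy chemical field: highest at TARGET, falling off with squared distance."""
    return -((x - TARGET[0]) ** 2 + (y - TARGET[1]) ** 2)

def follow_gradient(x: float, y: float, step: float = 0.1,
                    eps: float = 1e-3, iters: int = 500) -> tuple[float, float]:
    """Climb the concentration gradient using only local 'sensor' readings."""
    for _ in range(iters):
        # Finite differences: sample the field just ahead of and behind the robot on each axis.
        gx = (concentration(x + eps, y) - concentration(x - eps, y)) / (2 * eps)
        gy = (concentration(x, y + eps) - concentration(x, y - eps)) / (2 * eps)
        norm = math.hypot(gx, gy)
        if norm < 1e-9:  # at the peak: no direction to move in
            break
        # Fixed-length step toward higher concentration.
        x += step * gx / norm
        y += step * gy / norm
    return x, y

x, y = follow_gradient(0.0, 0.0)
print(f"ended {math.hypot(x - TARGET[0], y - TARGET[1]):.3f} units from the target")
```

Because the step length is fixed, the robot ends up hovering within one step of the source rather than landing exactly on it.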
When it comes to powering the nanobots, there are also a variety of power solutions being explored by researchers. Solutions for powering nanobots include external power sources and onboard/internal power sources.
Internal power solutions include generators and capacitors. Generators onboard the nanobot could use the electrolytes found within the blood to produce energy, or nanobots could even be powered using the surrounding blood as a chemical catalyst that produces energy when combined with a chemical the nanobot carries with it. Capacitors operate similarly to batteries, storing electrical energy that could be used to propel the nanobot. Other options like tiny nuclear power sources have even been considered.
As for external power sources, extremely thin wires could tether the nanobots to an outside power source. The tethers could be miniature fiber-optic cables that carry pulses of light, with the electricity itself generated inside the nanobot.
Other external power solutions use magnetic fields or ultrasonic signals. Nanobots could employ a piezoelectric membrane, which collects ultrasonic waves and transforms them into electrical power. A changing magnetic field can induce an electrical current in a closed conducting loop onboard the nanobot. As a bonus, the magnetic field could also be used to steer the nanobot.
Addressing nanobot locomotion requires some inventive solutions. Nanobots that aren’t tethered, or aren’t simply free-floating in their environment, need some method of moving to their target locations. The propulsion system will need to be powerful and stable, able to propel the nanobot against currents in its surrounding environment, like the flow of blood. Propulsion solutions under investigation are often inspired by the natural world, with researchers looking at how microscopic organisms move through their environment. For instance, microorganisms often use long, whip-like tails called flagella to propel themselves, or they use many tiny, hair-like structures called cilia.
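The magnetic-loop approach rests on Faraday’s law of induction: the voltage (electromotive force, EMF) induced in a loop is proportional to how fast the magnetic flux through it changes, which is why the field has to vary in time. The loop size, field strength, and frequency in the worked example are illustrative assumptions, chosen only to show the scale of the numbers for a micrometer-sized loop in a uniform field perpendicular to it.

```latex
\mathcal{E} = -N \frac{d\Phi_B}{dt}, \qquad \Phi_B = B(t)\,A

% Single loop (N = 1), uniform field perpendicular to the loop, B(t) = B_0 \sin(\omega t):
\mathcal{E}(t) = -A B_0 \omega \cos(\omega t), \qquad |\mathcal{E}|_{\max} = A B_0 \omega

% Assumed values: A = 1\,\mu\mathrm{m}^2 = 10^{-12}\,\mathrm{m}^2,\quad B_0 = 1\,\mathrm{mT},\quad \omega = 2\pi \times 100\,\mathrm{kHz} \approx 6.28 \times 10^{5}\,\mathrm{s}^{-1}
|\mathcal{E}|_{\max} \approx 10^{-12} \times 10^{-3} \times 6.28 \times 10^{5} \approx 6.3 \times 10^{-10}\,\mathrm{V}
```

A sub-nanovolt peak from a single loop of this size helps illustrate why harvesting usable power at the nanoscale remains an engineering challenge.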
Researchers are also experimenting with giving robots small arm-like appendages that could allow the robot to swim, grip, and crawl. Currently, these appendages are controlled via magnetic fields outside the body, as the magnetic force prompts the robot’s arms to vibrate. An added benefit to this method of locomotion is that the energy for it comes from an outside source. This technology would need to be made even smaller to make it viable for true nanobots.
There are other, more inventive propulsion strategies under investigation as well. For instance, some researchers have proposed using capacitors to drive an electromagnetic pump that would pull conductive fluids in and shoot them out like a jet, propelling the nanobot forward.
Regardless of their eventual application, nanobots must solve the problems described above: navigation, locomotion, and power.
What Are Nanobots Used For?
As mentioned, the first uses for nanobots will likely be in the medical field. Nanobots could be used to monitor the body for damage, and potentially even facilitate its repair. Future nanobots could deliver medicine directly to the cells that need it. Currently, medicines are delivered orally or intravenously, and they spread throughout the body instead of reaching just the target regions, causing side effects. Nanobots equipped with sensors could also monitor regions of cells for changes, reporting them at the first sign of damage or malfunction.
We are still a long way from these hypothetical applications, but progress is being made. For example, in 2017 scientists created nanobots that targeted cancer cells and attacked them with a miniaturized drill, killing them. More recently, a group of researchers from ITMO University designed a nanobot composed of DNA fragments that is capable of destroying pathogenic RNA strands. DNA-based nanobots are also capable of transporting molecular cargo: one such nanobot is made of three DNA sections, moving with a DNA “leg” and carrying specific molecules with an “arm”.
Beyond medical applications, research is being done regarding the use of nanobots for the purposes of environmental cleanup and remediation. Nanobots could potentially be used to remove toxic heavy metals and plastics from bodies of water. The nanobots could carry compounds that render toxic substances inert when combined together, or they could be used to degrade plastic waste through similar processes. Research is also being done on the use of nanobots to facilitate the production of extremely small computer chips and processors, essentially using nanobots to produce microscale computer circuits.
What Are Deepfakes?
As deepfakes become easier to make and more common, they attract more attention. Deepfakes have become a focal point of discussions involving AI ethics, misinformation, openness of information and the internet, and regulation. It pays to be informed about deepfakes and to have an intuitive understanding of what they are. This article will clarify the definition of a deepfake, examine their use cases, discuss how deepfakes can be detected, and examine the implications of deepfakes for society.
What Is A Deepfake?
Before discussing deepfakes further, it is helpful to clarify what “deepfakes” actually are. There is substantial confusion around the term, and it is often misapplied to any falsified media, regardless of whether it is a genuine deepfake. To qualify as a deepfake, the faked media must be generated with a machine-learning system, specifically a deep neural network.
The key ingredient of deepfakes is machine learning, which has made it possible for computers to generate video and audio relatively quickly and easily. Deep neural networks are trained on footage of a real person so that the network learns how that person looks and moves under the target environmental conditions. The trained network is then applied to images of another individual, and additional computer graphics techniques are used to combine the new person with the original footage. An encoder algorithm determines the similarities between the original face and the target face. Once the common features of the faces have been isolated, a second algorithm called a decoder is used. The decoder examines the encoded (compressed) images and reconstructs them based on the features in the original images. Two decoders are used, one trained on the original subject’s face and the other on the target person’s face. To make the swap, the decoder trained on images of person Y is fed encoded images of person X. The result is that person Y’s face is reconstructed with person X’s facial expressions and orientation.
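A deep-learning implementation is too large to show here, so the sketch below replaces the deep networks with linear stand-ins to show the data flow just described: one encoder shared by both people and one decoder per person. Everything in it is an illustrative assumption: the “images” are 8-number vectors in which four values depend on a shared expression and four encode identity, the encoder is a PCA projection, and the decoders are least-squares fits. Encoding a frame of person X and decoding it with the decoder trained on person Y yields Y’s identity with X’s expression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 4 expression pixels driven by a 2-D expression code,
# plus 4 identity pixels that are fixed per person (invented data).
mix = rng.normal(size=(2, 4))

def render(expressions, identity):
    return np.hstack([expressions @ mix, np.tile(identity, (len(expressions), 1))])

faces_x = render(rng.normal(size=(100, 2)), np.zeros(4))  # person X
faces_y = render(rng.normal(size=(100, 2)), np.ones(4))   # person Y

# Shared encoder: PCA on each person's images after removing that person's mean,
# so the 2-D code keeps what the faces share (expression), not who they are.
centered = np.vstack([faces_x - faces_x.mean(0), faces_y - faces_y.mean(0)])
_, _, vt = np.linalg.svd(centered, full_matrices=False)
encoder = vt[:2].T  # (8, 2) projection

def encode(images):
    return images @ encoder

def with_bias(codes):
    return np.hstack([codes, np.ones((len(codes), 1))])

def fit_decoder(images):
    """Least-squares decoder (with bias) trained on ONE person's images only."""
    weights, *_ = np.linalg.lstsq(with_bias(encode(images)), images, rcond=None)
    return weights

def decode(codes, weights):
    return with_bias(codes) @ weights

decoder_x, decoder_y = fit_decoder(faces_x), fit_decoder(faces_y)

# The swap: encode a frame of person X, reconstruct it with person Y's decoder.
frame = faces_x[:1]
swapped = decode(encode(frame), decoder_y)
print(np.allclose(swapped[:, :4], frame[:, :4]))  # True: X's expression is kept
print(np.allclose(swapped[:, 4:], 1.0))           # True: Y's identity pixels appear
```

The linear version works only because expression and identity occupy separate pixels here; real faces entangle them, which is why deepfake tools use deep convolutional encoders and decoders instead.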
Currently, it still takes a fair amount of time for a deepfake to be made. The creator of the fake has to spend a long time manually adjusting parameters of the model, as suboptimal parameters will lead to noticeable imperfections and image glitches that give away the fake’s true nature.
Although it’s frequently assumed that most deepfakes are made with a type of neural network called a generative adversarial network (GAN), many (perhaps most) deepfakes created these days do not rely on GANs. While GANs did play a prominent role in the creation of early deepfakes, most deepfake videos are created through alternative methods, according to Siwei Lyu from SUNY Buffalo.
It takes a disproportionately large amount of training data to train a GAN, and GANs often take much longer to render an image than other image generation techniques. GANs are also better at generating static images than video, as they have difficulty maintaining consistency from frame to frame. It’s much more common to use an encoder and multiple decoders to create deepfakes.
What Are Deepfakes Used For?
Many of the deepfakes found online are pornographic in nature. According to research done by Deeptrace, an AI firm, out of a sample of approximately 15,000 deepfake videos taken in September of 2019, approximately 95% of them were pornographic in nature. A troubling implication of this fact is that as the technology becomes easier to use, incidents of fake revenge porn could rise.
However, not all deepfakes are pornographic. There are legitimate uses for deepfake technology. Audio deepfake technology could help people broadcast their natural voices after those voices are damaged or lost due to illness or injury. Deepfakes can also be used to hide the faces of people who are in sensitive, potentially dangerous situations, while still allowing their lips and expressions to be read. Deepfake technology can potentially be used to improve the dubbing of foreign-language films, aid in the repair of old and damaged media, and even create new styles of art.
While most people think of fake videos when they hear the term “deepfake”, fake videos are by no means the only kind of fake media produced with deepfake technology. Deepfake technology is used to create photo and audio fakes as well. As previously mentioned, GANs are frequently used to generate fake images. It’s thought that there have been many cases of fake LinkedIn and Facebook profiles that have profile images generated with deepfake algorithms.
It’s possible to create audio deepfakes as well. Deep neural networks are trained to produce voice clones/voice skins of different people, including celebrities and politicians. One famous example of an audio deepfake is when the AI company Dessa used an AI model, supported by non-AI algorithms, to recreate the voice of the podcast host Joe Rogan.
How To Spot Deepfakes
As deepfakes become more sophisticated, distinguishing them from genuine media will become harder. Currently, there are a few telltale signs people can look for to judge whether a video might be a deepfake, like poor lip-syncing, unnatural movement, flickering around the edge of the face, and warping of fine details like hair, teeth, or reflections. Other potential signs include lower-quality sections within the same video and irregular blinking.
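Irregular blinking is one of the few cues that can be checked mechanically. The sketch below assumes a hypothetical per-frame “eye openness” score (1.0 fully open, 0.0 closed) produced by some face-landmark detector, which is not shown; it counts blinks and flags clips whose blinks per minute fall outside an assumed range. The closed-eye threshold and the 8 to 40 blinks-per-minute bounds are illustrative guesses, not validated values, and newer fakes can easily defeat a heuristic like this.

```python
def count_blinks(openness: list[float], threshold: float = 0.2) -> int:
    """Count open-to-closed transitions in a per-frame eye-openness signal.

    `openness` is a hypothetical score per frame; `threshold` is an assumption.
    """
    blinks, closed = 0, False
    for value in openness:
        if value < threshold and not closed:
            blinks += 1          # eye just closed: one new blink
            closed = True
        elif value >= threshold:
            closed = False       # eye open again
    return blinks

def blink_rate_is_suspicious(openness: list[float], fps: float,
                             low: float = 8.0, high: float = 40.0) -> bool:
    """Flag clips whose blinks per minute fall outside an assumed normal range."""
    minutes = len(openness) / fps / 60
    rate = count_blinks(openness) / minutes
    return rate < low or rate > high

# One minute of 30 fps video with no blinks, then one with a blink every 3 seconds.
no_blinks = [1.0] * 1800
regular = [0.05 if i % 90 < 3 else 1.0 for i in range(1800)]
print(blink_rate_is_suspicious(no_blinks, fps=30))  # True
print(blink_rate_is_suspicious(regular, fps=30))    # False (20 blinks per minute)
```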
While these signs may help one spot a deepfake at the moment, as deepfake technology improves the only option for reliable deepfake detection might be other types of AI trained to distinguish fakes from real media.
Artificial intelligence companies, including many of the large tech companies, are researching methods of detecting deepfakes. In December 2019, a deepfake detection challenge was launched with support from three tech giants: Amazon, Facebook, and Microsoft. Research teams from around the world competed to develop the best detection methods. Other researchers, such as a team from Google and Jigsaw, are working on a type of “face forensics” that can detect altered videos, making their datasets open source and encouraging others to develop deepfake detection methods. The aforementioned Dessa has worked on refining deepfake detection techniques, trying to ensure that detection models work on deepfake videos found in the wild (out on the internet) rather than just on pre-composed training and testing datasets, like the open-source dataset Google provided.
Other strategies are also being investigated to deal with the proliferation of deepfakes. For instance, videos can be checked for concordance with other sources of information: searches can be done for footage of the same event taken from other angles, or background details of the video (like weather and locations) can be checked for incongruities. Beyond this, a blockchain-based online ledger could register videos when they are first created, holding their original audio and images so that derivative videos can always be checked for manipulation.
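The ledger idea can be sketched without a full blockchain. The minimal class below is a hypothetical, single-machine stand-in: instead of storing the media itself, it records a SHA-256 fingerprint of each registered file and chains every entry to the hash of the previous one, so tampering with an earlier record is detectable. A real system would replicate the ledger across many independent parties.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

class VideoLedger:
    """Toy append-only, hash-chained ledger of video fingerprints (hypothetical)."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash_entry(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def register(self, video_bytes: bytes) -> str:
        body = {
            "content_hash": hashlib.sha256(video_bytes).hexdigest(),
            "prev_hash": self.entries[-1]["entry_hash"] if self.entries else GENESIS,
            "timestamp": time.time(),
        }
        self.entries.append({**body, "entry_hash": self._hash_entry(body)})
        return body["content_hash"]

    def is_registered(self, video_bytes: bytes) -> bool:
        digest = hashlib.sha256(video_bytes).hexdigest()
        return any(e["content_hash"] == digest for e in self.entries)

    def chain_is_valid(self) -> bool:
        prev = GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "entry_hash"}
            if e["prev_hash"] != prev or e["entry_hash"] != self._hash_entry(body):
                return False
            prev = e["entry_hash"]
        return True

ledger = VideoLedger()
ledger.register(b"original footage")
print(ledger.is_registered(b"original footage"))  # True
print(ledger.is_registered(b"edited footage"))    # False
print(ledger.chain_is_valid())                    # True
```

An exact hash only recognizes bit-for-bit copies; re-encoding or resizing a genuine video changes its digest, which is one reason real provenance proposals are more involved than this sketch.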
Ultimately, it’s important that reliable methods of detecting deepfakes are created and that these detection methods keep up with the newest advances in deepfake technology. While it is hard to know exactly what the effects of deepfakes will be, if there are not reliable methods of detecting deepfakes (and other forms of fake media), misinformation could potentially run rampant and degrade people’s trust in society and institutions.
Implications of Deepfakes
What are the dangers of allowing deepfakes to proliferate unchecked?
One of the biggest problems that deepfakes create currently is nonconsensual pornography, engineered by combining people’s faces with pornographic videos and images. AI ethicists are worried that deepfakes will see more use in the creation of fake revenge porn. Beyond this, deepfakes could be used to bully and damage the reputation of just about anyone, as they could be used to place people into controversial and compromising scenarios.
Companies and cybersecurity specialists have expressed concern about the use of deepfakes to facilitate scams, fraud, and extortion. Allegedly, deepfake audio has been used to convince employees of a company to transfer money to scammers.
It’s possible that deepfakes could have harmful effects even beyond those listed above. Deepfakes could potentially erode people’s trust in media generally, and make it difficult for people to distinguish between real news and fake news. If many videos on the web are fake, it becomes easier for governments, companies, and other entities to cast doubt on legitimate controversies and unethical practices.
When it comes to governments, deepfakes may even threaten the operation of democracy. Democracy requires that citizens be able to make informed decisions about politicians based on reliable information, and misinformation undermines democratic processes. For example, the president of Gabon, Ali Bongo, appeared in a video attempting to reassure Gabonese citizens. The president had been assumed to be unwell for a long period of time, and his sudden appearance in a likely fake video kicked off an attempted coup. The existence of deepfakes also lets public figures dismiss genuine recordings: President Donald Trump claimed that an audio recording of him bragging about grabbing women by the genitals was fake, despite also describing it as “locker room talk”. Prince Andrew also claimed that an image provided by Emily Maitlis’ attorney was fake, though the attorney insisted on its authenticity.
Ultimately, while there are legitimate uses for deepfake technology, there are many potential harms that can arise from the misuse of that technology. For that reason, it’s extremely important that methods to determine the authenticity of media be created and maintained.
What is Federated Learning?
The traditional method of training AI models involves setting up servers where models are trained on data, often through the use of a cloud-based computing platform. However, over the past few years an alternative form of model creation has arisen, called federated learning. Federated learning brings machine learning models to the data source, rather than bringing the data to the model. Federated learning links together multiple computational devices into a decentralized system that allows the individual devices that collect data to assist in training the model.
In a federated learning system, the various devices that are part of the learning network each have a copy of the model on the device. The different devices/clients train their own copy of the model using the client’s local data, and then the parameters/weights from the individual models are sent to a master device, or server, that aggregates the parameters and updates the global model. This training process can then be repeated until a desired level of accuracy is attained. In short, the idea behind federated learning is that none of the training data is ever transmitted between devices or between parties, only the updates related to the model are.
Federated learning can be broken down into three different steps or phases. Federated learning typically starts with a generic model that acts as a baseline and is trained on a central server. In the first step, this generic model is sent out to the application’s clients. These local copies are then trained on data generated by the client systems, learning and improving their performance.
In the second step, the clients all send their learned model parameters to the central server. This happens periodically, on a set schedule.
In the third step, the server aggregates the learned parameters when it receives them. After the parameters are aggregated, the central model is updated and shared once more with the clients. The entire process then repeats.
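The three steps above can be sketched as a federated averaging loop. This is a minimal illustration under stated assumptions: a linear-regression model, synthetic client data, and plain NumPy in place of a real federated framework; the function names are hypothetical. Only weight vectors travel between clients and server, and each client’s `(X, y)` is used only inside `local_train`. The server weights each client’s update by its number of samples, which is one common aggregation choice.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=50):
    """Step 1 (on the client): gradient descent on local data only."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Steps 2-3: clients send back weights; the server averages them,
    weighting each client by how many samples it trained on."""
    updates = [(local_train(global_w, X, y), len(y)) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

# Three clients with different amounts of private, synthetic data.
rng = np.random.default_rng(42)
true_w = np.array([2.0, -1.0])
clients = []
for n in (30, 50, 80):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.01, size=n)))

w = np.zeros(2)  # generic baseline model sent to every client
for _ in range(10):
    w = federated_round(w, clients)
print(np.round(w, 2))
```

Here every client’s data comes from the same underlying relationship, so the averaged model converges quickly; with heterogeneous clients or irregular participation, convergence is slower and less predictable.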
The benefit of having a copy of the model on each device is that network latency is reduced or eliminated, since predictions can be made locally. The costs of sending raw data to the server are eliminated as well, because only model updates are transmitted. Other benefits of federated learning include privacy preservation, since raw data stays on the device, and model responses that are personalized for the device’s user.
Examples of federated learning models include recommendation engines, fraud detection models, and medical models. Media recommendation engines, of the type used by Netflix or Amazon, could be trained on data gathered from thousands of users. The client devices would train their own separate models and the central model would learn to make better predictions, even though the individual data points would be unique to the different users. Similarly, fraud detection models used by banks can be trained on patterns of activity from many different devices, and a handful of different banks could collaborate to train a common model. In terms of a medical federated learning model, multiple hospitals could team up to train a common model that could recognize potential tumors through medical scans.
Types of Federated Learning
Federated learning schemas typically fall into one of two different classes: multi-party systems and single-party systems. Single-party federated learning systems are called “single-party” because only a single entity is responsible for overseeing the capture and flow of data across all of the client devices in the learning network. The models that exist on the client devices are trained on data with the same structure, though the data points are typically unique to the various users and devices.
In contrast to single-party systems, multi-party systems are managed by two or more entities. These entities cooperate to train a shared model by utilizing the various devices and datasets they have access to. The parameters and data structures are typically similar across the devices belonging to the multiple entities, but they don’t have to be exactly the same. Instead, pre-processing is done to standardize the inputs of the model. A neutral entity might be employed to aggregate the weights established by the devices unique to the different entities.
Common Technologies and Frameworks for Federated Learning
Popular frameworks for federated learning include TensorFlow Federated, Federated AI Technology Enabler (FATE), and PySyft. PySyft is an open-source federated learning library built on the deep learning library PyTorch, intended to enable private, secure deep learning across servers and agents using encrypted computation. TensorFlow Federated is an open-source framework built on Google’s TensorFlow platform. In addition to letting users create their own algorithms, TensorFlow Federated allows users to simulate a number of included federated learning algorithms on their own models and data. Finally, FATE is an open-source framework developed by WeBank AI, intended to provide the federated AI ecosystem with a secure computing framework.
Federated Learning Challenges
As federated learning is still fairly nascent, a number of challenges must be overcome before it can achieve its full potential. The training capabilities of edge devices, data labeling and standardization, and model convergence are potential roadblocks for federated learning approaches.
The computational abilities of edge devices need to be considered when designing federated learning approaches that rely on local training. While most smartphones, tablets, and other IoT devices are capable of training machine learning models, doing so typically degrades the device’s performance. Compromises will have to be made between model accuracy and device performance.
Labeling and standardizing data is another challenge that federated learning systems must overcome. Supervised learning models require training data that is clearly and consistently labeled, which can be difficult to do across the many client devices that are part of the system. For this reason, it’s important to develop model data pipelines that automatically apply labels in a standardized way based on events and user actions.
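As a hypothetical example of such a pipeline, the rule below turns on-device recommendation events into training labels. The event fields and the 30-second viewing threshold are invented for illustration; the point is that every client applies the same deterministic rule, so a label means the same thing across the whole federation.

```python
from dataclasses import dataclass

@dataclass
class Impression:
    """Hypothetical on-device event: an item was shown, and what the user did next."""
    item_id: str
    clicked: bool
    watch_seconds: float

def auto_label(event: Impression, min_watch: float = 30.0) -> int:
    """Same rule on every client: 1 = positive example, 0 = negative.
    The 30-second threshold is an illustrative assumption."""
    return int(event.clicked and event.watch_seconds >= min_watch)

events = [
    Impression("a", clicked=True, watch_seconds=120.0),  # clicked and watched: positive
    Impression("b", clicked=True, watch_seconds=5.0),    # clicked, abandoned: negative
    Impression("c", clicked=False, watch_seconds=0.0),   # ignored: negative
]
labels = [auto_label(e) for e in events]
print(labels)  # [1, 0, 0]
```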
Model convergence time is another challenge for federated learning, as federated learning models typically take longer to converge than locally trained models. The number of devices involved in the training adds an element of unpredictability to the model training, as connection issues, irregular updates, and even different application use times can contribute to increased convergence time and decreased reliability. For this reason, federated learning solutions are typically most useful when they provide meaningful advantages over centrally training a model, such as instances where datasets are extremely large and distributed.