Hollywood has been embracing digital technology and computational algorithms to make movies for a while now, using CGI to de-age actors and enhance shots in other ways. Just recently, one Hollywood company announced its intention to use AI to analyze movie data and assist in decisions about greenlighting projects. As reported by The Hollywood Reporter, the AI firm Cinelytic will be providing Warner Bros. with a program intended to simplify aspects of distribution and give projections regarding pricing and possible profit.
The system developed for Warner Bros. will utilize big data to guide decision-making during the greenlight phase of a project. The system can reportedly return analyses regarding star power for a given region and even predict how much money a film is likely to make in theaters and through other distribution methods. Cinelytic has reportedly been engineering and beta-testing its predictive platform for over three years, and in addition to Warner Bros., several other companies, such as Ingenious Media and Productivity Media, have partnered with the company.
The AI platform is predicted to be especially useful when it comes to film festivals, where companies must make bids on films after only a few hours of deliberation.
Tobias Queisser, the founder of Cinelytic, stated that the value of the platform is that it will be able to quickly make the types of calculations that would take human analysts much longer to complete. Queisser also acknowledges that while the idea of giving AI influence over what projects get produced can be unnerving, the AI itself won’t be making any decisions.
“The system can calculate in seconds what used to take days to assess by a human when it comes to general film package evaluation or a star’s worth,” says Queisser. “Artificial intelligence sounds scary. But right now, an AI cannot make any creative decisions,” he adds. “What it is good at is crunching numbers and breaking down huge data sets and showing patterns that would not be visible to humans. But for creative decision-making, you still need experience and gut instinct.”
Despite Queisser’s assurances that humans will still be in charge of any important decisions, some people are concerned about how the AI will be used. For instance, Popular Mechanics noted that the entire Marvel film franchise was based on the willingness of executives to take a chance on Iron Man and Robert Downey Jr., who was considered “box office poison” at one time. The fear is that using AI algorithms to minimize risk could lead to situations where original and/or high-quality films are passed over. To be sure, AI tools can potentially extend our own biases if there aren’t systems in place to control them.
Of course, one could make the argument that the technology behind Cinelytic’s analysis tool could be used to give more deserving projects a chance, instead of projects that are likely to fail. As QZ notes, Cinelytic was tested last year when it predicted that the Hellboy film would end up being a box office bomb, and it was proven correct. The film had a $50 million budget and made only about $21.9 million at the box office, after Cinelytic’s tool predicted that it would make around $23.2 million. A correct prediction like this could mean that executives could take that money and invest it in projects that have more potential, making those resources available to other films. It could potentially even make choosing investments in new IPs less scary and uncertain for those greenlighting projects.
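To put the Hellboy figures in perspective, the short sketch below works out how far off the prediction was and how far the film fell short of its budget. It uses only the three numbers quoted above; the function and variable names are illustrative, and this says nothing about how Cinelytic itself scores its forecasts.

```python
# Figures quoted in the article, in millions of US dollars.
budget = 50.0
predicted_gross = 23.2
actual_gross = 21.9


def relative_error(predicted: float, actual: float) -> float:
    """Absolute prediction error as a fraction of the actual value."""
    return abs(predicted - actual) / actual


error = relative_error(predicted_gross, actual_gross)
shortfall = budget - actual_gross

print(f"Prediction error: {error:.1%}")                       # Prediction error: 5.9%
print(f"Shortfall against budget: ${shortfall:.1f} million")  # Shortfall against budget: $28.1 million
```

In other words, the forecast overshot the reported gross by roughly 6 percent, while the film earned about $28.1 million less than its budget.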
Looking beyond Cinelytic, if AI algorithms are ever used to recommend films, the algorithms could also be used to control for human biases in decision making. Depending on what features the AI selects for, it could be instructed to recommend stories about underrepresented minorities more often, reducing some of the disparity in representation often seen in Hollywood films.
Ultimately, the AI tool developed by Cinelytic is a tool, and much like any tool it can be used properly or misused. Regardless, it seems likely that automating repetitive and time-consuming calculations is something the movie industry is only going to continue to invest in.
What is Big Data?
“Big Data” is one of the commonly used buzz words of our current era, but what does it really mean?
Here’s a quick, simple definition of big data. Big data is data that is too large and complex to be handled by traditional data processing and storage methods. While that’s a quick definition you can use as a heuristic, it would be helpful to have a deeper, more complete understanding of big data. Let’s take a look at some of the concepts that underlie big data, like storage, structure, and processing.
How Big Is Big Data?
It isn’t as simple as saying “any data over size ‘X’ is big data.” The environment in which the data is being handled is an extremely important factor in determining what qualifies as big data. The size that data needs to be in order to be considered big data depends on the context, or the task the data is being used for. Two datasets of vastly different sizes can both be considered “big data” in different contexts.
To be more concrete, if you try to send a 200-megabyte file as an email attachment, you would typically not be able to do so, because most email providers cap attachments well below that size. In this context, the 200-megabyte file could be considered big data. In contrast, copying a 200-megabyte file to another device within the same LAN may take almost no time at all, and in that context, it wouldn’t be regarded as big data.
However, let’s assume that 15 terabytes worth of video need to be pre-processed for use in training computer vision applications. In this case, the video files take up so much space that even a powerful computer would take a long time to process them all, and so the processing would normally be distributed across multiple computers linked together in order to decrease processing time. These 15 terabytes of video data would definitely qualify as big data.
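A rough back-of-the-envelope sketch shows why that work gets distributed. The 15 terabyte figure comes from the example above, but the per-machine throughput of 200 MB/s, the file names, and the helper functions are all assumptions made for illustration, and the estimate ignores coordination and network overhead.

```python
def partition(items, n_workers):
    """Deal items out round-robin into n_workers lists of near-equal length."""
    return [items[i::n_workers] for i in range(n_workers)]


def processing_hours(total_bytes, bytes_per_second, n_machines=1):
    """Idealized processing time, ignoring coordination and network overhead."""
    return total_bytes / (bytes_per_second * n_machines) / 3600


TOTAL_BYTES = 15 * 10**12   # 15 terabytes, as in the example above
THROUGHPUT = 200 * 10**6    # ASSUMPTION: each machine processes 200 MB/s

video_files = [f"clip_{i:04d}.mp4" for i in range(10)]  # hypothetical file names
chunks = partition(video_files, 3)  # one chunk per machine

print([len(c) for c in chunks])                                 # [4, 3, 3]
print(round(processing_hours(TOTAL_BYTES, THROUGHPUT), 1))      # 20.8
print(round(processing_hours(TOTAL_BYTES, THROUGHPUT, 10), 1))  # 2.1
```

Under these assumptions a single machine would need most of a day, while ten machines working on separate chunks would finish in about two hours.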
Types Of Big Data Structures
Big data comes in three different categories of structure: unstructured, semi-structured, and structured data.
Unstructured data is data that possesses no definable structure, meaning the data is essentially just in one large pool. An example of unstructured data would be a database full of unlabeled images.
Semi-structured data is data that doesn’t have a formal structure, but does exist within a loose structure. For example, email data might count as semi-structured data, because you could refer to the data contained in individual emails, but formal data patterns have not been established.
Structured data is data that has a formal structure, with data points categorized by different features. One example of structured data is an Excel spreadsheet containing contact information like names, emails, phone numbers, and websites.
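The three categories can be sketched with a few lines of Python. Everything here is made up for illustration: the contact records, the email message, and the handful of bytes standing in for an image file.

```python
import json

# Structured: every record has the same named fields, like rows in a spreadsheet.
contacts = [
    {"name": "Ada Example", "email": "ada@example.com", "phone": "555-0100"},
    {"name": "Bo Sample", "email": "bo@example.com", "phone": "555-0101"},
]
emails = [row["email"] for row in contacts]  # query by column name

# Semi-structured: a few labeled fields wrapped around free-form text.
raw_email = json.dumps({
    "from": "ada@example.com",
    "subject": "Quarterly numbers",
    "body": "Hi Bo, the figures are attached. Call me if anything looks off.",
})
message = json.loads(raw_email)
sender = message["from"]  # the labeled part can be looked up by key

# Unstructured: raw bytes with no fields to query (a stand-in for an image file).
image_bytes = bytes([137, 80, 78, 71, 13, 10, 26, 10])

print(emails)
print(sender)
print(len(image_bytes), "bytes, no named fields")
```

The structured records can be queried by column, the email can be queried only through its labeled fields while its body stays free text, and the image bytes offer nothing to query without further processing.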
Metrics For Assessing Big Data
Big data can be analyzed in terms of three different metrics: volume, velocity, and variety.
Volume refers to the size of the data. The average size of datasets keeps increasing. For example, the largest hard drive in 2006 was a 750 GB drive. In contrast, Facebook is thought to generate over 500 terabytes of data in a day, and the largest consumer hard drive available today is a 16 terabyte drive. What qualifies as big data in one era may not be big data in another. More data is generated today because more and more of the objects surrounding us are equipped with sensors, cameras, microphones, and other data collection devices.
Velocity refers to how fast data is moving, or to put that another way, how much data is generated within a given period of time. Social media streams generate hundreds of thousands of posts and comments every minute, while your own email inbox will probably have much less activity. Big data streams are streams that often handle hundreds of thousands or millions of events in more or less real-time. Examples of these data streams are online gaming platforms and high-frequency stock trading algorithms.
Variety refers to the different types of data contained within the dataset. Data can be made up of many different formats, like audio, video, text, photos, or serial numbers. In general, traditional databases are formatted to handle one, or just a couple, types of data. To put that another way, traditional databases are structured to hold data that is fairly homogenous and of a consistent, predictable structure. As applications become more diverse, full of different features, and used by more people, databases have had to evolve to store more types of data. Unstructured databases are ideal for holding big data, as they can hold multiple data types that aren’t related to each other.
Methods Of Handling Big Data
There are a number of different platforms and tools designed to facilitate the analysis of big data. Big data pools need to be analyzed to extract meaningful patterns from the data, a task that can prove quite challenging with traditional data analysis tools. In response to the need for tools to analyze large volumes of data, a variety of companies have created big data analysis tools. Big data analysis tools include systems like ZOHO Analytics, Cloudera, and Microsoft BI.
Jerry Xu, Co-Founder & CEO of Datatron – Interview Series
Jerry has extensive experience in machine learning, storage systems, online service, distributed systems, virtualization, and OS kernel. He has worked on high-performance and large-scale systems at companies such as Lyft, Box, Twitter, Zynga, and Microsoft. He has also authored the open-source project Lib Crunch and is a three-time Microsoft Gold Star Award winner. Jerry completed his master’s degree in computer science at Shanghai University. His most recent startup is Datatron.
Datatron began in 2016 after you left Lyft. How did you initially conceive of the Datatron business concept?
When we worked at Lyft, we noticed that data scientists usually come from diverse backgrounds like math, physics, bio-engineering, etc. It is often very hard for them to get the engineering part of how their models work, although they have good intuition on the model and the math. That motivated us to start Datatron. We are not trying to help data scientists find the best algorithm. We only come into the picture after the algorithm is decided, to make model deployment, monitoring, and management more efficient.
Datatron was selected by 500 Startups to be included in the 18th cohort of accelerator companies. How did this residency personally influence you, and how you manage Datatron?
We learned a lot from our StartX and 500 Startups experiences, including how to pitch to investors, how to find product/market fit, and how to run sales and marketing, which we had no personal experience with before.
Datatron is a management platform for ML, AI, and Data Science models. Could you elaborate on some of the functionalities that are offered by your platform?
Our product has four modules now: Model Deployment, Model Monitoring, Model Challenger, and Model Governance.
Create and scale model deployments in just a few clicks. Deploy models developed in any framework or language.
Make better business decisions to save your team time and money. Monitor model performance and detect model decay as it happens.
Spend less time on model validation, bias detection, and internal audit processes. Go from model development to internal auditing to production faster than ever.
One of the use cases of Datatron is Demand Forecasting which is important for enterprises which need to plan and allocate resources. How does machine learning play into this?
Demand usually changes with both seasonality and trend, which is a typical machine learning problem. Models like ARIMA and recurrent neural networks (RNNs) can learn from historical data to find the trend and seasonality automatically and make predictions based on that.
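The idea of separating trend from seasonality can be illustrated with a deliberately simple sketch. This is not ARIMA or an RNN, and it is not Datatron's implementation: it fits a straight-line trend by least squares, averages what is left over at each position in the seasonal cycle, and adds the two back together to forecast. The demand series is synthetic, and all function names are made up for this example.

```python
from statistics import mean


def fit_trend(series):
    """Least-squares line through (t, y) for t = 0..n-1; returns (slope, intercept)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, y_mean - slope * t_mean


def seasonal_offsets(series, slope, intercept, period):
    """Average detrended value at each position in the seasonal cycle."""
    residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]
    return [mean(residuals[p::period]) for p in range(period)]


def forecast(series, period, steps):
    """Extend the fitted trend and add back the seasonal offset for each step."""
    slope, intercept = fit_trend(series)
    offsets = seasonal_offsets(series, slope, intercept, period)
    n = len(series)
    return [intercept + slope * t + offsets[t % period] for t in range(n, n + steps)]


# Synthetic demand: steady growth plus a pattern that repeats every 4 periods.
pattern = [5, -2, -4, 1]
history = [100 + 2 * t + pattern[t % 4] for t in range(16)]
actual_next = [100 + 2 * t + pattern[t % 4] for t in range(16, 20)]

predicted_next = forecast(history, period=4, steps=4)
for predicted, actual in zip(predicted_next, actual_next):
    print(f"predicted {predicted:.1f}, actual {actual}")
```

On this synthetic series the forecast stays within about one unit of the true continuation. Real demand data is noisier, which is where models such as ARIMA or RNNs earn their keep.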
Which framework models (for example, TensorFlow) do you currently support?
We support most of the popular machine learning frameworks like sklearn, TensorFlow, H2O, R, SAS etc.
Which languages do models need to be built in to be supported by Datatron?
We support models in their native languages – Python, R, Java etc.
What are some of the types of industries which are best served by using the Datatron platform?
Fundamentally, our platform is a horizontal solution, which means it can be used by lots of different industries. As of now, we are trying to focus on financial services and telecommunications.
What are some of the most challenging aspects of data science that companies face, and how does Datatron solve this for them?
Lots of companies already have different data science teams, and those teams use different tools to build their models and have different practices for managing them. More and more enterprises have realized that a model is becoming an asset that will impact their top line directly. Having a platform that can standardize machine learning practice across the company becomes critical. Our platform can help solve those issues.
Is there anything else that you would like to share about Datatron?
We have received lots of inbound interest from big enterprises. At the same time, we are also building up our sales and marketing team to reach out to potential customers actively.
To learn more visit Datatron.
AI Engineers Develop Method That Can Detect Intent Of Those Spreading Misinformation
Dealing with misinformation in the digital age is a complex problem. Not only does misinformation have to be identified, tagged, and corrected, but the intent of those responsible for making the claim should also be distinguished. A person may unknowingly spread misinformation, or just be giving their opinion on an issue even though it is later reported as fact. Recently, a team of AI researchers and engineers at Dartmouth created a framework that can be used to derive opinion from “fake news” reports.
As ScienceDaily reports, the Dartmouth team’s study was recently published in the Journal of Experimental & Theoretical Artificial Intelligence. While previous studies have attempted to identify fake news and fight deception, this might be the first study that aimed to identify the intent of the speaker in a news piece. While a true story can be twisted into various deceptive forms, it’s important to distinguish whether or not deception was intended. The research team argues that intent matters when considering misinformation, as deception is only possible if there was intent to mislead. If an individual didn’t realize they were spreading misinformation or if they were just giving their opinion, there can’t be deception.
Eugene Santos Jr., an engineering professor at Dartmouth’s Thayer School of Engineering, explained to ScienceDaily why their model attempts to distinguish deceptive intent:
“Deceptive intent to mislead listeners on purpose poses a much larger threat than unintentional mistakes. To the best of our knowledge, our algorithm is the only method that detects deception and at the same time discriminates malicious acts from benign acts.”
In order to construct their model, the research team analyzed the features of deceptive reasoning. The resulting algorithm could distinguish intent to deceive from other forms of communication by focusing on discrepancies between a person’s past arguments and their current statements. The model constructed by the research team needs large amounts of data that can be used to measure how a person deviates from past arguments. The team trained the model on data taken from a survey of opinions on controversial topics, in which over 100 people gave their opinions. Data was also pulled from reviews of 20 different hotels, consisting of 400 fictitious reviews and 800 real reviews.
According to Santos, the framework developed by the researchers could be refined and applied by news organizations and readers, in order to let them analyze the content of “fake news” articles. Readers could examine articles for the presence of opinions and determine for themselves if a logical argument has been used. Santos also said that the team wants to examine the impact of misinformation and the ripple effects that it has.
Popular culture often depicts non-verbal behaviors like facial expressions as indicators that someone is lying, but the authors of the study note that these behavioral hints aren’t always reliable indicators of lying. Deqing Li, co-author on the paper, explained that their research found that models based on reasoning intent are better indicators of lying than behavioral and verbal differences. Li explained that reasoning intent models “are better at distinguishing intentional lies from other types of information distortion”.
The work of the Dartmouth researchers isn’t the only recent advancement when it comes to fighting misinformation with AI. News articles with clickbait titles often mask misinformation. For example, they often imply one thing happened when another event actually occurred.
As reported by AINews, a team of researchers from both Arizona State University and Penn State University collaborated in order to create an AI that could detect clickbait. The researchers asked people to write their own clickbait headlines and also wrote a program to generate clickbait headlines. Both forms of headlines were then used to train a model that could effectively detect clickbait headlines, regardless of whether they were written by machines or people.
According to the researchers, their algorithm was around 14.5% more accurate at detecting clickbait titles than other AIs had been in the past. The lead researcher on the project and associate professor at the College of Information Sciences and Technology at Penn State, Dongwon Lee, explained how their experiment demonstrates the utility of generating data with an AI and feeding it back into a training pipeline.
“This result is quite interesting as we successfully demonstrated that machine-generated clickbait training data can be fed back into the training pipeline to train a wide variety of machine learning models to have improved performance,” explained Lee.