Keeping up with an industry as rapidly evolving as artificial intelligence is a difficult task. So, until an AI can do it for you, here’s a helpful roundup of recent stories in the world of machine learning, along with notable research and experiments we didn’t cover on our own.
Last week, Midjourney, the AI startup building image (and soon video) generators, made a small change to its terms of service relating to the company's policy on IP disputes. It mainly served to replace jokey language with more lawyerly, no doubt case-law-informed clauses. But the change can also be taken as a sign of Midjourney's conviction that AI vendors like itself will emerge victorious in courtroom fights with creators whose works make up the vendors' training data.
Generative AI models like Midjourney's are trained on an enormous number of examples (e.g. images and text), usually sourced from public websites and repositories around the web. Vendors assert that fair use, the legal doctrine that allows the use of copyrighted works to make a secondary creation as long as it's transformative, shields them where model training is concerned. But not all creators agree, particularly in light of a growing number of studies showing that models can, and do, "regurgitate" training data.
Some vendors have taken a proactive approach, entering into licensing agreements with content creators and establishing “opt-out” systems for training datasets. Others have promised that if customers are involved in a copyright lawsuit arising from the use of a vendor’s GenAI tools, they won’t be on the hook for legal fees.
Midjourney is not one of the proactive ones.
On the contrary, Midjourney has been somewhat brazen in its use of copyrighted works, at one point maintaining a list of thousands of artists (including illustrators and designers at major brands like Hasbro and Nintendo) whose work was, or would be, used to train its models. One study shows compelling evidence that Midjourney used TV shows and movie franchises in its training data as well, from "Toy Story" to "Star Wars" to "Dune" to "Avengers."
Now, there is a scenario in which court decisions go Midjourney's way. Should the courts decide that fair use applies, there's nothing stopping the startup from continuing as it has been, scraping and training on copyrighted data old and new.
But it seems like a risky bet.
Midjourney is flying high at the moment, having reportedly reached around $200 million in revenue without a penny of outside investment. Lawyers are expensive, however. And if it's decided that fair use doesn't apply in Midjourney's case, it would decimate the company overnight.
No reward without risk, eh?
Here are some other notable AI stories from the past few days:
AI-assisted ads are attracting the wrong kind of attention: Creators on Instagram lashed out at a filmmaker whose ad reused someone else's (far more painstaking and impressive) work without credit.
EU authorities warn AI platforms ahead of elections: They're asking the biggest tech companies to explain their approach to preventing electoral manipulation.
Google DeepMind wants your co-op gaming partner to be its AI: Training an agent on many hours of 3D gameplay made it capable of performing simple tasks phrased in natural language.
The problem with benchmarks: Many, many AI vendors claim their models meet or beat the competition by some objective metric. But the metrics they use are often flawed.
AI2 scores $200 million: The AI2 Incubator, spun out of the nonprofit Allen Institute for AI, has secured a $200 million windfall that startups going through its program can tap into to accelerate early-stage growth.
India requires and then withdraws government approval for AI: The Indian government can’t seem to decide what level of regulation is appropriate for the AI industry.
Anthropic launches new models: Artificial intelligence startup Anthropic has launched a new model family, Claude 3, that competes with OpenAI’s GPT-4. We tested the flagship model (Claude 3 Opus) and found it impressive — but lacking in areas like current events.
Political deepfakes: A study by the Center for Countering Digital Hate (CCDH), a British non-profit organization, examines the growing amount of AI-generated disinformation — especially fake election-related images — on X (formerly Twitter) over the past year.
OpenAI vs. Musk: OpenAI says it plans to reject all of X CEO Elon Musk’s claims in a recent lawsuit, and suggested the billionaire entrepreneur — who co-founded the company — didn’t really have that much of an impact on OpenAI’s development and success.
Rufus Rating: Last month, Amazon announced that it would launch a new AI-powered chatbot, Rufus, inside the Amazon Shopping app for Android and iOS. We got early access — and were quickly disappointed by the lack of things Rufus can do (and do well).
More machine learning
Particles! How do they work? AI models have aided our understanding and prediction of molecular dynamics, conformation, and other aspects of the nanoscopic world that might otherwise require expensive, complex methods to test. You still need to verify, of course, but things like AlphaFold are rapidly changing the field.
Microsoft has a new model called ViSNet, aimed at predicting so-called structure-activity relationships, the complex relationships between molecules and biological activity. It's still quite experimental and definitely for researchers only, but it's always great to see hard scientific problems tackled with cutting-edge technological means.
Researchers at the University of Manchester are looking specifically at identifying and predicting COVID-19 variants, less from pure structure like ViSNet and more by analyzing the very large genetic datasets relating to the coronavirus's evolution.
“The unprecedented volume of genetic data generated during the pandemic requires improvements in our methods to analyze it thoroughly,” said lead researcher Thomas House. His colleague Roberto Cahuantzi added: “Our analysis serves as a proof of concept, demonstrating the potential use of machine learning methods as a warning tool for the early discovery of emerging large variants.”
AI can design molecules too, and a number of researchers have signed an initiative calling for safety and ethics in this field. Though as David Baker (one of the world's leading computational biophysicists) notes, "The potential benefits of protein design far outweigh the risks at this point." Well, as a designer of AI protein designers, he would say that. All the same, we must be wary of regulation that misses the point, impeding legitimate research while giving bad actors free rein.
Atmospheric scientists at the University of Washington have made an interesting claim based on AI analysis of 25 years of satellite imagery over Turkmenistan. Essentially, the accepted notion that the economic turmoil following the collapse of the Soviet Union led to reduced emissions may not be true; in fact, the opposite may have been the case.
“We find that the collapse of the Soviet Union appears to lead, paradoxically, to an increase in methane emissions,” said UW professor Alex Turner. The large data sets and lack of time to look at them made the subject a natural target for artificial intelligence, which led to this unexpected reversal.
Large language models are trained mostly on English source data, but this may affect more than their ability to use other languages. EPFL researchers looking at the "latent language" of Llama 2 found that the model apparently reverts to English internally even when translating between French and Chinese. The researchers suggest, however, that this is more than a lazy translation process; rather, the model has structured its entire conceptual latent space around English concepts and representations. Does it matter? Probably. We should be diversifying these models' training data anyway.
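If you're curious what probing a model's "latent language" might look like in practice, here's a minimal, hypothetical sketch (not the EPFL authors' code) of a logit-lens-style probe: it decodes each intermediate layer's hidden state through the model's own unembedding matrix to see which token that layer currently favors. The model name, prompt, and layer-access details are assumptions for illustration and apply to Llama-style models in the Hugging Face transformers library.

```python
# Hypothetical logit-lens sketch: inspect which token each intermediate layer
# of a Llama-style model "prefers" during a French -> Chinese translation prompt.
# Assumption: you have access to the meta-llama/Llama-2-7b-hf weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint, swap for any causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Translation prompt; we watch the layers for English tokens surfacing mid-stack.
prompt = 'Français: "fleur" -> 中文: "'
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Project the last position's hidden state at every layer through the final norm
# and the unembedding head, then print the top-1 token per layer.
for layer_idx, hidden in enumerate(out.hidden_states):
    normed = model.model.norm(hidden[:, -1, :])   # apply the model's final RMSNorm
    logits = model.lm_head(normed)                # map to vocabulary space
    top_id = logits.argmax(dim=-1).item()
    print(f"layer {layer_idx:2d}: {tok.decode([top_id])!r}")
# In the paper's framing, middle layers often put weight on the English word
# ("flower") before the last layers commit to the Chinese surface form.
```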