OpenAI Can Re-Create Human Voices—but Won’t Release the Tech Yet

Voice synthesis has come a long way since 1978’s Speak & Spell toy, which once wowed people with its state-of-the-art ability to read words aloud using an electronic voice. Now, using deep-learning AI models, software can not only create realistic-sounding voices but also convincingly imitate existing voices from small samples of audio.

Along those lines, OpenAI this week announced Voice Engine, a text-to-speech AI model for creating synthetic voices based on a 15-second segment of recorded audio. The company has provided audio samples of Voice Engine in action on its website.

Once a voice is cloned, a user can input text into Voice Engine and get an AI-generated voice result. But OpenAI is not ready to widely release its technology. The company initially planned to launch a pilot program for developers to sign up for the Voice Engine API earlier this month. But after further consideration of the ethical implications, the company decided to scale back its ambitions for now.
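Voice Engine itself has no public API, so there is no cloning code to show from OpenAI. For context, though, the company’s already-released text-to-speech endpoint follows the same text-in, audio-out pattern, just with a handful of preset voices rather than cloned ones. Here is a minimal sketch using the official openai Python package; the output file name and the example sentence are illustrative, and this is not Voice Engine.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Generate spoken audio from text using one of OpenAI's preset voices.
# Voice Engine's cloned voices are not available through this endpoint.
response = client.audio.speech.create(
    model="tts-1",   # OpenAI's public text-to-speech model
    voice="alloy",   # one of the built-in preset voices
    input="Hello! This sentence will be read aloud in a synthetic voice.",
)

# Write the returned audio bytes (MP3 by default) to disk.
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```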

“In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not widely release this technology at this time,” the company writes. “We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models.”

Voice cloning tech in general is not particularly new—there have been several AI voice synthesis models since 2022, and the technology is actively developed in the open source community through packages like OpenVoice and XTTSv2. But the idea that OpenAI is inching toward letting anyone use its particular brand of voice tech is notable. And in some ways, the company’s reluctance to release it fully might be the bigger story.
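To give a sense of how low the barrier already is, the open source XTTSv2 model mentioned above can clone a voice from a short reference clip in a few lines. Here is a rough sketch using the Coqui TTS Python package; the file paths are placeholders, and the exact model name and arguments may vary by package version.

```python
# pip install TTS  (Coqui's text-to-speech package, which ships XTTSv2)
from TTS.api import TTS

# Download and load the multilingual XTTSv2 model.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Clone the voice in reference.wav (a short recording of the target speaker)
# and use it to read new text aloud.
tts.tts_to_file(
    text="This sentence is spoken in a voice cloned from the reference clip.",
    speaker_wav="reference.wav",   # placeholder path to a short voice sample
    language="en",
    file_path="cloned_output.wav",
)
```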

OpenAI says that benefits of its voice technology include providing reading assistance through natural-sounding voices, enabling global reach for creators by translating content while preserving native accents, supporting non-verbal individuals with personalized speech options, and assisting patients in recovering their own voice after speech-impairing conditions.

But it also means that anyone with 15 seconds of someone’s recorded voice could effectively clone it, and that has obvious implications for potential misuse. Even if OpenAI never widely releases its Voice Engine, the ability to clone voices has already caused trouble in society through phone scams where someone imitates a loved one’s voice and election campaign robocalls featuring cloned voices from politicians like Joe Biden.

Researchers and reporters have also shown that voice-cloning technology can be used to break into bank accounts that use voice authentication (such as Chase’s Voice ID). That prompted US Senator Sherrod Brown of Ohio, chair of the Senate Committee on Banking, Housing, and Urban Affairs, to send a letter to the CEOs of several major banks in May 2023 asking about the security measures they are taking to counteract AI-powered risks.

OpenAI recognizes that the tech might cause trouble if broadly released, so it’s initially trying to work around those issues with a set of rules. It has been testing the technology with select partner companies since last year. For example, video synthesis company HeyGen has been using the model to translate a speaker’s voice into other languages while keeping the same vocal sound.

Oregon’s Breakthrough Right-to-Repair Bill Is Now Law

Oregon governor Tina Kotek yesterday signed the state’s Right to Repair Act, which goes further than any other state law so far in pushing manufacturers to provide repair options for their products.

The law, like those passed in New York, California, and Minnesota, will require many manufacturers to provide the same parts, tools, and documentation to individuals and repair shops that they provide to their own repair teams.

But Oregon’s bill goes further, preventing companies from implementing schemes that require parts to be verified through encrypted software checks before they will function, a practice known as parts pairing or serialization. Oregon’s bill, SB 1596, is the first in the nation to target that practice. Oregon state senator Janeen Sollman and representative Courtney Neron, both Democrats, sponsored the bill and pushed it through the state Senate and House.

“By eliminating manufacturer restrictions, the Right to Repair will make it easier for Oregonians to keep their personal electronics running,” said Charlie Fisher, director of Oregon’s chapter of the Public Interest Research Group, in a statement. “That will conserve precious natural resources and prevent waste. It’s a refreshing alternative to a ‘throwaway’ system that treats everything as disposable.”

Oregon’s law isn’t stronger in every regard. For one, there is no set number of years for which a manufacturer must provide repair support for a device. Parts pairing is prohibited only on devices sold in 2025 and later. And there are carve-outs for certain kinds of electronics and devices, including video game consoles, medical devices, HVAC systems, motor vehicles, and—as with other states—“electric toothbrushes.”

Apple opposed the Oregon repair bill for its parts-pairing ban. John Perry, a senior manager for secure design at Apple, testified at a February hearing in Oregon that the pairing restriction would “undermine the security, safety, and privacy of Oregonians by forcing device manufacturers to allow the use of parts of unknown origin in consumer devices.”

Apple surprised many observers with its support for California’s repair bill in 2023, though it did so only after pushing for provisions requiring repair providers to disclose when they use “non-genuine or used” components and barring them from disabling security features.

According to Consumer Reports, which lobbied and testified in support of Oregon’s bill, the repair laws passed in four states now cover nearly 70 million people.

This story originally appeared on Ars Technica.

The Baltimore Bridge Collapse Is About to Get Even Messier

In the early hours of Tuesday morning, the global supply chain and US coastal infrastructure collided in the worst possible way. An enormous container ship, the Dali, slammed into a support of the Francis Scott Key bridge in Baltimore, crumpling its central span into the Patapsco River and cutting off the city’s port from the Atlantic Ocean. Eighteen hours later, at approximately 7:30 pm Tuesday evening, rescuers called off the search, with six missing people presumed dead.

With the wreckage yet to be cleared, the Port of Baltimore—a critical shipping hub—has suspended all water traffic, according to the Maryland Port Administration, though trucks are still moving goods in and out of the area. Baltimore is the ninth busiest port in the US for international trade, meaning the effects of the crash will ripple across the regional, US, and even global economy for however long the 47-year-old bridge takes to fix—a timeline, experts say, that’s still unclear.

This will be a special pain for the auto, farm equipment, and construction industries, because on the US East Coast, Baltimore handles the most “roll on, roll off” ships—an industry term for vessels designed to carry wheeled cargo. The port has the special equipment to move these products, workers trained in how to use it, and, critically, a location within an overnight drive of the densely populated Eastern Seaboard and heavily farmed Midwest.

Almost 850,000 cars and light trucks came through the port last year. So did 1.3 million tons of farm and construction machinery.

Fortunately for the logistics industry, there are some alternative routes both for ships coming into port and trucks crossing the river. Two tunnels traverse the Patapsco and could take some of the goods and people that once traveled across the Key Bridge, which was also part of Maryland Route 695. Nearby ports, including Norfolk in Virginia, Philadelphia in Pennsylvania, and Savannah in Georgia, should be able to accept many of the goods usually handled by Baltimore’s port.

But the shipping picture will get more complicated the longer the disaster takes to resolve. Ships haul big, heavy goods in large quantities across oceans, albeit relatively slowly—meaning changes to their routes and destinations can add a lot of time to a journey. If a ship is hauling many different cargoes for many different industries, a holdup along the way leaves a lot of people screaming for their supplies.

“Everybody right now is saying, ‘We’re just going to reroute, it’s going to be fine,’” says Nada Sanders, an expert in supply chain management at Northeastern University. “If this lasts a while, it’s not going to be fine. It’s going to impact prices.”

Bigger Ships, Same Bridge

The destruction of the bridge also underscores how much bigger ships have become. Trade transport volume across the seas has tripled in the past three decades. At nearly 1,000 feet long, the Dali is emblematic of the ballooning shipping industry.

The growth in ship size comes down to simple economics: The more goods you can cram onto a ship, the more you save on costs. “The amount of cargo has increased tremendously,” says Zal Phiroz, a supply chain analyst at UC San Diego. “This has been impacted to a great degree by Covid, and after Covid as well. The prices of cargo skyrocketed, the prices of containers skyrocketed. Everything just went through the roof.”

Apple’s MM1 AI Model Shows a Sleeping Giant Is Waking Up

While the tech industry went gaga for generative artificial intelligence, one giant has held back: Apple. The company has yet to introduce so much as an AI-generated emoji, and according to a New York Times report today and earlier reporting from Bloomberg, it is in preliminary talks with Google about adding the search company’s Gemini AI model to iPhones.

Yet a research paper quietly posted online last Friday by Apple engineers suggests that the company is making significant new investments into AI that are already bearing fruit. It details the development of a new generative AI model called MM1 capable of working with text and images. The researchers show it answering questions about photos and displaying the kind of general knowledge skills shown by chatbots like ChatGPT. The model’s name is not explained but could stand for MultiModal 1.

MM1 appears to be similar in design and sophistication to a variety of recent AI models from other tech giants, including Meta’s open source Llama 2 and Google’s Gemini. Work by Apple’s rivals and academics shows that models of this type can be used to power capable chatbots or build “agents” that can solve tasks by writing code and taking actions such as using computer interfaces or websites. That suggests MM1 could yet find its way into Apple’s products.

“The fact that they’re doing this, it shows they have the ability to understand how to train and how to build these models,” says Ruslan Salakhutdinov, a professor at Carnegie Mellon who led AI research at Apple several years ago. “It requires a certain amount of expertise.”

MM1 is a multimodal large language model, or MLLM, meaning it is trained on images as well as text. This allows the model to respond to text prompts and also answer complex questions about particular images.

One example in the Apple research paper shows what happened when MM1 was provided with a photo of a sun-dappled restaurant table with a couple of beer bottles and also an image of the menu. When asked how much someone would expect to pay for “all the beer on the table,” the model reads the correct prices off the menu and tallies up the cost.
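MM1 has not been released, so there is no public interface to query it. But the interaction pattern the paper describes, pairing images with a natural-language question, is the same one exposed by multimodal APIs that are already available. Below is a hedged sketch using OpenAI’s vision-capable chat endpoint purely to illustrate that pattern; the model shown is not Apple’s, and the image URLs and question are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Ask a multimodal model a question about images, MM1-style:
# the prompt mixes plain text with image references.
response = client.chat.completions.create(
    model="gpt-4o",  # a publicly available multimodal model, not MM1
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Using the menu in the second image, how much would "
                         "all the beer on the table cost?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/table.jpg"}},  # placeholder
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/menu.jpg"}},   # placeholder
            ],
        }
    ],
)

print(response.choices[0].message.content)
```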

When ChatGPT launched in November 2022, it could only ingest and generate text, but more recently its creator OpenAI and others have worked to expand the underlying large language model technology to work with other kinds of data. When Google launched Gemini (the model that now powers its answer to ChatGPT) last December, the company touted its multimodal nature as beginning an important new direction in AI. “After the rise of LLMs, MLLMs are emerging as the next frontier in foundation models,” Apple’s paper says.

MM1 is a relatively small model as measured by its number of “parameters,” or the internal variables that get adjusted as a model is trained. Kate Saenko, a professor at Boston University who specializes in computer vision and machine learning, says this could make it easier for Apple’s engineers to experiment with different training methods and refinements before scaling up when they hit on something promising.

Saenko says that, for a corporate publication, the MM1 paper provides a surprising amount of detail on how the model was trained. For instance, the engineers behind MM1 describe tricks for improving the performance of the model, including increasing the resolution of images and mixing text and image data. Apple is famed for its secrecy, but it has previously shown unusual openness about AI research as it has sought to lure the talent needed to compete in the crucial technology.

Reddit’s Sale of User Data for AI Training Draws FTC Inquiry

Reddit said ahead of its IPO next week that licensing user posts to Google and others for AI projects could bring in $203 million of revenue over the next few years. The community-driven platform was forced to disclose Friday that US regulators already have questions about that new line of business.

In a regulatory filing, Reddit said that it received a letter from the US Federal Trade Commission on Thursday asking about “our sale, licensing, or sharing of user-generated content with third parties to train AI models.”

The FTC, the US government’s consumer protection and antitrust regulator, has the power to sanction companies found to engage in unfair or deceptive trade practices. The idea of licensing user-generated content for AI projects has drawn questions from lawmakers and rights groups about privacy risks, fairness, and copyright.

Reddit isn’t alone in trying to make a buck off licensing data, including that generated by users, for AI. Programming Q&A site Stack Overflow has signed a deal with Google, the Associated Press has signed one with OpenAI, and Tumblr owner Automattic has said it is working “with select AI companies” but will allow users to opt out of having their data passed along. None of the licensors immediately responded to requests for comment. Reddit also isn’t the only company receiving an FTC letter about data licensing, Axios reported on Friday, citing an unnamed former agency official.

It’s unclear whether the letter to Reddit is directly related to scrutiny of any other companies.

Reddit said in Friday’s disclosure that it does not believe that it engaged in any unfair or deceptive practices but warned that dealing with any government inquiry can be costly and time-consuming. “The letter indicated that the FTC staff was interested in meeting with us to learn more about our plans and that the FTC intended to request information and documents from us as its inquiry continues,” the filing says. Reddit said the FTC letter described the scrutiny as related to “a non-public inquiry.”

Reddit, whose 17 billion posts and comments are seen by AI experts as valuable for training chatbots in the art of conversation, announced a deal last month to license the content to Google. Reddit and Google did not immediately respond to requests for comment. The FTC declined to comment. (Advance Magazine Publishers, parent of WIRED’s publisher Condé Nast, owns a stake in Reddit.)

AI chatbots like OpenAI’s ChatGPT and Google’s Gemini are seen as a competitive threat to Reddit, publishers, and other ad-supported, content-driven businesses. In the past year the prospect of licensing data to AI developers emerged as a potential upside of generative AI for some companies.

But the use of data harvested online to train AI models has raised a number of questions winding through boardrooms, courtrooms, and Congress. For Reddit and others whose data is generated by users, those questions include who truly owns the content and whether it’s fair to license it out without giving the creator a cut. Security researchers have found that AI models can leak personal data included in the material used to create them. And some critics have suggested the deals could make powerful companies even more dominant.