Synthetic Data Is a Dangerous Teacher

In April 2022, when DALL-E 2, a text-to-image model, was released, it purportedly attracted over a million users within its first three months. It was followed by ChatGPT, which apparently reached 100 million monthly active users by January 2023, just two months after launch. Both mark notable moments in the development of generative AI, which in turn has unleashed an explosion of AI-generated content onto the web. The bad news is that, in 2024, this means we will also see an explosion of fabricated, nonsensical information, of mis- and disinformation, and a worsening of the negative social stereotypes encoded in these AI models.

The AI revolution wasn’t spurred by any recent theoretical breakthrough—indeed, most of the foundational work underlying artificial neural networks has been around for decades—but by the “availability” of massive data sets. Ideally, an AI model captures a given phenomenon—be it human language, cognition, or the visual world—in a way that represents the real phenomenon as closely as possible.

For example, for a large language model (LLM) to generate humanlike text, it is important that the model is fed huge volumes of data that somehow represent human language, interaction, and communication. The belief is that the larger the data set, the better it captures human affairs, in all their inherent beauty, ugliness, and even cruelty. We are in an era marked by an obsession with scaling up models, data sets, and GPUs. Current LLMs have entered the era of trillion-parameter models, which means they require billion-scale training data sets. Where can such data be found? On the web.

This web-sourced data is assumed to capture the “ground truth” of human communication and interaction, a proxy from which language can be modeled. Although various researchers have shown that online data sets are often of poor quality, tend to exacerbate negative stereotypes, and contain problematic content such as racial slurs and hateful speech, often directed at marginalized groups, this hasn’t stopped the big AI companies from using such data in the race to scale up.

With generative AI, this problem is about to get a lot worse. Rather than objectively representing the social world described in their input data, these models encode and amplify social stereotypes. Indeed, recent work shows that generative models encode and reproduce racist and discriminatory attitudes toward historically marginalized identities, cultures, and languages.

It is difficult, if not impossible—even with state-of-the-art detection tools—to know for sure how much text, image, audio, and video data is currently being generated, or at what pace. Stanford University researchers Hans Hanley and Zakir Durumeric estimate a 68 percent increase in the number of synthetic articles posted to Reddit and a 131 percent increase in misinformation news articles between January 1, 2022, and March 31, 2023. Boomy, an online music generator company, claims to have generated 14.5 million songs (or 14 percent of recorded music) so far. In 2021, Nvidia predicted that, by 2030, there would be more synthetic data than real data in AI models. One thing is for sure: The web is being deluged by synthetically generated data.

The worrying thing is that these vast quantities of generative AI output will, in turn, be used as training material for future generative AI models. As a result, in 2024, a very significant part of the training material for generative models will be synthetic data produced by generative models. Soon, we will be trapped in a recursive loop in which AI models are trained only on synthetic data produced by AI models. Most of it will be contaminated with stereotypes that will continue to amplify historical and societal inequities. Unfortunately, this will also be the data used to train generative models applied to high-stakes sectors including medicine, therapy, education, and law. We have yet to grapple with the disastrous consequences of this. In 2024, the explosion of generative AI content that we find so fascinating now will instead become a massive toxic dump that will come back to bite us.
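To make the recursive-loop concern concrete, here is a minimal toy sketch in Python (not taken from the article or the studies it cites, and deliberately oversimplified): a “model” that learns only a mean and a spread is repeatedly retrained on its own synthetic output. The distinct structure in the original data disappears after the first generation and is never recovered, which is the statistical heart of the worry.

    import numpy as np

    rng = np.random.default_rng(0)

    # "Real" data: a mixture of two distinct populations.
    real_data = np.concatenate([rng.normal(-2.0, 0.5, 5000),
                                rng.normal(2.0, 0.5, 5000)])

    data = real_data
    for generation in range(10):
        # Each "model" here is deliberately crude: it learns only a mean and a spread.
        mu, sigma = data.mean(), data.std()
        # The next generation is trained purely on synthetic samples from that model.
        data = rng.normal(mu, sigma, data.size)
        print(f"generation {generation}: mean={mu:+.2f}, std={sigma:.2f}")

    # The two original modes vanish after the first refit; every later generation
    # can only echo the previous model's summary statistics, plus sampling noise.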

Forget Growth. Optimize for Resilience

Fleming believed that growth has natural limits. Things grow to maturity—kids into adults, saplings into trees, startups into full-fledged companies—but growth beyond that point is, in his words, a “pathology” and an “affliction.” The bigger and more productive an economy gets, he argued, the more resources it needs to burn to maintain its own infrastructure. It becomes less and less efficient at keeping any one person clothed, fed, and sheltered. He called this the “intensification paradox”: The harder everyone works to make the GDP line point up, the harder everyone has to work to make the GDP line point up. Inevitably, Fleming believed, growth will turn to degrowth, intensification to deintensification. These are things to prepare for, plan for, and the way to do that is with the missing metric: resilience.

Fleming offers several definitions of resilience, the briefest of which is “the ability of a system to cope with shock.” He describes two kinds: preventive resilience, which helps you maintain an existing state in spite of shocks, and recovery-elastic resilience, which helps you adapt quickly to a new post-shock state. Growth won’t help you with resilience, Fleming argues. Only community will. He’s big on the “informal economy”—think Craigslist and Buy Nothing, not Amazon. People helping people.

So I began to imagine, in my hypocritical heart, an analytics platform that would measure resilience in those terms. As growth shot too high, notifications would fire off to your phone: Slow down! Stop selling! Instead of revenue, it would measure relationships formed, barters fulfilled, products loaned and reused. It would reflect all sorts of non-transactional activities that make a company resilient: Is the sales team doing enough yoga? Are the office dogs getting enough pets? In the analytics meeting, we would ask questions like “Is the product cheap enough for everyone?” I even tried to sketch out a resilience funnel, where the juice that drips down is people checking in on their neighbors. It was an interesting exercise, but what I ended up imagining was basically HR software for Burning Man, which, well, I’m not sure that’s the world I want to live in either. If you come up with a good resilience funnel, let me know. Such a product would perform very badly in the marketplace (assuming you could even measure that).

The fundamental problem is that the stuff that creates resilience won’t ever show up in the analytics. Let’s say you were building a chat app. If people chat more using your app, that’s good, right? That’s community! But the really good number, from a resilience perspective, is how often they put down the app and meet up in person to hash things out. Because that will lead to someone coming by the house with lasagna when someone else has Covid, or someone giving someone’s kid an old acoustic guitar from the attic in exchange for, I don’t know, a beehive. Whole Earth stuff. You know how it works.

All of this somewhat guilty running around led me back to the simplest answer: I can’t measure resilience. I mean, sure, I could wing a bunch of vague, abstract stats and make pronouncements. God knows I’ve done a lot of that before. But there’s no metric, really, that can capture it. Which means I have to talk to strangers, politely, about problems they’re trying to solve.

I hate this conclusion. I want to push out content and see lines move and make no more small talk. I want my freaking charts. That’s why I like tech. Benchmarks, CPU speeds, hard drive sizes, bandwidth, users, point releases, revenue. I love when the number goes up. It’s almost impossible to imagine a world where it doesn’t. Or rather it used to be.


This article appears in the November 2023 issue.

To Own the Future, Read Shakespeare

Many times a year, as if on a hidden schedule, some tech person, often venture-capital-adjacent, types out a thought on social media like “The only thing liberal arts majors are good for is scrubbing floors while I punch them” and hits Send. Then the poetry people respond—often a little late, in need of haircuts—with earnest arguments about the value of art.

I am an English major to death. (You know us not by what we’ve read but by what we are ashamed not to have read.) But I learned years ago that there’s no benefit in joining this debate. It never resolves. The scientist-novelist C. P. Snow went after the subject in 1959 in a lecture called “The Two Cultures,” in which he criticized British society for favoring Shakespeare over Newton. Snow gets cited a lot. I have always found him unreadable, which, yes, embarrasses me but also makes me wonder whether perhaps the humanities had a point.

By the time I went to college, in the mixtape days, the Two Cultures debate had migrated to corkboards. In the liberal arts building, people tacked up pro-humanities essays they had snipped out of magazines. A hot Saturday night for me was to go and read them. Other people were trying drugs. I found the essays perplexing. I got the gist, but why would one need to defend something as urgent and essential as the humanities? Then again, across the street in the engineering building, I remember seeing bathroom graffiti that read “The value of a liberal arts degree,” with an arrow pointing to the toilet paper. I was in the engineering building because they had Silicon Graphics workstations.

Wandering between these worlds, I began to realize I was that most horrifying of things: interdisciplinary. At a time when computers were still sequestered in labs, the idea that an English major should learn to code was seen as wasteful, bordering on abusive—like teaching a monkey to smoke. How could one construct programs when one was supposed to be deconstructing texts? Yet my heart told me: All disciplines are one! We should all be in the same giant building. Advisers counseled me to keep this exceptionally quiet. Choose a major, they said. Minor in something odd if you must. But why were we even here, then? Weren’t we all—ceramic engineers and women’s studies alike—rowing together into the noosphere? No, I was told. We are not. Go to your work-study job calling alumni for donations.

So I got my degree, and off I went to live an interdisciplinary life at the intersection of liberal arts and technology, and I’m still at it, just as the people trashing the humanities are at it too. But I have come to understand my advisers. They were right to warn me off.

Because humans are primates and disciplines are our territories. A programmer sneers at the white space in Python, a sociologist rolls their eyes at a geographer, a physicist stares at the ceiling while an undergraduate, high off internet forums, explains that Buddhism anticipated quantum theory. They, we, are patrolling the borders, deciding what belongs inside, what does not. And this same battle of the disciplines, everlasting, ongoing, eternal, and exhausting, defines the internet. Is blogging journalism? Is fan fiction “real” writing? Can video games be art? (The answer is always: Of course, but not always. No one cares for that answer.)

Will Life Be Better in the Metaverse?

Once several generations had come and gone and nothing of that sort had happened, other interpretations began to emerge. Maybe Jesus had been speaking about the afterlife and the more ethereal promises of heaven? Maybe the kingdom was merely the steady cumulation of justice and equality that humans were tasked with bringing about?

When I was growing up in the church, the popular evangelical interpretation was “inaugurated eschatology,” which held that the kingdom is both “now” and “not yet.” All the glories of heaven are still to come, and yet we can already experience a glimpse of them here on earth. It’s a somewhat inelegant interpretation, one that in hindsight feels like an attempt to have (quite literally) the best of both worlds: Believers can enjoy paradise in the present and also later in heaven. It’s this theological framework that comes to mind when I hear Zuckerberg go on about the physical world, AR, VR, and the porous borders between them. When he speaks about existing “mixed reality” technologies as an ontological pit stop on the road to a fully immersive virtual paradise, he sounds (to my ears, at least) an awful lot like the theologian George Eldon Ladd, who once wrote that heaven is “not only an eschatological gift belonging to the Age to Come; it is also a gift to be received in the old aeon.”

All technological aspirations are, when you get down to it, eschatological narratives. We occupants of the modern world believe implicitly that we are enmeshed in a story of progress that’s building toward a blinding transformation (the Singularity, the Omega Point, the descent of the True and Only Metaverse) that promises to radically alter reality as we know it. It’s a story that is as robust and as flexible as any religious prophecy. Any technological failure can be reabsorbed into the narrative, becoming yet another obstacle that technology will one day overcome.

One of the most appealing aspects of the metaverse, for me, is the promise of being delivered from the digital–physical dualism mediated by screens and experiencing, once again, a more seamless relationship with “reality” (whatever that might be).

But maybe we are wrong to look so intently to the future for our salvation. Although I am no longer a believer myself, when I revisit Christ’s promises about the kingdom, I can’t help thinking that he was widely misunderstood. When the Pharisees asked him, point-blank, when the kingdom would arrive, he replied, “The kingdom of God is within you.” It’s a riddle that suggests this paradise does not belong to the future at all, but is rather an individual spiritual realm anyone can access, here and now. In his Confessions, Saint Augustine, sounding not unlike a Buddhist or Taoist sage, marveled at the fact that the wholeness he’d long sought in the external world was “within me the whole time.”

When you describe, Virtual, your longing to live in a digital simulation that resembles reality but is somehow better, I can’t help thinking that we have forgotten the original metaverse we already have within us—the human imagination. Reality, as we experience it, is intrinsically augmented—by our hopes and fears, our idle daydreams and our garish nightmares. This inner world, invisible and omnipresent, has given rise to all religious longings and has produced every technological and artistic wonder that has ever appeared among us. Indeed, it is the source and seed of the metaverse itself, which originated, like all inventions, as the vaporous wisp of an idea. Even now, amid the persistent, time-bound entropy of the physical world, you can access this virtual realm whenever you’d like, from anywhere in the world—no $300 headset required. It will be precisely as thrilling as you want it to be.

Faithfully,

Cloud


Be advised that CLOUD SUPPORT is experiencing higher than normal wait times and appreciates your patience.

My Kid Wants to Be an Influencer. Is That Bad?

“Whenever my 6-year-old daughter gets asked what she wants to be when she grows up, she says, ‘An influencer.’ The thought of it freaks me out. What should I do?”

—Under the Influence


Dear Under,

Your question made me think about Diana Christensen, a main character in Paddy Chayefsky’s 1976 film Network, played by Faye Dunaway. Christensen is a young network news executive who is meant to represent the moral bankruptcy of a generation that was raised on TV (one character calls her “television incarnate”). While charismatic and highly capable, she is also rampantly amoral, viciously competitive, and so obsessed with ratings that she famously has an orgasm while discussing viewership numbers. The character clearly piqued a pervasive cultural anxiety about TV’s corrupting influence, though with a little distance it’s hard not to see her depiction in the film as moralizing and heavy-handed. As The New Yorker’s Pauline Kael put it in her review, “What Chayefsky is really complaining about is what barroom philosophers have always complained about: the soulless worshippers at false shrines—the younger generation.”

I mention the film only to get out of the way the most obvious objection to your freak-out, one I’m sure you’ve already considered—namely, that every generation fears new forms of media are “false shrines” corrupting the youth, and that these concerns are ultimately myopic, reactionary, and destined to appear in hindsight as so much unfounded hand-wringing. Before Diana Christensen, there were the studio bullies in Norman Mailer’s novel The Deer Park (1955), who represented the degeneracy of Hollywood, and the ruthless newspaper men in Howard Hawks’ film His Girl Friday (1940), who are referred to as “inhuman.” If you want to go back even further, consider the bewilderment often experienced by modern readers of Mansfield Park, Jane Austen’s 1814 novel whose dramatic apex rests on a father’s outrage at coming home to find that his children have decided to put on a play.

Rest assured, Under, that I am not trying to dismiss your question through appeals to historical relativism. Pointing out that a problem has antecedents does not compromise its validity. It’s possible, after all, that humanity is on a steady downhill slide, that each new technological medium, and the professions it spawns, is progressively more soulless than the last. The many journalists who’ve cited the 2019 poll claiming that 30 percent of US and UK children want to be YouTubers when they grow up have frequently juxtaposed that figure with the dearth of kids who want to be astronauts (11 percent), as though to underscore the declining ambitions of a society that is no longer “reaching for the stars” but aiming instead for the more lowly consolations of stardom.

If I were to guess your objections to influencing as a future occupation for your daughter, I imagine they might include the fact that the profession, for all its vaunted democratic appeal—anyone can be famous!—conceals its competitive hierarchies; that its spoils are unreliable and largely concentrated at the top; that it requires becoming a vapid mascot for brands; that it fails to demand meaningful contributions to one’s community; that it requires a blurring between personal and professional roles; that the mandates of likes, shares, and followers amount to a life of frenetic people-pleasing and social conformity that inevitably destroys one’s capacity for independent thinking.

I’m also willing to bet there is a deeper fear humming beneath those seemingly rational objections—one that is related, incidentally, to the very notion of influence. Parenting is, at the end of the day, an extended experiment in influencing. You hope to instill your values, politics, and moral and ethical awareness in your children, yet as they make their way into the world, it becomes clear that there are other influences at war with your own. Influence, it has been noted in this era of epidemics, shares a root word with influenza, an etymology that echoes the popular notion that ideas are free-floating pathogens that someone can catch without giving their conscious consent. I think this is how many parents regard the social technologies their children use, as hosts for various contagions that must be staved off with more deliberate moral instruction given at home. To realize the extent to which these digital platforms have fascinated your daughter is to feel that you have failed to inoculate her.

Or maybe your uneasiness goes even deeper than that. If I can turn the problem back on you, perhaps your instinctive aversion to your daughter’s aspirations has raised more probing questions about the source and validity of your own values. Any serious attempt to think through the perils and possibilities of new technologies forces you to realize that many of your own beliefs are little more than amorphous, untested assumptions, formed by the era in which you were raised. Are the artists you grew up idolizing—musicians, filmmakers, novelists—any less shallow and narcissistic than the TikTok and YouTube personalities your daughter idolizes? The answer to this question is not a given. But if you consider it honestly and persistently, I suspect you will discover that you are not an isolated moral agent but porous to the biases and blind spots of the decades in which you came of age.

Such realizations can easily inspire fatalism, but they can also lead to a more expansive and meaningful understanding of your own fears. My intent in reminding you of the anxieties of previous generations—all that collective angst about television, movies, newspapers, and theater—is to help you see your situation as part of a lineage, a rite of passage through which all generations must proceed. (If we are to believe Plato’s Phaedrus, even Socrates fell prey to griping about the popularity of writing, a medium he feared would “produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory.”) To see this problem historically might also prompt you to consider, as a parent, what kinds of life lessons transcend the particulars of a given economy.

I would like to believe that alongside all the ephemeral inherited assumptions we absorb in our youth, there are some pearls of enduring wisdom that will remain true and valuable for generations to come. Ideally, it’s these more lasting truths that you want to pass down to your daughter, and that will equip her to have an influence, no matter what she chooses for work.

Faithfully, 

Cloud


Be advised that CLOUD SUPPORT is experiencing higher than normal wait times and appreciates your patience.