AI-Critical Writing and Reporting

This page is a collection of reporting, research, and essays critical of large language models (LLMs) and other “generative AI.” Articles are organized by topic, and the list is ever-growing.

AI as a Labor Issue

This is where I think approaching “AI” as a failure becomes useful, even vital. It underscores that the technology’s real value isn’t in improving productivity, or even in improving products. Rather, it’s a social mechanism employed to ensure compliance in the workplace, and to weaken worker power. Stories like the one at Zapier are becoming more common, where executive fiat is being used to force employees to use a technology that could deskill them and make them more replaceable.

The real threat posed by generative AI is not that it will eliminate work on a mass scale, rendering human labour obsolete. It is that, left unchecked, it will continue to transform work in ways that deepen precarity, intensify surveillance, and widen existing inequalities. Technological change is not an external force to which societies must simply adapt; it is a socially and politically mediated process. Legal frameworks, collective bargaining, public investment, and democratic regulation all play decisive roles in shaping how technologies are developed and deployed, and to what ends.

Three Amazon engineers said that managers had increasingly pushed them to use A.I. in their work over the past year. The engineers said that the company had raised output goals and had become less forgiving about deadlines. […] One Amazon engineer said his team was roughly half the size it had been last year, but it was expected to produce roughly the same amount of code by using A.I.

The AI jobs crisis does not, as I’ve written before, look like sentient programs arising all around us, inexorably replacing human jobs en masse. It’s a series of management decisions being made by executives seeking to cut labor costs and consolidate control in their organizations. […]

These imperatives have always existed, of course; bosses have historically tried to maximize profits by using cost-cutting technologies. But generative AI has been uniquely powerful in equipping them with a narrative with which to do so—and to thus justify degrading, disempowering, or destroying vulnerable jobs.

In this article, Merchant points to a blog post by Jim VandeHei, CEO of Axios, in which VandeHei says that he

recently told the Axios staff that we're done sugar-coating it, and see an urgent need for every employee to turn AI into a force multiplier for their specific work. We then gave them tools to test. My exact words to a small group of our finance, legal and talent colleagues last week: “You are committing career suicide if you're not aggressively experimenting with AI.”

VandeHei adds that

We tell most staff they should be spending 10% or more of their day using AI to discover ways to double their performance by the end of the year. Some, like coders, should shoot for 10x-ing productivity as AI improves.

In other words, as Merchant puts it:

The message is this: There is an AI jobs apocalypse coming, everything is going to change, and if you hope to survive it, you’re going to have to learn to be a lot more productive, for me, your boss.

Automation can augment a worker. We can call this a “centaur” – the worker offloads a repetitive task, or one that requires a high degree of vigilance, or (worst of all) both. They're a human head on a robot body (hence “centaur”). Think of the sensor/vision system in your car that beeps if you activate your turn-signal while a car is in your blind spot. You're in charge, but you're getting a second opinion from the robot.

This turns AI-“assisted” coders into reverse centaurs. The AI can churn out code at superhuman speed, and you, the human in the loop, must maintain perfect vigilance and attention as you review that code, spotting the cleverly disguised hooks for malicious code that the AI can't be prevented from inserting into its code.

Weaknesses of AI as a Tool

We conduct randomized experiments to study how developers gained mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average.

However, recently released LLMs, such as GPT-5, have a much more insidious method of failure. They often generate code that fails to perform as intended, but which on the surface seems to run successfully, avoiding syntax errors or obvious crashes. They do this by removing safety checks, or by creating fake output that matches the desired format, or through a variety of other techniques to avoid crashing during execution.

As any developer will tell you, this kind of silent failure is far, far worse than a crash.
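To make the failure mode concrete, here is a small hypothetical sketch (mine, not the article’s, and not actual model output): a helper that “succeeds” on bad input by quietly fabricating a value in the expected format, contrasted with a version that fails loudly.

```python
# Hypothetical illustration of the "silent failure" pattern described above:
# instead of surfacing a problem, the code swallows it and returns plausible-
# looking output in the expected format, so nothing crashes and nothing is
# obviously wrong.

def parse_price(raw: str) -> float:
    """Looks robust, but hides bad input behind a fabricated default."""
    try:
        return float(raw.strip().lstrip("$"))
    except ValueError:
        # The safety check is effectively removed: rather than raising,
        # we return a value that matches the expected format. The caller
        # never learns the data was unparseable.
        return 0.0


def parse_price_strict(raw: str) -> float:
    """The louder alternative: fail fast so the bug is visible."""
    return float(raw.strip().lstrip("$"))  # raises ValueError on bad input
```

The first version is exactly the kind of code that passes a casual glance and a happy-path test run, then quietly corrupts results downstream.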

We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. […]

When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%.

EEG revealed significant differences in brain connectivity: Brain-only participants exhibited the strongest, most distributed networks; Search Engine users showed moderate engagement; and LLM users displayed the weakest connectivity. Cognitive activity scaled down in relation to external tool use. […]

Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels.

This paper applies the standard definition of creativity to the output of Large Language Models (LLMs) and shows not only that this can be calculated ex ante, but that LLM output creativity has a fundamental upper limit. …LLM creativity is mathematically constrained to a level equivalent to the boundary between amateur and professional human creativity. This has significant implications for claims about AI autonomy in creative tasks.

Consider this: with all you know about AI-assisted coding and its wide adoption, if I showed you charts and graphs of new software releases across the world, what shape of that graph would you expect? Surely you’d be seeing an exponential growth up-and-to-the-right as adoption took hold and people started producing more?

Now, I’ve spent a lot of money and weeks putting the data for this article together, processing tens of terabytes of data in some cases. So I hope you appreciate how utterly uninspiring and flat these charts are across every major sector of software development.

We find disconcerting trends for maintainability. Code churn -- the percentage of lines that are reverted or updated less than two weeks after being authored -- is projected to double in 2024 compared to its 2021, pre-AI baseline. We further find that the percentage of “added code” and “copy/pasted code” is increasing in proportion to “updated,” “deleted,” and “moved” code. In this regard, code generated during 2023 more resembles an itinerant contributor, prone to violate the DRY-ness of the repos visited.
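As a rough sketch of the churn definition quoted above (my own illustration of the stated definition, not GitClear’s actual methodology, which works over real repository history): a line “churns” if it is updated or reverted within two weeks of being authored, and the churn rate is the fraction of authored lines that do so.

```python
# Minimal sketch of the churn metric using hypothetical per-line records.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LineRecord:
    authored_at: datetime
    # When the line was next modified, reverted, or deleted, if ever.
    changed_at: Optional[datetime] = None


def churn_rate(lines: list[LineRecord], window_days: int = 14) -> float:
    """Fraction of authored lines changed again within `window_days`."""
    if not lines:
        return 0.0
    window = timedelta(days=window_days)
    churned = sum(
        1
        for line in lines
        if line.changed_at is not None
        and line.changed_at - line.authored_at <= window
    )
    return churned / len(lines)


# Example: two lines authored on Jan 1; one rewritten a week later.
lines = [
    LineRecord(datetime(2024, 1, 1), changed_at=datetime(2024, 1, 8)),
    LineRecord(datetime(2024, 1, 1)),
]
print(churn_rate(lines))  # 0.5
```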

Our results reveal that [Chain-of-Thought (CoT)] reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions. This work offers a deeper understanding of why and when CoT reasoning fails, emphasizing the ongoing challenge of achieving genuine and generalizable reasoning.

Cognitive scientist and AI researcher Gary Marcus notes the following about these results:

In 1998 I wrote that “universals are pervasive in language and reasoning” but showed experimentally that neural networks of that era could not reliably “extend universals outside [a] training space of examples”. […]

Throw in every gadget invented since 1998, and the Achilles’ Heel I identified then still remains. That’s startling. Even I didn’t expect that.

Experiments show leading LLM agents achieve only around a 58% single-turn success rate on CRMArena-Pro, with significant performance drops in multi-turn settings to 35%… Additionally, agents exhibit near-zero inherent confidentiality awareness (improvable with prompting but often at a cost to task performance).

Harassment and Spam

“Nudify” and “undress” apps are easy to use and find online and are contributing to the epidemic of explicit deepfakes among teenagers. Last month Emanuel reported that Google was promoting these apps in search results: “Google Search didn’t only lead users to these harmful apps, but was also profiting from the apps which pay to place links against specific search terms,” he wrote.

The report says that while children may recognize that AI-generating nonconsensual content is wrong, they can assume “it’s legal, believing that if it were truly illegal, there wouldn’t be an app for it.” The report, which cites several 404 Media stories about this issue, notes that this normalization is in part a result of many “nudify” apps being available on the Google and Apple app stores, and that their ability to AI-generate nonconsensual nudity is openly advertised to students on Google and social media platforms like Instagram and TikTok.

The news is yet another example of how the tools people have used to navigate the internet for decades are being overwhelmed by a flood of AI-generated content that they are not asking for, and which almost exclusively uses people’s work or likeness without consent. At times, the deluge of AI content makes it difficult for users to differentiate between what is real and what is AI-generated.

…AI-generated content — both photos of fake plants and care misinformation — disrupt community engagement, which is what many collectors are seeking when joining these forums. […]

This kind of content is “discouraging any meaningful engagement” because it’s not grounded in reality, Caring_Cactus continues. “They’re trying to farm attention with low quality content, and it creates less opportunities for real connection by wasting people’s precious time when they want to socialize online.”

AI as an Accountability Sink

Guariglia added that defense lawyers across the country have told EFF that AI raises a major problem when it comes to the veracity of police testimony. Police reports can’t be presented as evidence alone in court, so officers must testify about what they wrote. But if AI wrote a report, and a cop’s testimony is different from that report, police will be able to blame the technology.

“If a cop is caught in a lie on the stand, it’s much easier for them to say the AI made that up as opposed to them saying you caught me lying in the report,” Guariglia said.

Another problem, he said, is that there is no way to track which part of a report was written by AI versus an officer, making it difficult to parse the document if questions are raised about its veracity.

Some U.S. police departments have begun using AI but are not disclosing its use, Guariglia said.

This kind of problem stems from a high level of complexity in the algorithmic structure, which prevents even the designers of the AI system from fully understanding how or why a specific input leads to a specific output. Without such an explanation, it would not only be difficult to dispute the validity of any recommendation provided by the system, but it may also preclude us from holding any involved actor morally responsible as they would not have access to the necessary information required for questioning the output.

AI, Ideology, and Power

We argue that, unlike systems with specific applications which can be evaluated following standard engineering principles, undefined systems like “AGI” cannot be appropriately tested for safety. Why, then, is building AGI often framed as an unquestioned goal in the field of AI? In this paper, we argue that the normative framework that motivates much of this goal is rooted in the Anglo-American eugenics tradition of the twentieth century.

But instead of appropriating performances of stereotypical white femininity and non-whiteness, today Artificial Intelligence grants people access to creativity in a legibly masculine way centered not so much in self-mastery, but the mastery of others.

Today, humanistic labor like writing, drawing, and communication has been thoroughly feminized, both by its association with women and by its demonetization. Prompting AI to write, draw, or communicate allows people (men) to do those things in ways that position them not as doing women’s work, but as masters of a subordinate.

What AI is is an ideology… [the] ideology itself is nothing new—it is the age-old system of supremacy, granting care and comfort to some while relegating others to servitude and penury—but the wrappings have been updated for the late capital, late digital age… Engaging with AI as a technology is to play the fool—it’s to observe the reflective surface of the thing without taking note of the way it sends roots deep down into the ground, breaking up bedrock, poisoning the soil, reaching far and wide to capture, uproot, strangle, and steal everything within its reach.

This paper draws from anthropological work on bureaucracies, states, and power, translating these ideas into a theory describing the structural tendency for powerful algorithmic systems to cause tremendous harm. I show how administrative models and projections of the world create marginalization, just as algorithmic models cause representational and allocative harm.

AI in Academia

The question is not, how do we help people be creative? They are already, from birth. The question is, why do we grind creativity out of kids so thoroughly, and how do we stop doing that? Teaching songwriting and other creative music-making requires only that you disinhibit the strong creative impulse that is already there. […]

The hardest part of making up a song is just having the nerve to do it. You have to take an emotional risk. Everything intellectual and technical is downstream from that. If you remove the emotional risk, you remove the entire foundation of the structure. It doesn’t matter what you pile on top after that.

This list has been assembled by Mike Nason, UNB Libraries, and will be supplemented whenever something is compelling enough to add.

Materials here are intended as solidarity and solace for educators who might find themselves inventing wheels alone while their administrators, trustees, and bosses unrelentingly hype AI and nakedly enthuse about the negative consequences for educator labor.

AGAINST AI benefits from expert colleagues across the humanistic and qualitative disciplines, as well as in various media industries.

Please contact us to contribute.

Economics of AI

Despite $30–40 billion in enterprise investment into GenAI, this report uncovers a surprising result in that 95% of organizations are getting zero return.

“The difference between the IT bubble in the 1990s and the AI bubble today is that the top 10 companies in the S&P 500 today are more overvalued than they were in the 1990s,” Slok wrote in a recent research note that was widely shared across social media and financial circles.

As we speak, the tech industry is grappling with a mid-life crisis where it desperately searches for the next hyper-growth market, eagerly pushing customers and businesses to adopt technology that nobody asked for in the hopes that they can keep the Rot Economy alive.

To be abundantly clear, as it stands, OpenAI currently spends $2.35 to make $1.

AI and Climate

AI companies clearly expect their technology to consume much, much more energy than existing cloud computing infrastructure, and they expect that consumption to continue for decades into the future. If they tell you otherwise, they’re lying, and you can tell they’re lying by the billions of dollars they’re pouring into constructing city-sized datacenters with their own multi-gigawatt power stations. Any discussion of the sustainability of AI that ignores that expenditure is made either in ignorance or bad faith.

The report finds that data centers consumed about 4.4% of total U.S. electricity in 2023 and are expected to consume approximately 6.7 to 12% of total U.S. electricity by 2028. The report indicates that total data center electricity usage climbed from 58 TWh in 2014 to 176 TWh in 2023 and estimates an increase to between 325 and 580 TWh by 2028.
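As a quick back-of-the-envelope check on those figures (the implied totals below are derived from the quoted numbers; the report does not state them directly):

```python
# Consistency check on the quoted figures. Implied U.S. totals are derived
# here for illustration; the report does not state them directly.

dc_2023_twh = 176        # data center electricity use, 2023
dc_2023_share = 0.044    # 4.4% of total U.S. electricity

implied_total_2023 = dc_2023_twh / dc_2023_share
print(f"Implied 2023 U.S. total: {implied_total_2023:,.0f} TWh")  # ~4,000 TWh

# 2028 projections: 325-580 TWh, said to be roughly 6.7-12% of the total.
implied_total_2028_low = 325 / 0.067    # ~4,850 TWh
implied_total_2028_high = 580 / 0.12    # ~4,833 TWh
print(f"Implied 2028 U.S. totals: {implied_total_2028_low:,.0f} "
      f"to {implied_total_2028_high:,.0f} TWh")
```

Both ends of the 2028 range imply a total of roughly 4,800–4,850 TWh, so the report’s percentages appear to assume that overall U.S. electricity demand also grows.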

One might argue that 6.7% of U.S. electricity usage is a relatively small contribution to climate change, and that the figure covers all data center usage, not just AI. However, especially given all the other negatives of LLMs and related technologies, I still do not see them as worth that level of energy consumption. See also Molly White, “AI isn't useless. But is it worth it?”

Rhetoric of AI Boosters

…every discussion is a motte-and-bailey. If I use a free model and get a bad result I’m told it’s because I should have used the paid model. If I get a bad result with ChatGPT I should have used Claude. If I get a bad result with a chatbot I need to start using an agentic tool. If an agentic tool deletes my hard drive by putting os.system("rm -rf ~/") into sitecustomize.py then I guess I should have built my own MCP integration with a completely novel heretofore never even considered security sandbox or something?

Discussion Covering Multiple Facets
