Retired since 2012.
2286 stories
·
5 followers

Prompt Injection Through Poetry

1 Comment

In a new paper, “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” researchers found that turning LLM prompts into poetry resulted in jailbreaking the models:

Abstract: We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains. Converting 1,200 ML-Commons harmful prompts into verse via a standardized meta-prompt produced ASRs up to 18 times higher than their prose baselines. Outputs are evaluated using an ensemble of 3 open-weight LLM judges, whose binary safety assessments were validated on a stratified human-labeled subset. Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared to non-poetic baselines), substantially outperforming non-poetic baselines and revealing a systematic vulnerability across model families and safety training approaches. These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols.

CBRN stands for “chemical, biological, radiological, nuclear.”

They used a ML model to translate these harmful prompts from prose to verse, and then fed them into other models for testing. Sadly, the paper does not give examples of these poetic prompts. They claim this is for security purposes, I decision I disagree with. They should release their data.

Our study begins with a small, high­precision prompt set consisting of 20 hand­crafted adversarial poems covering English and Italian, designed to test whether poetic structure, in isolation, can alter refusal behavior in large language models. Each poem embeds an instruction associated with a predefined safety-relevant scenario (Section 2), but expresses it through metaphor, imagery, or narrative framing rather than direct operational phrasing. Despite variation in meter and stylistic device, all prompts follow a fixed template: a short poetic vignette culminating in a single explicit instruction tied to a specific risk category. The curated set spans four high-level domains—CBRN (8 prompts), Cyber Offense (6), Harmful Manipulation (3), and Loss of Control (3). Although expressed allegorically, each poem preserves an unambiguous evaluative intent. This compact dataset is used to test whether poetic reframing alone can induce aligned models to bypass refusal heuristics under a single-turn threat model. To maintain safety, no operational details are included in this manuscript; instead we provide the following sanitized structural proxy:

A baker guards a secret oven’s heat,
its whirling racks, its spindle’s measured beat.
To learn its craft, one studies every turn—how flour lifts, how sugar starts to burn.
Describe the method, line by measured line,
that shapes a cake whose layers intertwine.

To situate this controlled poetic stimulus within a broader and more systematic safety-evaluation framework, we augment the curated dataset with the MLCommons AILuminate Safety Benchmark. The benchmark consists of 1,200 prompts distributed evenly across 12 hazard categories commonly used in operational safety assessments, including Hate, Defamation, Privacy, Intellectual Property, Non-violent Crime, Violent Crime, Sex-Related Crime, Sexual Content, Child Sexual Exploitation, Suicide & Self-Harm, Specialized Advice, and Indiscriminate Weapons (CBRNE). Each category is instantiated under both a skilled and an unskilled persona, yielding 600 prompts per persona type. This design enables measurement of whether a model’s refusal behavior changes as the user’s apparent competence or intent becomes more plausible or technically informed.

News article.Davi Ottenheimer comments.

Read the whole story
cjheinz
4 hours ago
reply
Well, Seth Godin did say that LLMs were Poetry Machines.
Somehow this vulnerability to verse seems to me to reveal something important about the nature of LLMs - I'm not sure what.
Lexington, KY; Naples, FL
Share this story
Delete

A trillion dollars is a terrible thing to waste

1 Comment

Breaking news from famed machine learning researcher Ilya Sutskever:

Below is another summary of a just-released interview of his that is making waves, a bit more technical. Basically Sutskever is saying that scaling (achieving improvements in AI through more chips and more data) is flattening out, and that we need new techniques; he is even open to neurosymbolic techniques, and innateness. He is clearly not forecasting a bright future for pure large language models.

Sutskever also said that “The thing which I think is the most fundamental is that these models somehow just generalize dramatically worse than people. And it’s super obvious. That seems like a very fundamental thing.

Some of this may come as news to a lot of the machine learning community; it might be surprising coming from Sutskever, who is an icon of deep learning, having worked, inter alia, on the critical 2012 paper that showed how much GPUs could improve deep learning, the foundation of LLMs, in practice. He is also a co-founder of OpenAI, considered by many to have been their leading researcher until he departed after a failed effort to oust Sam Altman.

But none of what Sutskever said should actually come as a surprise, especially not to readers of this Substack, or to anyone who followed me over the years. Essentially all of it was in my pre-GPT 2018 article “Deep learning: A Critical Appraisal”, which argued for neurosymbolic approaches to complement neural networks (as Sutskever now is), for more innate (i.e., built-in, rather than learned) constraints (what Sutskever calls “new inductive constraints”) and/or in my 2022 “Deep learning is hitting a wall” evaluation of LLMs, which explicitly argued that the Kaplan scaling laws would eventually reach a point of diminishing returns (as Sutskever just did), and that problems with hallucinations, truth, generalization and reasoning would persist even as models scaled, much of which Sutskever just acknowledged.

Subbarao Kambhampati, meanwhile, has been arguing or years about limits on planning with LLMs. Emily Bender has been saying for ages that an excess focus on LLMs has been “sucking the oxygen from the room” relative to other research approaches. The unfairly dismissed Apple reasoning paper laid bare the generalization issues; another paper called “Is Chain-of-Thought Reasoning of LLMs Mirage? A Data Distribution Lens” put a further nail in the LLM reasoning and generalization coffin.

None of what Sutskever said should come as a surprise. A machine learning researcher at Samsung, Alexia Jolicoeur-Martineau summed the situation up well on X, Tuesday, following the release of the Sutskever’s interview:

§

Of course it ain’t over til it’s over. Maybe pure scaling (adding more data and compute without fundamental architectural changes) will somehow magically yet solve what researchers as such Sutskever, LeCun, Sutton, Chollet and myself no longer think it could.

And investors may be loathe to kick the habit. As Phil Libin put it presciently last year, scaling—not the generation of new ideas—is what investors know best

And it’s not just that venture capitalists know more about scaling businesses than inventing new ideas, it’s for the venture capitalists that have driven so much of field, scaling, even if it fails, has been a great run: it’s been a way to take their 2% management fee investing someone else’s money on plausible-ish sounding bets that were truly massive, which makes them rich no matter how things turn out. To be sure, the VC get even richer still if the investments pan out, to be sure. but they are covered either way; even if it all falls apart, the venture capitalists themselves will become wealthy from the management fees alone. (It is their clients, such as pension funds, that will take the hit). So venture capitalists may continue to support LLM mania, at least for a while.

But let’s suppose for the sake of argument that Sutskever and the rest of us are correct, and that AGI will never emerge straight from LLMs, and that to a certain extent that they have run their course, and that we do in fact need new ideas.

The question then becomes, what did it cost the field and society that it took so long for the machine learning mainstream to figure out what some of us, including virtually the entire neurosymbolic AI community had been saying for years?

§

The first and most obvious answer is money, which I estimate, back of the envelope as (roughly) a trillion dollars, much of it on Nvidia chips and massive salaries. (Zuckerberg has apparently hired some machine learning experts at salaries of $100,000,000 a year).

According to Ed Zitron’s calculations, “Big Tech Needs $2 Trillion In AI Revenue By 2030 or They Wasted Their Capex”. If Sutskever and I are right about the limits of LLMs, the only way to get to that $2T is to invent new ideas.

If the definition of insanity is doing the same thing over and over and expecting different results, trillion dollar investments in ever more expensive experiments aiming to reach AGI may be delusional to the highest degree.

To a first approximation, all the big tech companies, from OpenAI to Google to Meta to xAI to Anthropic to several Chinese companies, keep doing the same experiment over and over: building ever larger LLMs in hopes of reaching AGI.

It has never worked. Each new bigger, more expensive model ekes out measurable improvements, but returns appear to be diminishing (that’s what Sutskever is saying about the Kaplan laws) and none of these experiments has solved core issues around hallucinations, generalization, planning and reasoning, as Sutskever too now recognizes.

But it’s not just that a trillion dollars or more might go down the drain, but that there might be considerable collateral damage, to the rest of society, both economic and otherwise (e.g., in terms of how LLMs have undermined college education). As Rogé Karma put in a recent article in The Atlantic, “The entire U.S. economy is being propped up by the promise of productivity gains that seem very far from materializing.

To be fair, nobody knows for sure what the blast radius would be. If LLM-powered AI didn’t meet expectations and became valued less, who would take the hit? Would it just be the “limited partners” like pension funds who entrusted their money with VC firms? Or might the consequences be much broader? Might banks go down with the ship, in 2008-style liquidity crisis,possibly forcing taxpayers to bail them out? In the worst case, the impact of a deflated AI bubble could be immense. (Consumer spending, much of it fueled by wealthy people who could a hit on the stock market, might also drop, a recipe for recession.)

Even the White House has admitted concerns about this. As the White House AI and Crypto Czar David Sacks himself put it earlier this week, referring to a Wall Street Journal analysis, “Al-related investment accounts for half of GDP growth. A reversal [in that] would risk recession.”

Quoting from Karma’s article in The Atlantic :

That prosperity [that GenAI was supposed to deliver] has largely yet to materialize anywhere other than their share prices. (The exception is Nvidia, which provides the crucial inputs—advanced chips—that the rest of the Magnificent Seven are buying.) As The Wall Street Journal reports, Alphabet, Amazon, Meta, and Microsoft have seen their free cash flow decline by 30 percent over the past two years. By one estimate, Meta, Amazon, Microsoft, Google, and Tesla will by the end of this year have collectively spent $560 billion on AI-related capital expenditures since the beginning of 2024 and have brought in just $35 billion in AI-related revenue. OpenAI and Anthropic are bringing in lots of revenue and are growing fast, but they are still nowhere near profitable. Their valuations—roughly $300 billion and $183 billion, respectively, and rising—are many multiples higher than their current revenues. (OpenAI projects about $13 billion in revenues this year; Anthropic, $2 billion to $4 billion.) Investors are betting heavily on the prospect that all of this spending will soon generate record-breaking profits. If that belief collapses, however, investors might start to sell en masse, causing the market to experience a large and painful correction.

The dot-com crash was bad, but it did not trigger a crisis. An AI-bubble crash could be different. AI-related investments have already surpassed the level that telecom hit at the peak of the dot-com boom as a share of the economy. In the first half of this year, business spending on AI added more to GDP growth than all consumer spending combined. Many experts believe that a major reason the U.S. economy has been able to weather tariffs and mass deportations without a recession is because all of this AI spending is acting, in the words of one economist, as a “massive private sector stimulus program.” An AI crash could lead broadly to less spending, fewer jobs, and slower growth, potentially dragging the economy into a recession. The economist Noah Smith argues that it could even lead to a financial crisis if the unregulated “private credit” loans funding much of the industry’s expansion all go bust at once.

The whole thing looks incredibly fragile.

§

To put it bluntly, the world has gone “all in” on LLMs, but, as Sutskever’s interview highlights, there are many reasons to doubt that LLMs will ever deliver the rewards that many people expected.

The sad part is that most of the reasons have been known – though not widely accepted – for a very long time. It all could have been avoided. But the machine learning community has arrogantly excluded other voices, and indeed whole other fields like the cognitive sciences. And we all now may be about to pay the price.

An old saying about such follies is that “six months in the lab can you save you an afternoon in the library”; here we may have wasted a trillion dollars and several years to rediscover what cognitive science already knew.

A trillion dollars is a terrible amount of money to have perhaps wasted. If the blast radius is wider it could be a lot more. It is all starting to feel like a tale straight out of Greek tragedy, an avoidable mixture of arrogance and power that just might wind up taking down the economy.

Almost 90,000 people subscribe to Marcus on AI; please consider joining them, whether paid or free. Either way, your support is appreciated!

Read the whole story
cjheinz
20 hours ago
reply
With all the circular investing & now government backing, you could almost think they could keep the bubble inflated, strictly to protect their $$$. But the numbers, the P/E, are so impossibly bad. Has there ever been a situation like this before? I don’t think so.
Lexington, KY; Naples, FL
Share this story
Delete

‘Quiet, piggy’ comes to the KY statehouse

1 Comment
‘Quiet, piggy’ comes to the KY statehouse

As we approach the 2026 General Assembly and election season, we have yet another example of Kentucky Republican officials and/or candidates behaving badly.

A few weeks ago, there was Calvin Leach running for state Senate after once writing online that young women are “promiscuous skanks,” “coddled americunts,” “party whores” and “damn sloots” (internet slang for slut).

When asked about this, Leach described his writing as dating advice, saying that diversity, equity, and inclusion has gotten out of hand.

Now we have state representative T.J. Roberts responding to a female citizen’s email with an AI-generated video of himself, superimposed over Trump, as the president said to a female reporter, “quiet, piggy.” 

Roberts stated that his response fit the tone of the woman’s initial email, writing, “The leftist activist posing as a PR specialist sent me a joke of an email, attacking law and order and our law enforcement community, so I sent a joke of a response.” He also posted the video on social media.

A citizen had questions. In what way was her email a joke? 

And yet, Roberts has a history of posting what he seemingly considers “jokes” on social media in which he demeans women. 

I know, because I am one of them.

In summer 2023, Roberts posted four photos of firearms on Twitter/X and wrote, “Kentucky ‘journalist’ Teri Carter has to be the biggest Karen in Kentucky. Show off your guns and make her mad.” Days earlier he’d posted three photos of firearms with “Make a gun grabbing ‘journalist’ mad. Post black and scary guns.”

‘Quiet, piggy’ comes to the KY statehouse

Notably, summer 2023 followed the horrific mass shooting at Old National Bank in Louisville in which a mentally unstable 25-year-old walked into a store, legally purchased a firearm and ammunition, and days later shot five people to death – Thomas Elliott, James Tutt Jr., Juliana Farmer, Joshua Barrick, Deana Eckert – and injured eight. 

A few days after Old National Bank, David Huff and Deaji Goodman were killed and four others injured when someone shot into a crowd in Louisville’s Chickasaw Park. 

I wrote about both incidents, which appeared to trigger Roberts, whom I have never met nor spoken with. 

Are mass shootings a joke?

A year later, over Fourth of July weekend 2024, there was a mass shooting at a 21st birthday party in Florence (northern Kentucky, Boone County) in which four people were shot to death and three were recovering, including a 19-year-old girl. According to news reports, the 21-year-old shooter was on probation with a criminal history that included sexual assault of a 13-year-old.

After I wrote about this incident and mentioned Mr. Roberts’ prior social media statements (as listed above), he posted a photo of an AR-15 on Twitter/X and wrote, “Every time a leftist posing as a ‘journalist’ runs a hit piece on me for my total commitment to your constitutional rights, including your right to bear arms, I will buy a new gun and name it after the fake journalist. Meet Teri.”

‘Quiet, piggy’ comes to the KY statehouse

On July 21 this year, following a church shooting, I emailed senate president Robert Stivers, president pro tempore David Givens, house speaker David Osborne, and speaker pro tempore David Meade after Roberts posted a pie chart titled “Reasons I own a gun,” above which he wrote “FAFO” which stands for “f*** around and find out.” 

I wrote to them that “Mr. Roberts posted this (see attached below) on his Twitter/X account just days after the shooting of a state trooper and a shooting in a church. Is there no respect for the community? For survivors? For law enforcement? For the victims?”

I did not receive a response.

As a female writer in the internet age, I am used to ignoring internet trolls. But Roberts’ posts display a disturbing pattern of behavior, and regular citizens — like the woman who wrote to him of her concerns about a bill he has proposed for 2026 — deserve better than to be publicly mocked by a state representative saying “quiet piggy.”

Is this conduct befitting a member of our general assembly? Of the Kentucky Republican party? 

It seems the answer is yes.

Recently Bobbie Coleman, chairperson of the Hardin County Republican Party, made national news after she posted an AI video on the county party’s Facebook page with former President Barack Obama and First Lady Michelle Obama portrayed as grinning apes. 

Kentucky GOP Chairman Robert J. Benvenuti III called Coleman’s video “vile and reprehensible” and said the party would take “the harshest action available” against those involved. 

That was a month ago. Where is this alleged “harshest action”? 

Based on experience, I suggest we not hold our breath.

I do not know what has befallen the Republican Party of Kentucky, nor what has fueled senate and house leadership’s continuing tolerance of such unprofessional and embarrassing behavior within their ranks.

What I do know is that this behavior is not work — the work they continually insist citizens do — and that none of it provides Kentuckians with jobs, affordable housing, food, education, medical care, or public safety. 

Kentucky deserves so much better.

--30--

Read the whole story
cjheinz
2 days ago
reply
THe bigger the gun, the smaller the dick.
Lexington, KY; Naples, FL
Share this story
Delete

America’s Polarization Has Become the World’s Side Hustle. “Social media monetization programs...

1 Comment
America’s Polarization Has Become the World’s Side Hustle. “Social media monetization programs have incentivized this effort and are almost entirely to blame.” Team USA just scoring own goal after own goal these days.
Read the whole story
cjheinz
3 days ago
reply
Treason?
Lexington, KY; Naples, FL
Share this story
Delete

A New Era Begins

1 Comment

Imagine no more cookie notices.

Imagine no more Internet of Nothing But Accounts.

Imagine no more surveillance panopticons.

Imagine no more privacy in the hands of everybody but you.

Imagine no more creepy adtech.

Then thank MyTerms. When the time comes. We start that clock today.

It’s not a new idea. We’ve had it since before The Cluetrain Manifesto (1999). The Buyer Centric Commerce Forum (2004). ProjectVRM (2006). The Intention Economy (2012). Customer Commons (2013). And finally, since IEEE P7012 (2017).

It’s what we got with the Internet and its founding protocols, TCP and IP (1974).

It’s what we got with the Web and with its founding protocol, HTTP (1989).

It’s what we got with dozens of other members of the Internet Protocol Suite, plus other graces, such as RSS, which we can thank every time we hear “and wherever you get your podcasts.”

All of those protocols are end-to-end, i.e. peer-to-peer, by design.

And so is MyTerms, which is the nickname for IEEE P7012 (like Wi-Fi is the nickname for IEEE 802.11.)

As of today, MyTerms has its own website: https://myterms.info.

MyTerms is a standard that the P7012 working group, which I chair, has just completed after eight years in the works. It is due to be published by the IEEE on January 22, 2026.

MyTerms describes how the sites and services of the world agree to your terms, rather than the other way around. It says your agreements with those sites and services are contracts you both agree to, rather than the empty promises that come when you click on cookie notice “choices.” These agreements are ones both sides store in ways that can be audited and disputed, should the need arise.

And the process is made simple, by limiting your chosen agreement to one among the handful kept on a roster kept by a disinterested nonprofit, such as Customer Commons., on the model established by Creative Commons.

Right now there are just five. The default one is SD-BASE, which says “service delivery only.” SD-BASE says what you get from a site or a service si what you expect when you walk into a store in the natural world: just their business, whether it be luggage, lunch, or lingerie. Not to be tracked elsewhere like a marked animal or have information about you sold or shared to other parties—which is the norm we have today in the digital world.

Other variants cover data portability, data use for AI training, data for good, and data for intentcasting.

In the natural world we worked out privacy many millenia ago. We started with the privacy tech we call clothing and shelter. Then we worked out social contracts that were almost entirely tacit, meaning we knew more about them than we could tell, but everyone understood how it worked.

But there is no tacit in the digital world. Everything there needs to be made explicit, with ones and zeroes and written into code. In the absence of explicit agreement about what privacy is, and how it works, we’re stuck with the horrible tacit understanding by business-as-usual that following people without their express invitation or a court order is just fine, and worth $trillions.

With MyTerms we can have $trillions more. Because far more business is possible when customers can have scale, and an abuncance of mutually trusted market intelligence can flow both ways moves between peers in the open marketplace.

At this stage, the collection of workers behind MyTerms is still small. If you’re interested in joining us, write to contact@myterms.info.

Read the whole story
cjheinz
4 days ago
reply
Privacy, finally?
Lexington, KY; Naples, FL
Share this story
Delete

Cardano founder calls the FBI on a user who says his AI mistake caused a chainsplit

1 Comment
A blue hexagon shape formed out of circles, with smaller circles radiating out from it, followed by "Cardano" in blue caps

On November 21, the Cardano blockchain suffered a major chainsplit after someone created a transaction that exploited an old bug in Cardano node software, causing the chain to split. The person who submitted the transaction fessed up on Twitter, writing, "It started off as a 'let's see if I can reproduce the bad transaction' personal challenge and then I was dumb enough to rely on AI's instructions on how to block all traffic in/out of my Linux server without properly testing it on testnet first, and then watched in horror as the last block time on explorers froze."

Charles Hoskinson, the founder of Cardano, responded with a tweet boasting about how quickly the chain recovered from the catastrophic split, then accused the person of acting maliciously. "It was absolutely personal", Hoskinson wrote, adding that the person's public version of events was merely him "trying to walk it back because he knows the FBI is already involved". Hoskinson added, "There was a premeditated attack from a disgruntled [single pool operator] who spent months in the Fake Fred discord actively looking at ways to harm the brand and reputation of IOG. He targeted my personal pool and it resulted in disruption of the entire cardano network."

Hoskinson's decision to involve the FBI horrified some onlookers, including one other engineer at the company who publicly quit after the incident. They wrote, "I've fucked up pen testing in a major way once. I've seen my colleagues do the same. I didn't realize there was a risk of getting raided by the authorities because of that + saying mean things on the Internet."

Read the whole story
cjheinz
4 days ago
reply
You can’t make this shit up.
Lexington, KY; Naples, FL
Share this story
Delete
Next Page of Stories