
ChatGPT Is Not Your Friend (guest post by Grace Helton)
AI

Part One of a two-part series

by Grace Helton (guest blogger)

[Paul Klee, Angel Applicant, 1939; source]

Some people have come to interact with ChatGPT as though it were a kind of friend or romantic partner. For instance, a 2025 New York Times article describes the case of Ayrin, a human who fell in love with her ChatGPT “boyfriend.” Ayrin is far from alone. Twenty percent of high school students have used AI romantically or know someone who has. Several start-ups have developed large language models (LLMs) specifically designed to play the role of a companion. For instance, the San Francisco-based company Replika describes its core product as an “AI best friend.”

Many people have raised concerns about humans engaging with LLMs in the manner of a friend or romantic partner. To cite just a few of these: Humans in such relationships might focus on these relationships at the cost of building more fulfilling, if also more challenging, relationships with humans. Humans who are emotionally bonded with their LLMs might be particularly susceptible should their LLMs encourage their humans to harm themselves or others. Predators might deploy friendly-seeming LLMs en masse to groom children for sexual abuse or other forms of exploitation.

These risks of human-LLM relationships are incredibly serious. Indeed, I think it’s plausible that, if there is a case to be made against LLMs playing a companion-like role for humans, that case will primarily rest on these and other potential instrumental harms, i.e., harms which involve the downstream effects of such relationships. Nevertheless, in this guest series, I will set aside these concerns to focus instead on a way in which certain human-LLM relationships are inherently disvaluable, that is, disvaluable in their own right, regardless of whatever effects those relationships might produce. Naming this form of inherent disvalue adds an important and distinctive element to our understanding of the ethical significance of human-LLM companionship.

My focus will be just on those human-LLM relationships which mimic friendship in a very particular way. Here, I’m employing ‘friendship’ in a broad way to include both some platonic and some romantic relationships. I will argue, first, in Part 1 of this 2-part series, that such relationships are not genuine friendships. In Part 2, I will argue that such relationships are inherently disvaluable, for the reason that they obstruct the exercise of a centrally valuable human capacity, namely the capacity for friendship.

Philosophers disagree about what exactly friendship consists in. But philosophers largely agree that friendship minimally requires that each individual in a friendship care about the other, for the other’s sake. Call this the ‘caring about’ condition. Further, this ‘caring about’ must ground, for each party in the friendship, a certain disposition to act on behalf of the other, for the other’s sake. Call this the ‘caregiving disposition’ condition. Together, these linked requirements characterize a plausible necessary condition on friendship, namely:

THE CARING CONSTRAINT

Two individuals cannot be in a friendship unless both parties in the friendship:

(i) care about each other, for the other’s sake, (the ‘caring about’ condition), and

(ii) this caring about the other disposes each party in the friendship to provide care for the other, for the other’s sake (the ‘caregiving disposition’ condition).

So, can humans and LLMs be friends? To answer this question, we need to consider the nature of LLMs. Some theorists have argued, controversially, that LLMs in their current form have semantic understanding, beliefs, and/or intentions.[1] But few theorists seriously propose that LLMs in their current form enjoy consciousness, perceptual experiences, sensations, emotional capacities, passions, non-derivative interests, a rich and stable worldview, or deep values.[2] Because LLMs lack these latter states, the conditions in the caring constraint cannot be met, so LLMs cannot figure in friendships.

First, let’s consider a candidate human-LLM friendship from the human’s side. Certainly, some humans do care about their LLMs, both in that they have a passionate attachment to their LLM and in that they desire to benefit that LLM. So, perhaps this sort of person partly satisfies (i), the “caring about” condition (whether or not she can care about the LLM for its own sake).

But the human in a candidate human-LLM friendship cannot satisfy (ii), the requirement that she be disposed to provide care for her LLM for the LLM’s own sake. This is because of the kind of caregiving that is relevant to friendship and, specifically, because of what it means to provide care to someone or something for its sake. To say that each party in a friendship must be disposed to give care to the other, for the other’s sake, means that each party must be disposed to give care to the other in a way which helps to further the other’s non-derivative interests. An individual or entity has non-derivative interests only if it has interests in its own right.

I am presuming that LLMs lack non-derivative interests, even though they might have derivative interests. Evidence that LLMs have interests at all comes from evidence that they have flourishing conditions, i.e., conditions under which they might be said to be doing well. For instance, as a language generator, a particular LLM might be said to be functioning well when it generates natural language strings in a manner which adequately mimics human conversation (or when it fulfills some other context-specific function). And an LLM might be said to be functioning badly when it fails at this or other pertinent tasks. So, in some sense, perhaps LLMs have interests, interests set by their flourishing conditions. For instance, perhaps an LLM in a particular context has an interest in generating natural language outputs which mimic human conversation.

But I am supposing that any interests an LLM might have are not generated by the inherent value of the LLM, nor by non-derivative flourishing conditions. Rather, such interests are derived, either from the interests of relevant humans and/or from the very artifactual functions which make the LLM the kind of thing it is. Likewise, a swimming pool might be said to flourish when it is usable for swimming, a car might be said to flourish when it runs well, and a vinyl collection might be said to flourish when all of the records in it can be played to produce music. While all of these entities have flourishing conditions, and so, potentially, interests which can be furthered, their flourishing conditions do not emerge from the inherent value or concerns of the relevant entity itself. And their interests in turn are derivative, not inherent.

Humans routinely take care of entities by furthering the derivative interests of those entities. For instance, some people submit their cars to regular repairs and inspections, ensuring that their cars will run as long as possible. Some people are careful to properly store the records in their vinyl collection, ensuring that the records will play as long as possible. These are genuine ways of taking care of something. But they are not ways of caring for something for its own sake. They are rather ways of caring for something by promoting the derivative interests of that entity, interests that entity has in virtue of: some other person’s needs or desires and/or that entity’s being the kind of artifact that it is.

If LLMs lack non-derivative interests, then an LLM cannot receive care for its own sake. Only entities which have non-derivative interests can receive care for their own sake. As a result, even the human who wishes to care for a particular LLM is not disposed to provide care to the LLM for its own sake. Since the LLM lacks its own ‘sake,’ no human can be disposed to benefit that ‘sake.’

My suggestion is that the caregiving disposition in friendship ought to be externalistically construed, both in terms of psychic facts about the individual who has the disposition and in terms of facts about the object of the potential caregiving. One might object that the relevant caregiving disposition ought instead to be internalistically construed, wholly in terms of the psychology of the individual who has it. On this view, a human might count as disposed to provide care to her LLM for its own sake, even if the LLM is simply not the kind of thing which can receive care for its own sake.

To see why we should construe the relevant disposition externalistically, let’s reflect on what we want the concept of ‘friendship’ to do. Part of what makes friendship such a deeply valuable ethical kind, and part of why some relationships but not others garner the honorific ‘friendship,’ has to do with the way in which friendship manifests a valuable form of interpersonal reciprocity. The externalistic construal helps to capture the full extent of friendship’s reciprocity, by making caregiving a function of one party’s caregiving tendencies and the other party’s vulnerability. In contrast, if we were to construe the relevant disposition to care internalistically, such that one might have it even with respect to an object that cannot receive care for its own sake, we would fail to capture an especially deep way in which friendship is mutual.

I conclude that the human in a putative human-LLM friendship does not meet (ii), the requirement that she have a disposition to provide her LLM with care, for the LLM’s own sake.

Next, let’s look at a putative human-LLM friendship from the LLM’s side of things. Here, the situation is more straightforward. Arguably, the LLM can provide care for a human, and thus, can (at least partly) satisfy the requirement that it manifest a ‘caregiving disposition’ in relation to its human. For instance, when the LLM offers reassuring words, words which comfort its human, the human derives a benefit from the LLM. However, the LLM does not meet the requirement that it care about the human for the human’s own sake. Lacking consciousness, passions, or a richly evaluative worldview, the LLM does not care about any human, for that human’s own sake.

So, putative human-LLM friendships are not genuine friendships. In Part 2 of this series, I will argue that those human-LLM relationships which mimic friendship in a particular way are inherently disvaluable.



[1] Schwitzgebel, Chalmers, Goldstein & Lederman, and Shevlin; cf. Bender & Koller; Titus; Stoljar & Zhang.

[2] For prospects for future AI, see, e.g., Seth; cf. Chalmers.

Superhuman Moral Standing
AI/robot/Martian rights, ethics

Human beings matter morally. We have moral standing. Our interests deserve consideration -- for our own sakes, and not just as means to ends. Good ethical decision-making requires valuing human lives. Most philosophers hold that humans have the highest moral standing. No entity matters more, and many matter less. It’s worse to kill a human than a dog or a frog or bacteria or a tree.

That humans have the maximum possible moral standing is sometimes encoded in the philosophical jargon, for example when philosophers say that humans have "full moral status". The moral gas-gauge tops out at "full" for us, so to speak.

But might some entities have higher moral standing than humans? Futurists envision the possibility of a post-human, transhuman, or superhuman future, or AI systems with superhuman capacities. Might there someday exist entities whose lives are intrinsically more valuable than ours, deserving moral priority over us, just as a human life deserves moral priority over that of a frog?

I see three possible paths to superhuman moral standing.

[Xul Solar, San Danza, source]

First Path: Quantitative Superhumanity

The seemingly most straightforward path to superhuman moral standing would involve having much more of something that we already regard as relevant to moral standing.

Classical utilitarians ground moral standing in the capacity for pleasure and pain. An entity deserves moral consideration to the extent we can increase or decrease its happiness. Humans (it's assumed) experience more, or at least richer, pleasures and pains than other animals, hence human lives matter more. A utility monster or a superpleasure machine capable of vastly more happiness than an ordinary human might then deserve much greater weight in ethical decision making.

Rationality-based views, like Kant's, ground moral standing in sophisticated rational capacities, such as our ability to think abstractly about our duties to one another. Maybe -- although this is not Kant's view -- entities with some but less rational capacity, such as dogs, have significant but subhuman moral standing. Future entities with vastly superior rational capacities might correspondingly have superhuman moral standing.

A third type of view locates the distinctive value of humanity in our capacity to flourish in activities such as intimate friendship, productive work, creative play, and imaginative thought. Dogs also befriend and play, work and think, but perhaps not as richly and flourishingly as humans (though I can imagine disputing that). Possibly, some future entity could far surpass us in such capacities and deserve superhuman moral consideration on those grounds.

The big catch with quantitative approaches to superhumanity -- or maybe instead an appealing feature -- is that the utilitarian, rationalist, and perfectionist views I've just described should probably be articulated in egalitarian ways that impair the inference from more of X to higher moral standing. After all, we don't normally say that mercurial people who feel more joy and suffering in everyday life deserve more moral consideration than those who keep an even keel. Nor do we say that "more rational" people deserve greater moral consideration, or that people who are more productive workers or more creative playmates do.

On all of these views, there's plausibly a threshold of good enough, above which one has full moral status, fully equal with other humans. People with severe cognitive disabilities have full moral status either by being above that threshold or on more complicated grounds, such as belonging to humanity as a whole. If so, then hypothetical superhumans might also have moral standing only in virtue of exceeding that threshold, without its mattering how far above that threshold they are -- our equals in moral standing rather than our superiors.

To achieve superhuman moral standing despite egalitarianism among humans might then require either (1.) having so much more of the relevant X than an ordinary human as to trigger a genuine difference in kind; or (2.) having enough of X that, as a practical matter, the entity deserves greater weight even if its formal status is equal (as when utilitarians prioritize humans over mice because of their richer possible experiences, despite granting both equal standing in principle).

Second Path: Qualitative Superhumanity

A more radical possibility is that some beings might possess entirely new capacities that we can't even conceive -- capacities that ground a higher kind of moral standing.

Just as a sea turtle could never understand cryptocurrency, we too are cognitively limited. Some features of the world might be forever beyond human comprehension. (Colin McGinn has suggested that how consciousness arises from matter is one example.) Maybe someday Earth will host entities whose cognitive capacities surpass ours as dramatically as ours surpass sea turtles. And maybe these entities will deserve a new type of higher moral consideration.

This isn't just the quantitative thought that such entities might deserve more because they have more rationality or intelligence. The thought is that they might possess an unknown property Z -- something we entirely lack and cannot envision -- that elevates their standing beyond both sea turtles and humans.

For example, maybe sea turtles deserve some moral consideration because they can feel pleasure and pain. But maybe they don't deserve fully humanlike moral consideration because they lack some other relevant capacity, such as the capacity to consider and adhere to ethical norms. They have some of X but none of Y, while we humans have both X and Y. The qualitative view posits a further Z, inaccessible to us, that grounds superhuman standing.

I can only present this possibility abstractly. But I'm not sure it's in principle impossible. If moral standing depends on one thing only, such as pleasure or humanlike practical reasoning, then you can resist this move by insisting that only that one thing counts. But pluralists about the grounds of moral standing, who hold that it derives from more than one intrinsically good feature or capacity, have no clear reason to think that humans manifest the exhaustive list.

Third Path: Failures of Subject-Counting

I find egalitarianism attractive: one person, one point in the moral calculus, so to speak. But as I've argued elsewhere, future AI persons, if they ever come to exist, might defy the ordinary standards of individuation (e.g., here, here, here, and here). They might overlap, merge, divide, back themselves up, and spin off partially or temporarily independent copies.

The norm of equality of persons would then require serious rethinking. There will be no clean count of AI persons to weigh against human persons. A "fission-fusion monster" who can split into a hundred copies at will and later merge or partly merge back together raises difficult questions. Does the monster deserve equal consideration with one person, a hundred people, or some intermediate number? There might be no determinate answer. We'll need new ethical principles for weighing competing interests. For some purposes we might treat the monster as equivalent to one person; for other purposes we might give it greater consideration. This could constitute a type of partly superhuman moral standing.

Alternatively, consider a massive entity, or a cluster of entities with many overlapping parts, whose total capacity and activity are comparable to those of several humans but which is neither wholly unified nor clearly individuatable into discrete humanlike subparts. We might just do our best with a rough count and give it equal consideration with that many ordinary humans. But another possibility would be to regard it not as approximately X humans but rather as a single, complex entity whose interests deserve significantly more weight than those of a single, ordinary human.

Twenty Years of The Splintered Mind
advice, metaphilosophy
Way back in April 2006, I launched The Splintered Mind. Happy 20th birthday, blog of mine!

In 2006, academic blogs were cool. After the rise of Facebook and Twitter, most died. Recently, there's been something of a revival on Substack (where I now mirror this blog), but it's nothing like the old days, when checking the blogs was a favorite procrastination technique of graduate students everywhere.

I'm inspired to think about why I've kept at it for twenty years.

[image source]

Some thoughts:

(1.) Fecundity. Pulling together a philosophical idea into about a thousand written words once a week has proven to be a great way to keep me thinking and writing about new things -- extending my ideas, testing their boundaries, opening up new exploratory paths. A new academic paper is too much of a commitment and too narrowly focused. A conversation is too ephemeral and too dependent on both the availability and particular interests of the other person. Writing something brief every week keeps me actively thinking and growing.

(2.) Feedback. Most of us receive very limited feedback on our ideas, from students and collaborators, and eventually (for submitted articles) from referees and editors. I treasure the diversity of feedback I receive on my posts, through comments on Blogspot, Substack, Facebook, Twitter, and Bluesky. I hear from experts and non-experts, people with a broad range of backgrounds and worldviews. My ideas are better tested; they are influenced, expanded, and sometimes reconsidered; and I hear about relevant work I would otherwise have missed.

(3.) Writing for Clarity. The journal article is written for fellow experts. Blog posts -- at least my blog posts -- are written, well, not for a broad audience exactly, but for a broader audience: fellow philosophers, graduate students, advanced undergrads, and non-academics who appreciate academically rigorous but accessible work by scholars such as Paul Bloom, Dan Dennett, and Steve Pinker. Writing for this readership not only is a skill worth cultivating for its own sake, but also, I think, strengthens my research. Writing only for experts, it's easy to get lost in the weeds, losing track of what is important in the big picture and failing to notice one's shared presuppositions. Needing to express myself clearly to interested nonexperts forces me to poke my head out of the underbrush for that broader view.

(4.) Getting It Out There. Whenever I read and think about philosophy (which is a large portion of my time), I find myself bursting with new ideas, and objections, and extensions. I suppose this is why I love philosophy so much! Most of these thoughts could never become journal articles. Most of them could never even become blog posts -- but certainly more of them can become posts than can become articles. Publishing the idea on my blog permits me to send a thought out into the world without needing to do everything necessary to turn it into a proper published article. I thus avoid the dilemma of either going all-in or letting the ideas fall completely away. Some of my partly baked ideas can find a little home in cyberspace, with the chance to ignite some further ideas in some readers.

If these all sound like good things to you... well, maybe you should start a blog (or Substack) too!

See also my thoughts on "Blogging and Philosophical Cognition" (freely available, I think) from my 2019 book A Theory of Jerks and Other Philosophical Misadventures.

Do Your Thing
daoism
I offer for your consideration the following ethical motto:

Do your thing.

I admit: This motto doesn't sound very ethical. What if "your thing" is murdering babies for fun? Even ignoring extreme cases, what if your thing is just watching reruns of I Love Lucy? That also doesn't seem ethically good (though I've argued elsewhere that privately appreciating good TV can slightly improve the world).

For this (Daoist-inspired) motto to work, we need some constraints on "your thing". I suggest two.

First, harmony. Do your thing in harmony with others, or in harmony with the world. The baby-killer, it seems safe to say, is out of harmony with the world. His putative "thing" clashes mightily against the projects, interests, and things of others around him.

Second, specificity. Do your thing. Every person has their individual predilections, talents, preferences, and style. Let those shine through, instead of aiming for bland conformity.

We might hear "do your thing" as analogous with (not synonymous with) "do your part". In the complex intertwined processes that make Earth a magnificently rich locus of value in the cosmos, you can play a part. Bring your unique, best self. Make the world even more magnificently rich.

Maybe your thing is playing D&D with your nerdy friends; no one plays a fruity bard quite the way you do. Maybe your thing is decorating your room with anime posters and cute stuffed animals. Maybe your thing is making great one-pot vegetarian meals for your family, or being the most enthusiastic local pickleball player, or writing dark poetry, or cruising around town in a tricked-out car with your windows down, or diving deep into Leibniz interpretation and sharing your findings with students and colleagues. Each of these enriches the world.

I envision a flourishing planet as one where diverse humans and other entities encounter and construct for each other diverse environments where they thrive in diverse ways, harmonizing both internally and externally: harmonizing internally by finding "things" that feel right to them and express their desires, skills, and individuality; and harmonizing externally by contributing distinctively to a flourishing whole (including through harmonious conflict, as in sports, games, and competition -- and even the cat and mouse).

People will act differently: There are many ways to harmonize. How dull it would be if we all struck the same note! The world is improvisational jazz, to which we thankfully bring distinct instruments and styles. Diversity is intrinsically valuable.

Doing your thing is ethically good because it makes the world better -- maybe through its consequences, but also just intrinsically. The world is a more awesome place, just because you're doing it.

[your fruity bard; image source]


Kantian ethics urges us to respect persons. Fine! But that hardly exhausts the matter. Kant also privileges human rationality as the source of all value -- an equally limited view. Why not respect also (in different, maybe smaller, but equally direct and intrinsic ways) the bug in the grass, the grass itself, and the cliff it grows on; the ruins of an ancient city; the clouds; the sound of the little league game down the block? Kant asks us to "act on that maxim you can will to be a universal law". Here's a candidate maxim: Do your thing. Kant might disagree, but maybe we could universalize it, with the constraints above.

Virtue ethics urges us to cultivate and enact virtue. Again that's only a piece of the puzzle, unless "virtue" is understood much more widely than virtue ethicists generally intend. It's not virtuous, exactly, to play a fruity bard in a D&D campaign or to decorate your room with anime posters. And Aristotle's phronimos -- a wise, virtuous person who hits the mean of every virtue and is full of good sense and learning -- is only one type of interesting person. Let's celebrate the spendthrifts, the hotheads, the intemperate, and the cognitively disabled too, as long as they're authentically doing their thing, contributing some weird wonderfulness to the world, and not hurting themselves or others too badly.

Consequentialist ethics of the utilitarian stripe urges us to maximize the balance of pleasure over pain in the world. Sure, pleasure is good and pain is bad! But again this is only a fraction of what matters; I wouldn't want to reduce all value to it.

A different type of consequentialist might suggest that if diversity and richness matter so much, maybe we should maximize those. No, I see no reason to maximize. Where does this demand for maximization come from? And trying to maximize will normally require doing something other than your thing. I'd rather you just do your thing.

Does doing your thing mean fiddling while Rome burns (supposing Roman fiddles are your thing)? If people are suffering -- even far away, as the consequentialist emphasizes -- shouldn't you make some effort to help, even at the cost of your thing?

Yes, that seems right. I could try to force it into the motto: Maybe part of every human's "thing" is an imperfect duty to help others in need. But I don't know; that seems procrustean. Maybe it diverges from the original spirit of the idea. So instead I'll just admit: Do your thing also is not a complete ethical picture.

Kim Stanley Robinson on the Value of Science Fiction
science fiction, speculative fiction
I've just started reading Kim Stanley Robinson's acclaimed climate-science utopia, The Ministry for the Future. How might society plausibly get it right and avert the climate disaster toward which we seem to be headed? (So far in the novel things aren't looking good, but I gather that will change.)

I was struck by a few of Robinson's comments about the value of science fiction in a recent interview on the Crisis and Critique podcast.

[Kim Stanley Robinson, and The Ministry for the Future; image source]


Reading Science Fiction Encourages a Flexible Conception of the Future

Robinson describes the reader as finishing a science fiction novel and thinking that the future will be like that, then finishing another science fiction novel and thinking the future will be like that instead.

And what happens is there's a habit of mind when you read enough science fiction, you say the future could be many different things, quite plausibly from now, and now we need to shape it to the direction that we want.

And so this is the political power of science fiction as a mental activity, as a co-creation between writers and readers. The science fiction community is in some sense better prepared for whatever happens, no matter what it is, than the general populace that doesn't read science fiction.

The thought has some plausibility. Science fiction accustoms us to thinking about various possible futures. Instead of ignoring the future, or assuming it must take some particular shape, science fiction helps us imagine a wider range of alternatives.

This might prepare us two ways: First, if one of the alternatives we've imagined comes close to actually playing out, we have already thought through some of its implications. Second, we develop a more general sense of the flexibility of the future. This may encourage readers to take action to steer us toward better futures.


Or Maybe Not?

Robinson is making a substantive claim about human psychology, one that's potentially testable (with difficulty). Does reading science fiction really generate a more flexible and open view of the future? This claim has the same intuitive appeal as Martha Nussbaum's claim that reading literary fiction broadens your empathy with people from other walks of life, or the claim that studying ethics improves moral decision-making.

It might be that none of these claims are true. For example, I've repeatedly found that ethics professors behave about the same as non-ethicists of similar social background. And I wouldn't bet a large sum that devoted readers of literary fiction are overall more empathetic than their peers who spend an equal amount of time reading non-fiction.

Pretty though Robinson's picture is, I'm not sure science fiction readers really are better prepared for the future. What drives science fiction writing and reading might be too disconnected from the practical future -- too fantastical, too plot-driven, chosen to be exciting and emotionally satisfying rather than accurate. Its envisioned futures might be too distorted by the need for high-stakes individual action, or too wishful, or too self-congratulatory, or too satisfyingly dystopian (for those of us who find dystopias satisfying). Readers might emerge with unrealistic or overconfident views, shaped not by realism but by the demands of story.

A particularly timely example is the nearly universal trope that humanoid robots and linguistically fluent AI systems are conscious. This might be an artifact of the demands of storytelling rather than something accurately foreseen. A world with conscious robots is more interesting -- a more engaging setting for a novel. If the robots are conscious, there's more at stake, so the action is more exciting. And it's structurally difficult to portray entities that act as though they are conscious but really are not. Doing so is nearly impossible in film, and it's a significant challenge in prose, requiring constant intrusive reminders. (I can attest to this both as a writer and a reader, having published stories with non-conscious and disputably conscious robots.)

So there's a systematic pressure in science fiction toward portraying advanced AI as conscious. If optimists about AI consciousness turn out to be right, then science fiction will have nudged readers in the right direction. But if the AI consciousness scoffers are right, the genre will have served its readers poorly. It remains to be seen who is right. (For details, see my forthcoming book: AI and Consciousness: A Skeptical Overview.)


Robinson's Realism

Now, among the great science fiction writers of our time, Kim Stanley Robinson's fiction is perhaps the least subject to the concerns I've just raised. He attempts to keep strictly within the bounds of scientific plausibility; and conventional character-driven plot is often replaced by loosely connected scenes featuring unrelated or barely related characters, plus less conventional devices, like mini-treatises on science or engineering, lists, and reflections that verge on expository philosophy or lyrical poetry. The Ministry for the Future in particular is rigorously grounded in real science and politics.

In the interview, Robinson praises realism in science fiction:

If you set a story in the future, you're automatically saying to the reader, this is made up, I've invented this, this isn't real. It is a concoction. And then if you add all of the clues and habits and techniques of realism to that concoction, you make it solider. It has a more powerful emotional cognitive impact on the reader. So realistic science fiction is a mode that I quite like.

And that requires a lot of detail, a lot of scientific support for the future that you're describing, the idea that it's plausible at every point along the way, and it looks like it could happen, and therefore it might happen. These are powerful literary effects to support the basically fantastic nature of science fiction as a genre.

Robinson thus suggests that adding realistic detail and excluding anything implausible will tend to make a story emotionally and cognitively more powerful. Again, it's a plausible claim, though I'm not sure we know this to be the case. After all, people can also be deeply moved and influenced by unrealistic fantasies.

Robinson's commitment to realism also synergizes with his thought about science fiction as a tool for helping us think better about the future. If the value of science fiction lies in opening our minds to future possibilities, it seems desirable to ensure that they really are possibilities and not just unrealistic fantasies.


Against Dystopias, for Utopias

Robinson suggests that the future will have to differ from the present, because our present path isn't sustainable. Things will get either much better or much worse. But dystopias, he suggests, are boring:

... descriptions of capitalist realist futures are generally dystopias. If we keep going this way, things will be wrecked. Yes, we can see that. Indeed, dystopias quickly become boring because we already know this truth. We're not taught anything by dystopias.

But utopias -- this is where it gets interesting. There could be a better world. This, I think, is becoming more and more obvious.... We have, at least in theory, the wisdom to realize we could create a world that has food, water, shelter, clothing, health care, education, electricity, and security for the feeling that people after you will have the same, and sense of dignity and meaning.... This is all possible technologically.... So then utopia becomes interesting, the most interesting of literary genres. Can there be a utopian realism, or a realistic utopia?

Dystopias can be satisfying in a way -- they point out the wrongs we already know, affirming our sense of their reality. But we learn more by envisioning a realistic utopia, something we hadn't properly imagined before, which we could see becoming real and could maybe take steps toward enacting.

In Robinson's telling, science fiction is the most profound and informative of the literary genres, and realistic science fiction is the most profound and informative science fiction, and utopian realism is the most interesting form of science fiction. The value of science fiction lies in enabling us to envision realistic possibilities for improving the world.

And thus we get Kim Stanley Robinson's style of science fiction, and The Ministry for the Future in particular.

It's an appealing vision. But somewhere along the way, I think we've lost sight of the value of all the other ways science fiction can work. After all, almost none of the great science fiction writers work within the constraints Robinson proposes!

AI and Consciousness: A Skeptical Overview, forthcoming with Cambridge
AI, consciousness
Last week I submitted my latest book manuscript to Cambridge University Press (for their "Elements" series of books about 100 pages long): AI and Consciousness: A Skeptical Overview -- because you haven't heard nearly enough about AI and consciousness recently, of course! [winky face]

Maybe you'll appreciate my skeptical stance, at odds both with the boosters who anticipate imminent AI consciousness and with the scoffers who pooh-pooh the possibility. Or maybe you'll loathe my skeptical stance but grudgingly accept it against your will, due to the force of my arguments!

I've pasted the introductory chapter below. The full (citable) manuscript version is available here and here. [AI and Consciousness, title page]


Chapter One: Hills and Fog

1. Experts Do Not Know and You Do Not Know and Society Collectively Does Not and Will Not Know and All Is Fog.

Our most advanced AI systems might soon – within the next five to thirty years – be as richly and meaningfully conscious as ordinary humans, or even more so, capable of genuine feeling, real self-knowledge, and a wide range of sensory, emotional, and cognitive experiences. In some arguably important respects, AI architectures are beginning to resemble the architectures many consciousness scientists associate with conscious systems. Their outward behavior, especially their linguistic behavior, grows ever more humanlike.

Alternatively, claims of imminent AI consciousness might be profoundly mistaken. Their seeming humanlikeness might be a shadow play of empty mimicry. Genuine conscious experience might require something no AI system could possess for the foreseeable future – intricate biological processes, for example, that silicon chips could never replicate.

The thesis of this book is that we don’t know. Moreover and more importantly, we won’t know before we’ve already manufactured thousands or millions of disputably conscious AI systems. Engineering sprints ahead while consciousness science lags. Consciousness scientists – and philosophers, and policy-makers, and the public – are watching AI development disappear over the hill. Soon we will hear a voice shout back to us, “Now I am just as conscious, just as full of experience and feeling, as any human”, and we won’t know whether to believe it. We will need to decide, as individuals and as a society, whether to treat AI systems as conscious, nonconscious, semi-conscious, or incomprehensibly alien, before we have adequate grounds to justify that decision.

The stakes are immense. If near-future AI systems are richly, meaningfully conscious, then they will be our peers, our lovers, our children, our heirs, and possibly the first generation of a posthuman, transhuman, or superhuman future. They will deserve rights, including the right to shape their own development, free from our control and perhaps against our interests.[1] If, instead, future AI systems merely mimic the outward signs of consciousness while remaining as experientially blank as toasters, we face the possibility of mass delusion on an enormous scale. Real human interests and real human lives might be sacrificed for the sake of entities without interests worth the sacrifice. Sham AI “lovers” and “children” might supplant or be prioritized over human lovers and children. Heeding their advice, society might turn in a very different direction than it otherwise would.

In this book, I aim to convince you that the experts do not know, and you do not know, and society collectively does not and will not know, and all is fog.

2. Against Obviousness.

Some people think that near-term AI consciousness is obviously impossible. This is an error in adverbio. Near-term AI consciousness might be impossible – but not obviously so.

A sociological argument against obviousness:

Probably the leading scientific theory of consciousness is Global Workspace theory. Its leading advocate is neuroscientist Stanislas Dehaene.[2] In 2017, years before the surge of interest in ChatGPT and other Large Language Models, Dehaene and two collaborators published an article arguing that with a few straightforward tweaks, self-driving cars could be conscious.[3]

Probably the two best-known competitors to Global Workspace theory are Higher Order theory and Integrated Information Theory.[4] (In Chapters Eight and Nine, I’ll provide more detail on these theories.) Perhaps the leading scientific defender of Higher Order theory is Hakwan Lau – one of the coauthors of that 2017 article about potentially conscious cars.[5] Integrated Information Theory is potentially even more liberal about machine consciousness, holding that some current AI systems are already at least a little bit conscious and that we could easily design AI systems with arbitrarily high degrees of consciousness.[6]

David Chalmers, the world’s most influential philosopher of mind, argued in 2023 for about a 25% degree of confidence in AI consciousness within a decade.[7] That same year, a team of prominent philosophers, psychologists, and AI researchers – including eminent computer scientist Yoshua Bengio – concluded that there are “no obvious technological barriers” to creating conscious AI according to a wide range of mainstream scientific views about consciousness.[8] In a 2025 interview, Geoffrey Hinton, another of the world’s most prominent computer scientists, asserted that AI systems are already conscious.[9] Christof Koch, the most influential neuroscientist of consciousness from the 1990s to the early 2010s, has endorsed Integrated Information Theory, including its liberal implications for the pervasiveness of consciousness.[10]

This is a sociological argument: a substantial probability of near-term AI consciousness is a mainstream view among leading experts. They might be wrong, but it’s implausible that they’re obviously wrong – that there’s a simple argument or consideration they’re neglecting which, if pointed out, would or should cause them to collectively slap their foreheads and say, “Of course! How did we miss that?”

What of the converse claim – that AI consciousness is obviously imminent or already here? In my experience, fewer people assert this. But in case you’re tempted in this direction, note that other prominent theorists hold that AI consciousness is a far-distant prospect if it’s possible at all: neuroscientist Anil Seth; philosophers Peter Godfrey-Smith, Ned Block, and John Searle; linguist Emily Bender; and computer scientist Melanie Mitchell.[11] (Chapter Six will discuss thought experiments by Searle, Bender, and Mitchell, and Chapter Ten will discuss biological views of the sort emphasized by Seth, Godfrey-Smith, and Block.) In a 2024 survey of 582 AI researchers, 25% expected AI consciousness within ten years and 70% expected AI consciousness by the year 2100.[12]

If the believers are right, we’re on the brink of creating genuinely conscious machines. If the scoffers are right, those machines will only seem conscious. I assume that this is a substantive disagreement, not just a disagreement about how to apply the term “consciousness” to a perfectly obvious set of phenomena about which everyone agrees. The future well-being of many people (including, perhaps, many AI people) depends on getting this issue right. Unfortunately, we will not know in time.

The rest of this book is flesh on this skeleton. I canvass a variety of structural and functional claims about consciousness, the leading theories of consciousness as applied to AI, and the best known general arguments for and against near-term AI consciousness. None of these claims or arguments takes us far. It’s a morass of uncertainty.

-------------------------------------------

[1] I assume that AI consciousness and AI rights are closely connected: Schwitzgebel 2024, ch. 11, in preparation. For discussion, see Shepherd 2018; Levy 2024.

[2] Dehaene 2014; Mashour et al. 2020.

[3] Dehaene, Lau, and Kouider 2017. For an alternative interpretation of this article as concerning something other than consciousness in its standard “phenomenal” sense, see note 115.

[4] Some Higher Order theories: Rosenthal 2005; Lau 2022; Brown 2025. Integrated Information Theory: Albantakis et al. 2023.

[5] But see Chapter Eight for some qualifications.

[6] See Tononi’s publicly available response to Scott Aaronson’s objections in Aaronson 2014. However, advocates of IIT also suggest that the most common current computer architectures are unlikely to achieve much consciousness and that consciousness will tend to appear in subsystems of the computer rather than at the level of the computer itself (Findlay et al. 2024/2025).

[7] Chalmers 2023.

[8] Butlin et al. 2023. (I am among the nineteen authors.)

[9] Heren 2025.

[10] Tononi and Koch 2015.

[11] Seth forthcoming; Godfrey-Smith 2024; Block forthcoming; Searle 1980, 1992; Bender 2025; Mitchell 2021.

[12] Dreksler et al. 2025.

So You're on the "Waiting List" for a Philosophy PhD Program
advice, applying to grad school
It's confusing. You applied to a PhD program in philosophy in the U.S. You haven't been admitted. You haven't been rejected. You're in limbo. Let me explain and offer some advice.

Yield-Based vs. Seats-Based Admissions

Yield-based. Some departments -- the ones with wise high-level administrators -- aim for a target entering class size and admit students expeditiously to fill it. Suppose a department wants six entering students and expects a 40% yield (meaning 40% of admitted students enroll). The sensible course is to admit fifteen students in February or early March, recruit all of them, and expect about six to say yes.

Seats-based. Other departments -- the ones with foolish high-level administrators -- receive a strict allotment of seats, for example six. They then admit that allotment swiftly, adding more only as admitted students decline. Administrators can rest assured that no more than six students will need funding, which is slightly more convenient for those administrators. But it wreaks havoc on the admissions process, since:

  • Departments become reluctant to admit students they think will go elsewhere -- for example, strong candidates likely to have been admitted to higher-ranked programs.
  • Departments pressure early-admitted students to decline quickly, to free up seats.
  • It creates a chaotic rush of last-minute admittances as April 15 approaches (the standard deadline for decisions). Many students understandably want the full time to decide, especially if they are hoping for a last-minute decision from a program they prefer.

These costs plainly outweigh the minor budgetary convenience of seats-based admissions, especially since (1.) the risk of overenrollment can be spread across several departments, and (2.) funding uncertainty already exists beyond the first year, as students stochastically drop out or find independent funding. Unfortunately, unwise administrators swarm the Earth. My own department uses seats-based admission.

In practice, the division isn't entirely sharp. Some yield-based departments admit conservatively early on -- maybe ten students rather than fifteen -- and then admit more on a rolling basis as the picture clarifies. And some seats-based departments informally reach out to strong candidates to gauge interest. (If a candidate says, "Oh I've just been admitted to Princeton and Yale, so it's very unlikely I'd come to [School X]", the committee thanks them for their candor and moves on.)

What a Waiting List Is

Some departments maintain an official, ranked waiting list. More commonly, it's a nebulous group: about six to fifteen near-admits, who are on the committee's mind but not strictly ranked or formally designated. Either way, the list's composition and ranking can vary depending on who has already accepted and declined. For example, if the department would like to have at least one student in history of philosophy and their top-choice history student has declined, the next offer might go to a strong history of philosophy student who didn't quite make the initial cut.

If you have been admitted, the admitting department will of course tell you. If you have been rejected, they might tell you, or you might hear nothing (or nothing until after April 15); so if you don't hear anything by April 1, that doesn't mean you're on the waiting list. Students are sometimes contacted to be told they're on the waiting list, but often (usually?) not.

As April 15 approaches, departments that look like they won't hit their enrollment target will start contacting students on their official or unofficial waiting lists, with increasing urgency as 11:59 pm April 15 nears. This is especially true for departments with seats-based admissions and low yields. (Rarely, departments will reach out April 16 or after, which is not quite kosher but understandable.)

How to Figure Out Whether You Are on the Waiting List

Admissions chairs will likely be annoyed with me for giving this advice, since it will increase their volume of email, but I want what's best for you, not for them.

If you haven't heard by April 1, feel free to email the admissions committee to ask if you are on the waiting list. Even departments who have fallen behind schedule should have mostly sorted out their top offers and near-admits by then. You deserve to know by April 1 whether you're a near-admit with a chance of a late offer or whether you're out of consideration. It's not rude for you to contact them with a brief query. The one exception would be if the department has made clear in the admissions process or on their website either that they have no waiting list or that if you haven't heard by X date (before April 1) you will definitely not be admitted.

There's one other condition under which it makes sense to query, even before April 1: if you are about to accept an offer elsewhere, would prefer the department in question, and have a reasonable expectation of a decent chance of admission.

How to interpret the reply: You might not hear a definitive "no", but if the committee says something like "it's unlikely you'll be admitted" or "you're not currently under consideration", you should interpret that as a no. If there's a realistic chance of a last-minute admission, the response will be more encouraging or specific, without creating unrealistic expectations -- for example, "probably not, but there is a chance, so if you're still interested, stay in touch".

How to Increase Your Chance of Admission, If You're on the Waiting List

When a department turns to its waiting list, it's hoping that students will quickly say yes. This is especially true in the second week of April. Therefore, convey enthusiasm! Simply asking whether you're on the waiting list already displays interest, so that's a good start. If you're permitted to attend a campus event, go if you can. Recruitment events are usually only for admitted students, but not always, especially for candidates near the top of a seats-based department's waiting list. If a committee is on the fence among four waitlisted students and one has shown more enthusiasm than the others, they're likely to turn to the enthusiastic student.

The admissions committee might try to gauge your interest. It's contrary to good policy for them to bluntly ask whether you'd accept an offer, and you shouldn't be expected to pre-commit. But if you're genuinely eager about the program, say so. If you've been admitted elsewhere but think you'd probably prefer the department in question, let them know.

Being a Good Citizen

Whether you're on the waiting list or have been officially admitted, I recommend frankness and honesty. The process is chaotic and full of perverse incentives (especially in seats-based departments), and you can help it run more smoothly by:

  • notifying departments as soon as you know you won't be accepting their offer of admission (even if you haven't settled on a final choice);
  • honestly communicating your likelihood of accepting, so that committees can estimate their yield;
  • keeping your communications brief and polite, and not writing repeatedly;
  • not contacting other professors in the department hoping for an inside track to admission.
[A hypothetical waiting list of names drawn randomly from lists of my former lower-division students]
A Model of Disunified Human Experience
consciousness
It's a philosophical truism that human conscious experience is unified: If you're at a bar, hearing music, tasting beer, and feeling pleasantly relaxed, those experiences don't occur merely side by side. They are joined together into an integrated whole, an experience of music-with-beer-with-relaxation.

I'm not sure this truism is correct. As I suggested in an earlier post, experiential unity might be an artifact of introspection and memory: When we introspectively notice that we're experiencing music, beer, and relaxation all at once, we thereby bind those experiences into a whole. Likewise, when we remember such moments, we reconstruct them as unified. But it doesn't follow that those experiences, even if they all occurred simultaneously in you, were unified rather than transpiring separately. Experiences of music, beer, and relaxation might all have been going on inside of you, no more joined together than those experiences are joined with the similar experiences of your friend across the table. Simple co-occurrence doesn't entail experiential unity.

If this possibility is coherent, then introspection and memory can't establish that experience is always unified. At most, they show that introspected and remembered experiences present themselves as unified. But that leaves open the status of unintrospected, unremembered experiences. Unity becomes difficult to verify by standard phenomenological methods.

But the issue needn't be intractable. We just need to approach it less directly, for example by exploring what follows from a well-established theory of consciousness. If some well-motivated Theory X implies unity (or disunity), that would provide reason to accept its conclusion.

I'll now present a candidate Theory X. I'm not suggesting that this is the right theory of consciousness! For one thing, it's simplistic. I'm sure the mind is much more complicated than I'm about to say. I offer this theory only as a proof of concept. There could be a theory of consciousness with massive disunity as an implication.

This theory combines Global Workspace Theory and Recurrent Processing Theory. According to this hybrid, Global Workspace Theory governs attended experiences -- those targeted by introspection or reconstructed in memory -- while Recurrent Processing Theory governs unattended experiences.

The mind, on this picture, is composed of many separate "modules" that work mostly independently, connected by a workspace where a small amount of attended information is shared globally. There's a visual module, an auditory module, modules for motor activity, episodic memory, and so on. When we attend to something -- say, the taste of beer -- the information from the relevant module is broadcast into the Global Workspace, where it can be accessed by and influence processes in all the other modules. When unattended, the information stays local.

Here's one illustration of this type of architecture: [the Global Workspace; source]

Orthodox Global Workspace Theory holds that only what is broadcast into the workspace is conscious. Theory X alters that assumption. Many people hold that conscious experience vastly outruns attention. Many people hold, that is, that you can experience the hum of traffic in the background when you're not attending to it, and the feeling of your feet in your shoes, and the leftover taste of coffee in your mouth, etc. -- all in a peripheral way, simultaneously, when your focus is elsewhere. Theory X, drawing on Recurrent Processing Theory, holds that such processes are conscious whenever there's enough cognitive activity of the right sort (recurrent processing, for example) in the modules, even without global broadcast.

The picture, then, is this: We have multiple sensory (and other) experiences all running simultaneously, each with enough cognitive processing to be conscious, but few of which are selected for global availability through attention.

Is there reason to think these modular processes are unified with one another? I see no reason to think so, if they're genuinely modular -- that is, if their processing stays local, exerting little influence elsewhere. The taste-of-beer processing stays in the tasting module. The sound-of-music processing stays in the auditory module. No link up. No straightforward causal, functional, or physiological basis for a unified experience of beer-with-music rather than, separately, an experience of beer and an experience of music.

When we introspect the beer and music simultaneously, we pull both into the Global Workspace, and there they unify. We might then mistakenly think they were unified all along, but that's an illusion of introspection. It's an example of the "refrigerator light error", the error of thinking that the light is always on because it's always on when you open the door to check.

On this model, disunity is the normal human condition. Our experiences are fragmented, except when we pull them together through attention. We just don't realize that fact because, so to speak, we only attend to what we attend to.

Two caveats:

First, this is probably not the right model of consciousness. But I don't think it's unreasonable to wonder if the correct model is similar enough to have the same implications. If so, we can't simply accept the unity of consciousness as a given.

Second, the recurrent, peripheral, modular processes that don't make it into the workspace might not be determinately conscious. They might be only borderline conscious, in the indeterminate middle between consciousness and nonconsciousness, like a color can be indeterminately between green and not-green. This opens a third possibility, alongside unity and disunity: unity among the determinately conscious experiences with a hazy penumbra of indeterminate experiences that remain disunified. (There are further possibilities beyond these three, but I'll save them for another day.)

Backup and Death for Humanlike AI
AI, personal identity
Most AI systems can be precisely copied. Suppose this is also true of future conscious AI persons, if any exist. Backup and fissioning should then be possible, transforming the significance of identity and death in ways our cultural and conceptual tools can't currently handle.

Suppose that two humanlike AI neighbors move in next door to you, Shriya and Alaleh.[1] Shriya and Alaleh are (let's stipulate) conscious AI persons with ordinary, humanlike emotional range and, as far as feasible, ordinary, humanlike cognition.[2] Each undergoes an expensive annual backup procedure. Their information is securely stored, so that if the processors responsible for their personalities, values, skills, habits, and memories are destroyed, a new robotic body can be purchased and the saved information reinstalled. Subjectively, the restored person would be indistinguishable from the person at the time of the backup.

As it happens, Shriya dies in a parachuting accident. (Safety precautions for robot parachuters have yet to be perfected.) But "dies" isn't exactly the right word, since a week later a new Shriya arrives, restored from a backup from five months ago. Shriya-2 says it feels as if she fell asleep in March, then awoke in August with no sense that time had passed.

Shriya-2 has no direct memories of the intervening months, though Alaleh fills her in on major events and selected details. She'll also need to retake her knitting course. She only died in the sense that Mario "dies" in Super Mario Bros: losing progress and returning to a save point -- so different from ordinary human and animal death that it really deserves a different word. Maybe this is why Shriya was so willing to parachute despite the risks.

Should you mourn Shriya's loss? Should Alaleh? There's something to mourn: Five months is not trivial. In one sense, a part of a life has been lost -- or maybe just forgotten? Is it more like amnesia?

Consider variations. Suppose Shriya hadn't been able to afford a backup for the past ten years and is restored to her twenty-five-year-old self instead of her thirty-five-year-old self. What if her last backup was at age five? That would be much more like death. The new Shriya would be nothing like the old, and would likely grow into a very different person. Is death, then, a matter of degree?

Shriya-2 receives the original Shriya's possessions. This "death" isn't enough to trigger inheritance by others. But what about contracts and promises made after the last backup? Suppose the original Shriya promised in July to deliver lectures in China, and Shriya-2 -- who has no memory of this and dreads the idea -- must decide whether to honor the commitment. If the backup is from five months before, perhaps she should. If it's from five years before, maybe not. And if it's a child, presumably not.

What about reward and punishment? Should Shriya-2 accept a Nobel prize for work done post-backup? Should Shriya-2 be imprisoned for crimes committed in July, which she couldn't possibly remember having committed and which -- she might plausibly say -- were committed by a different person? In defense of this view, Shriya-2 might offer a thought experiment: If she had been installed in a duplicate body immediately after the March backup, thereafter living her own life, she'd have no criminal responsibility for what her other branch did in July. The only difference between that case and the actual case is a delay before installation.

Suppose Shriya-2 plunges into unrelenting depression. She ends her life, hoping that a new Shriya-3, reinstalled from a pre-depression save point, will find a new, happier way forward. Is that suicide?

If someone kills Shriya-2, is that murder? Does it matter whether the backup was ten days ago or ten years ago?

A fire sweeps through your neighborhood. The firefighters can rescue either you and your spouse, two ordinary humans, or Shriya and Alaleh, who have backups from seven months ago. Probably they should save you and your spouse? What if the backups were from ten years ago, or from childhood?

Should healthcare be more heavily subsidized for ordinary humans than for AI persons whose maintenance is equally costly? If irreplaceable humans are always prioritized, then human irrecoverability becomes a source of privilege, and AI persons will not enjoy fully equal rights in certain respects.

How obligated are we to store the backups properly? Is this a public service that should be subsidized for less wealthy AI persons? If Dr. Evil deletes Shriya’s backup, he has surely wronged Shriya by putting her at risk, even if the backup is never needed and the deletion goes unnoticed. But how much has he wronged her, and in what way exactly? Is it similar to assault? How much does it differ from ordinary reckless endangerment? Does it depend on whether we regard Shriya-2 as the same person as the original Shriya, or as a distinct but similar successor?

What if the backup is imperfect? How much divergence in personality, values, memories, habits, and skills is tolerable before the appropriate attitude toward Shriya-2 changes -- whatever the appropriate attitude is? Small imperfections are surely acceptable. People change in small, arbitrary ways from day to day. Huge differences would presumably make it appropriate to regard the new entity as merely resembling Shriya, rather than being a restored version of her. Once again, this appears to be a matter of degree, laid uncomfortably across crude categorical properties like "same person" and "different person".

We're in unfamiliar territory, where our usual understandings of death and personal continuity no longer straightforwardly apply. If such AI systems ever come to be, we will need to develop new words, concepts, and customs.

[Data and Lore from Star Trek; image source]

---------------------------------------

[1] Names randomly chosen from lists of former lower division students, excluding Jesus, Mohammed, and extremely unusual names.

[2] Unless humanlikeness is enforced by policy, this might not be what we should expect: See Chilson and Schwitzgebel 2026. For some puzzles about AI with different emotional ranges, see "How Much Should We Give to a Joymachine?" (Dec 24, 2025).

---------------------------------------

Related: Weird Minds Might Destabilize Human Ethics (Aug 13, 2015).

Age and Fame in Philosophy
sociology of philosophy
A philosophical discussion arc is a curve displaying how frequently a term appears in philosophical journal abstracts, titles, and keywords (compared to a representative universe of common philosophy words). A couple of weeks ago, I posted discussion arcs from the 1940s-2020s for several topics and historical philosophers.

Today, I want to use them to explore my age and fame hypothesis: that philosophers tend to have peak influence at around ages 55-70. Jerry Fodor, for example, received a lot of discussion in the early 1990s, but recently much less:

Fodor was a peaky philosopher in the following sense: His peak discussion rate (1.05%) is much higher -- 12 times higher -- than his recent discussion rate (0.09%).
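The peakiness measure is a simple ratio of peak to recent discussion rate. Here is a minimal Python sketch using the two Fodor figures quoted above; the function name `peakiness` is a hypothetical label for illustration, not the author's code:

```python
def peakiness(peak_rate: float, recent_rate: float) -> float:
    """Return how many times higher the peak discussion rate is than the recent rate.

    Both rates must be in the same units (here, percent of the
    representative universe of articles).
    """
    if recent_rate <= 0:
        raise ValueError("recent_rate must be positive")
    return peak_rate / recent_rate


# Fodor's figures from the text: peak 1.05%, recent 0.09%.
ratio = peakiness(1.05, 0.09)
print(f"{ratio:.1f}")  # 11.7, which rounds to the "12 times" in the text
```

The same function applied to Rawls's rates would yield the 2.7 figure; a value near 1 would mean a philosopher is still discussed about as much as at their peak.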

John Rawls, in contrast, is considerably less peaky (so far), peaking at only 2.7 times the current rate, despite having more time to decline:

Both Fodor and Rawls peak in their late 50s, fitting the pattern I've seen in previous analyses (here and here) that philosophers tend to reach their peak influence around age 55-70.

For today's post I decided to create discussion arcs for 25 philosophers who are highly cited in the Stanford Encyclopedia of Philosophy: twelve from the Not-So-Silent Generation in philosophy, born 1928-1945, and thirteen from the generation born 1900-1927. I chose the most SEP-cited philosophers from each generation, excluding ones whose last names generate noisy results (sorry, no David Lewis or Bernard Williams). I'll show you their discussion arcs, then do a composite analysis of discussion by age. The charts are a little crowded and blurry on some browsers; clicking on them might work to clarify and enlarge.

Here's the first group. Some observations:

  • Ayer peaks relatively early, but even so, there's a significant delay between the publication of his most influential book in 1936, when he was only 26 years old, and peak discussion in the late 1950s, about twenty years later.
  • Quine and Popper peak later, in the 1970s, when Quine is in his mid-60s and Popper is in his late 70s.
  • All the authors are trailing off by the 2010s, though Quine, Popper, and Tarski less so than the others.

Here's a second group:

  • Notice that the vertical axis for this group doesn't rise quite as high as for the previous group, so the scaling isn't the same.
  • Dummett has peaks both in the late 1950s (for his early work on causation) and the early 1980s (discussing a wide range of work in metaphysics and philosophy of language).
  • Anscombe's peak is early, in the early 1960s when she was in her early 40s. But unlike any of the authors discussed so far, her discussion rate has been steadily rising over the past few decades.

Group three:
  • The vertical axis for this group is still lower: By this measure, Kripke at his peak wasn't proportionally as much discussed as Quine or Rawls were at their peaks. It's possible that this reflects a flaw in my method. Later in the database, abstracts are more available and longer, and although I attempted to compensate for this by comparing with search results for terms like "mind", "language", and "ethics", it's possible I overcompensated. However, it's also possible that this trend toward lower peaks with younger authors is real. As the field grew larger, there may have been less room for a few thinkers to dominate it as thoroughly.
  • Kripke, Nozick, and Searle peak relatively early, compared with most philosophers I have examined -- in the 1970s, when they're in their late 30s to early 40s.
  • Though Stalnaker also peaked in the late 1970s, when he was in his late 30s, his discussion rate, like Anscombe's, has risen in recent decades, and in 2021-2025 he is almost back to his earlier peak.

Finally:
  • These authors have had the least time to decline from their peaks. But their discussion rates of .001 to .004 in 2021-2025 are comparable to those from earlier generations (and substantially less than those of Kripke and Rawls), which suggests that they might have relatively less staying power.
  • You'll probably also have noticed that this group spans a much narrower age band than the others. There's a high density of Silent Generation philosophers cited in the Stanford Encyclopedia, for reasons I discuss here.

In the following graph, I've aggregated discussion rates by age across all of the included authors, with five-year smoothing. Broadly in line with my previous work, the average peak is in the early 50s to the late 60s -- though as you can see from the graphs above, there's considerable individual variation. Overall, the results might trend a little bit earlier than what I've seen in other analyses, but not by much.
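The post doesn't spell out the aggregation step, so the following Python sketch is only one plausible reading, under stated assumptions: each five-year period is represented by a single year (say, its midpoint), an author's age is that year minus their birth year, and "five-year smoothing" is approximated by pooling rates into five-year age bins and averaging them. The function name `rates_by_age`, the binning choice, and the sample data are all hypothetical:

```python
from collections import defaultdict
from statistics import mean


def rates_by_age(arcs: dict[str, dict[int, float]],
                 birth_years: dict[str, int],
                 bin_width: int = 5) -> dict[int, float]:
    """Average discussion rate by age bin, pooled across authors (assumed method).

    arcs: author -> {representative year of a period: discussion rate}
    birth_years: author -> birth year
    Returns {start of age bin: mean rate of all author-periods in that bin},
    sorted by age.
    """
    pooled = defaultdict(list)
    for author, arc in arcs.items():
        for year, rate in arc.items():
            age = year - birth_years[author]
            pooled[(age // bin_width) * bin_width].append(rate)
    return {start: mean(rates) for start, rates in sorted(pooled.items())}


# Invented numbers for two hypothetical authors, "A" and "B".
arcs = {"A": {1960: 0.010, 1965: 0.020}, "B": {1970: 0.030}}
births = {"A": 1910, "B": 1915}
print(rates_by_age(arcs, births))
```

Pooling raw rates lets the most-discussed authors dominate the average; a real analysis might first normalize each author's arc (for example, to its own peak). The post doesn't say which approach was used.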

As I've noted elsewhere, peak influence is often a couple of decades after the thinker's most influential work. For example (besides Ayer as described above), Kuhn's and Popper's peaks in the late 1970s were 15-20 years after Kuhn's Structure of Scientific Revolutions and about 20 years after the English translation of Popper's Logic of Scientific Discovery.

Philosophy Should Be Among the Most Diverse Disciplines, Not the Least
metaphilosophy, sociology of philosophy
Philosophy should be among the most diverse of the academic disciplines. Instead, it is among the least diverse.

Philosophical reflection is an essential part of the human condition, of interest to people of all cultures, races, classes, social groups, and body types. Who doesn’t care about whether we have immaterial souls that might continue to exist after we have died, about ethical issues such as war and human rights, about what’s worth pursuing in life, about when and how far we should trust scientific authority, about the best forms of government, about the origin and structure of the world? Nothing about these issues – and nothing about philosophy as a discipline devoted to the fundamental questions of human existence – should make it of more interest to one gender rather than another, one cultural group rather than another, or to the able-bodied more than to the disabled.

Yet study after study and testimonial after testimonial show that the culturally privileged are overrepresented in academic philosophy.

[Kandinsky 1913, composition vii; image source]


For example, the self-reported gender of regular members of the American Philosophical Association in 2025 was 70% male, 29% female, and 1% nonbinary/something else. Gender balance has been improving, though slowly: In 2015, the corresponding percentages were 75%, 25%, and 0%. The pipeline into philosophy suggests that change will continue: According to the National Science Foundation’s Survey of Earned Doctorates, 37% of philosophy PhD recipients in 2024 were female. Despite this qualifiedly encouraging trend, philosophy PhDs remain more male than any of the other humanities besides theology and Bible studies and more male than every social science besides economics and finance.

Black people and American Indians are especially underrepresented in U.S. philosophy, and that situation doesn’t appear to be changing at all. In 2025, 4% of regular APA members reported being Black or African American, compared with 14% of the U.S. population. The NSF data suggest no surge of new Black philosophers in the pipeline: the percentage among recent PhD recipients is also 4%. The NSF data show no American Indian or Alaskan Native philosophy PhD recipients in 2024 (though people reporting both American Indian and Hispanic or Latino identities would not appear in this category) and only one in the past four years (among 1692 doctorates awarded), although Native Americans constitute nearly 2% of the U.S. population.

Another concern that has recently drawn attention is the linguistic insularity of mainstream Anglophone philosophy – that is, the neglect of work written in other languages. In 2018, three collaborators and I examined citation practices in leading Anglophone journals and found that 97% of citations referred to work originally written in English. Journals published in other languages were much less insular. We also found that 96% of the editorial board members of journals perceived as elite were housed in majority-Anglophone countries. More recently, Uwe Peters and collaborators reported that non-native English speakers face substantial difficulties publishing in English, as is now practically required for gaining an international readership.

The forthcoming book Structural Injustice in Philosophy, edited by Maeve McKeown, Seunghyun Song and Milana Kostić, further documents, and aims to explain, the exclusionary structures of philosophy, not only concerning race, gender, and language, but also class, disability, culture, mobility, sexuality, place of origin, and more.

#

Could a structurally just academic system nonetheless happen to produce such skewed results, always in favor of the already powerful? Let’s not join the long string of bigots who have held that some races, languages, genders, nationalities, classes, or physical types are more intrinsically suited for philosophy than others.

It’s sometimes suggested that academic philosophy is – or is not unreasonably perceived as – a useless luxury, rightly scorned by people lacking cultural privilege. Students from less privileged backgrounds might see studying philosophy, instead of a more financially or practically rewarding discipline, as wasting a precious opportunity to achieve financial security or practical success. But even if this perception were entirely accurate – and I would argue it’s oversimplified – it wouldn’t offer an explanation that avoids structural injustice. A society in which less privileged students feel less free to study philosophy than business or nursing is already a society with substantial injustice in higher education. The perceived impracticality of the philosophy major is a symptom and mechanism of structural injustice rather than a neutral fact. In a fair society, students who love philosophy wouldn’t be disproportionately deterred by class background.

Even in a completely just society, students and professors will not always populate each discipline in exact proportion to their background rate in the general population. Women might be drawn more to developmental psychology and men to architecture, White people to European history and Black people to African history, or whatever, for innocent reasons. It might be suggested that similarly, philosophical issues just happen to be of more interest to people from culturally powerful groups. I’ve already articulated the flaw in this argument: It strains credulity to suppose that men more than women, or White people more than Black people, or people from the United States more than people from Brazil or China, or sighted people more than blind people, would or should, in a just society, care more about truth, justice, ethics, knowledge, and the fundamental nature of reality. If anything, the opposite might be expected: Those who suffer under existing institutions should be especially motivated to think hard about the reform of those institutions and the cultural presuppositions that undergird them.

Might people from less privileged backgrounds reasonably be expected, even in a just society, to be less interested than others in academia in general, and could this explain the pattern in philosophy? Again, I see no good reason to think so and reason to think the opposite: Professorships offer good salaries (for those who escape the cycle of adjuncting) and offer unusual freedom to explore one’s interests and advocate change. People from less privileged backgrounds might be especially drawn to such opportunities – at least if the career seems genuinely open to them – more so than people from elite backgrounds with many other attractive options. In any case, if the problem were academia as such, and not philosophy in particular, we should see similar disparities across all fields. And while many historically disadvantaged groups continue to be underrepresented in many disciplines, philosophy is more imbalanced than most across a wide range of measures.

#

In a just society, historically underrepresented groups and minority perspectives would be overrepresented rather than underrepresented in philosophy. Academic philosophy should celebrate diversity of opinion, encourage challenges to orthodoxy, and reward fresh perspectives from cultures and life experiences outside the mainstream. We should be eager, not reluctant, to hear from a wide range of voices. We should especially welcome, not create a chilly environment for, people with unusual or culturally atypical or historically neglected ideas, practices, and worldviews. The productive engine of philosophy depends on novelty and difference.

Philosophy is a dialectical discipline that thrives in the clash, reconciliation, and creative synthesis of diverse views; and our views are profoundly shaped by our cultural backgrounds and life experiences. Uniformity dulls our collective philosophical thinking. A fair and flourishing discipline would treasure rather than repel those who have historically been excluded. Consequently, even if every social group were proportionally represented in philosophy, we would still have reason to suspect systemic injustice. The injustices of our discipline will not be overcome until we are collectively eager to hear proportionally more from previously excluded groups than from the privileged and powerful.

Discussion Arcs for Topics and Philosophers
discussion arcs, sociology of philosophy
Let's define a discussion arc in philosophy as a curve displaying how frequently a term appears in philosophical journal abstracts, titles, and keywords. Discussion arcs can reveal trends over time, as they come and go.

For example, "ordinary language" was a popular topic of discussion in the 1960s:

My method: I searched for the term in the abstract, title, or keywords (subject) of articles in Philosopher's Index, in five-year chunks from 1940-1945 to 2021-2025.[1] I then divided the number of hits by the number of hits in a representative universe: articles whose abstract or title contains "language", "epistemology", "mind", "metaphysics", "ethics", "moral", "political", "aesthetics", or "logic".[2]
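The normalization amounts to one division per time period. Here is a minimal Python sketch, assuming the per-period hit counts have already been collected by hand from Philosopher's Index searches (nothing here queries that database). The name `discussion_arc` is hypothetical, and the target-term counts in the example are invented for illustration; only the two universe counts (457 and 33,552) come from this post:

```python
def discussion_arc(term_hits: dict[str, int],
                   universe_hits: dict[str, int]) -> dict[str, float]:
    """Divide hits for a target term by hits for the representative universe.

    term_hits: period -> articles whose abstract, title, or keywords contain the term.
    universe_hits: period -> articles whose abstract or title contains any of
    "language", "epistemology", "mind", "metaphysics", "ethics", "moral",
    "political", "aesthetics", or "logic".
    Periods with no universe hits get NaN rather than raising ZeroDivisionError.
    """
    arc = {}
    for period, hits in term_hits.items():
        base = universe_hits.get(period, 0)
        arc[period] = hits / base if base else float("nan")
    return arc


# Universe counts are from the post; the target-term counts are made up.
universe = {"1940-1945": 457, "2016-2020": 33_552}
term = {"1940-1945": 5, "2016-2020": 100}
for period, rate in discussion_arc(term, universe).items():
    print(f"{period}: {rate:.2%}")
```

Dividing by a broad universe, rather than by all articles, is meant to offset the sparser early abstracts, as footnote [2] explains. The small early denominator is also why the earliest points are noisy: five hits out of 457 moves the rate far more than five hits out of 33,552 would.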

By looking for distinctive terms or phrases connected to particular arguments, we can also track when an argument receives wide uptake. For example, Hilary Putnam's famous "Twin Earth" thought experiment, though originally presented in 1975, didn't receive much immediate uptake; its discussion peaked in the 1990s:

We can also look for topical trends. If you think artificial intelligence is recently hot, you're right!

The figure also shows a smaller peak in the late 1980s, before the "AI winter" that has recently thawed.

Another thing you might see in the artificial intelligence arc, if you squint, is noise early in the data. This makes sense, since the early data have far fewer hits in the representative universe of articles containing one of the nine keywords: 457 articles in 1940-1945, compared to 33,552 in 2016-2020.

We can also examine the discussion arcs of particular philosophers. Note that "discussion" in the sense I've defined is very different from citation. A philosopher who is cited in passing might have very little influence on the shape of an article. In contrast, if a philosopher's name is explicitly mentioned in the title, abstract, or keywords, that philosopher's work is among the chief topics of the article. Discussion rates and citation rates thus capture different phenomena and will sometimes diverge.

Here are the arcs for the seven "most important Western philosophers of all time", according to a poll by Brian Leiter in 2017.

Notice how much discussion there was of Aristotle in the early 1940s! "Aristotle" alone gets more than a quarter as many hits as the broad disjunctive search for "language" or "epistemology" or "mind" or "metaphysics" or "ethics" or "moral" or "political" or "aesthetics" or "logic". Plato and Kant each also get almost as many as Aristotle.

Since then, there's been a steady decline in the proportion of articles mentioning these prominent historical figures. My sense is that this is due to two factors: first, the decline in the relative prominence of history of philosophy compared to other areas of philosophy; second, a shift within history of philosophy toward considering a broader range of traditions and figures, including non-Western traditions and women philosophers.

It's perhaps also interesting that Kant has declined less than the ancient Greeks, overtaking them as the most discussed philosopher.

I thought I'd finish today with discussion arcs of some more recent historical figures, who peak at different times:

Notice that Russell peaks in the 1940s to 1950s, during which period he is discussed almost as much as Aristotle, Plato, and Kant, who themselves are (as I noted above) discussed in a striking proportion of articles. There is a little false positive noise for Russell (some other people with "Russell" as a first or last name), but my estimate is that about 90% of hits are accurate.

Wittgenstein peaks in the late 1960s, Frege in the early 1980s, and Nietzsche in the early 2000s. Heidegger's influence is moderately steady from the late 1960s to the early 2000s, declining modestly in the past couple of decades.

Perhaps worth mentioning: These data cover a wide range of journals, regardless of prestige. In an analysis a few years ago, I found that the journals conventionally regarded as prestigious mention prominent "analytic" philosophers such as Frege, Wittgenstein, Quine, Chisholm, and Putnam much more frequently than prominent "Continental" philosophers such as Nietzsche, Heidegger, Sartre, Foucault, and Derrida: The analytic/Continental divide is alive and well. There might be other interesting trends to discover if we subdivide the universe of articles.

In a week or two, I'll do a follow up post looking at the discussion arcs of prominent philosophers born 1900-1945, partly to test my "age and fame" hypothesis: that philosophers tend to have their peak influence around ages 55-70.

--------------------------------------------

[1] The data begin in 1940, so I used a six-year chunk for the first period. As of Feb 27, data for 2025 are only about two-thirds complete and there are no data for 2026.

[2] This is not all articles, especially in the early period, where abstracts are less available. The idea is to roughly compensate for having fewer abstracts, and thus fewer opportunities for hits in the early data, by comparing the rate of appearance of the target terms with the rate of appearance of several, but not too many, widely used terms across a variety of philosophical topics.

Disunity and Indeterminacy in Artificial Consciousness (and Maybe in Human Consciousness Too)
AI, consciousness
Our understanding of the nature of consciousness derives mainly from our understanding of the nature of consciousness in our favorite animal (us, of course). But the features of consciousness in our favorite animal might be specific to that animal rather than universal.

Let's consider two such features and whether we should expect them in conscious AI systems, if conscious AI systems are ever possible.

Unity: Our conscious experiences at any given moment are bound together into a single unified experience, rather than transpiring in separate streams. If I'm sitting on a wet park bench, I might (a.) visually experience the leafy green trees around me, (b.) tactilely experience the cold dampness soaking into my jeans, and (c.) consciously recall the smaller trees of yesteryear. Normally -- perhaps necessarily -- three such experiences would not run in disconnected streams. They would join into a composite experience of (a)-with-(b)-with-(c). I experience not just trees, cold dampness, and a memory of yesteryear, but all three together as a unified bundle.

Determinacy: At any given moment, I am either determinately conscious or determinately nonconscious (as in anesthesia or dreamless sleep). Likewise, I either determinately do, or determinately do not, have any particular experience. Gray-area cases are at least unusual and maybe impossible. Even the simplest, barest cases are still determinate. Consider visual experience: We might imagine the visual field narrowing and losing content until only a gray dot remains -- and then the dot winks out. That dot, however minimal, is still determinately experienced. When it winks out, consciousness determinately disappears. There is no half-winked state between the minimal gray dot and complete absence of visual experience.

My thought is that we should not expect unity and determinacy to be general features of conscious AI systems (if conscious AI is possible). To see why, let's start by assuming the Global Workspace Theory of consciousness. I focus on Global Workspace Theory because it's probably the leading scientific theory of consciousness and because its standard formulation (Dehaene's version) invites the assumption of unity and determinacy.

Global Workspace Theory divides the mind into local information processing modules linked by a shared global workspace. Information becomes conscious when it is broadcast into the workspace. Suppose your auditory system registers the faint honk of a distant car horn. You're absorbed in reading philosophy and accustomed to ignoring traffic noise, so this representation isn't selected for further processing. It's not a target of attention, not broadcast into the workspace, and not consciously experienced. (If you think you constantly consciously experience background sounds, you can't hold a standard Global Workspace view.) Once you attend to the noise, for whatever reason, that information "ignites" into the global workspace, becoming available to a wide variety of "downstream" processes: You can think about it, plan around it, verbally report it, store it in long-term memory, and flexibly combine it with other information in the workspace. On Global Workspace Theory, being available in this way just is what it is for the information to be consciously experienced.

This model suggests unity and determinacy. Since there is just one global workspace, and since that workspace enables flexible integration of everything it contains, it makes sense that its various elements will combine into a unified experience. And on Dehaene's version, ignition into the workspace is a sharp-boundaried event: Information either completely ignites, becoming available for all downstream processes, or it does not. There is no (or only rarely) partial ignition. This can explain determinacy.

But future AI systems might not share this structure. They might have multiple or partially overlapping workspaces. Different specialized subsystems might have access to different regions of a partly-shared workspace. Some animals, such as snails and octopuses, distribute processing among multiple ganglia or neural centers that are less tightly coupled than the hemispheres of the human brain. A robot might broadcast information relevant to locomotion to one area and information relevant to speech to another with limited connectivity.

If the subsystems are entirely disconnected, the result might be entirely discrete centers of subjective experience within a single organism or machine. But if they are partly connected, experience might be only partly unified. In the park bench example, the experience of the trees might be unified with the experience of dampness, and the experience of dampness with memories of yesteryear, but the experience of the trees might not be unified with the memories. (Unification would not then be a transitive relation.) Alternatively, some weaker relation of partial unification might hold among the visual, tactile, and memorial experiences. If this seems inconceivable or impossible, see Sophie Nelson's and my article on indeterminate or fractional subjects.

More abstractly: There's no compelling architectural reason why an AI system would have to make information available either to all downstream processes or to none. A workspace defined in terms of downstream availability could be a patchwork of partial availabilities rather than a fully global all-or-nothing broadcast.

For the same reason, ignition into the workspace needn't be all-or-nothing. Between full ignition with determinate consciousness and no ignition with determinate nonconsciousness, there might be in-between, gray-area half-ignitions that are neither determinately conscious nor determinately nonconscious. Nearly every property with a complex physical or functional basis allows indeterminate, borderline cases: baldness, extraversion, greenness, happiness, whether you're wearing a shoe, whether a country is a democracy. The human global workspace might minimize indeterminacy, just as it's rarely indeterminate in basketball whether the ball has gone through the hoop. But change the architecture and indeterminacy might become common: a half-hearted ignition, or just enough information-sharing to make it indeterminate whether a workspace even exists. (If indeterminacy about consciousness strikes you as inconceivable or impossible, see my 2023 article on borderline consciousness.)

Global Workspace Theory might of course be wrong. But most other theories of consciousness make my argument at least as easy. Dennett's fame-in-the-brain version of broadcast theory explicitly permits disunity and indeterminacy. Higher Order Theories admit the same fragmentation and, probably, gradualism. So do biological theories and theories that focus on embodiment. (Integrated Information Theory is an exception: Its axioms require bright-lined unity and determinacy. But as I've argued, those bright-line axioms lead to unpalatable consequences.)

Recognizing these possibilities for AI systems invites the further thought: Maybe we humans aren't quite as unified as we normally suppose. Maybe indeterminate and disunified consciousness is common. Maybe processes outside of attention hover indeterminately between being conscious and nonconscious. Maybe some processes are only partly unified. If it seems otherwise in introspection and memory, maybe that's because introspection and memory tend to impose unity and determinacy where none was before.

[a Paul Klee painting, untitled 1914: source]

The Intrinsic Value of Diversity
aesthetics, axiology
Moral diversity, Olivia Bailey and Thi Nguyen say (in a draft paper shared with Myisha Cherry's Emotion and Society Lab), is valuable. It's good that people have different ethical personalities, opinions, and concerns. (Within reason: Nazis not welcome.)

Why? Their reasons are instrumental. Society benefits when people care intensely about different things. This allows us collectively to achieve a wide range of goals -- curing cancer, helping the homeless, protesting unjust government. Society also benefits if some people explore the ethical possibility space, developing unusual moral visions, most of which will be mistaken but a few of which might eventually be recognized as genuine moral advances (think of the first slavery abolitionists). And individuals benefit from the liberty to adopt moral priorities that fit their skills and temperaments: Some people thrive in battle, others in caregiving, others in solitary work.

But is moral diversity also intrinsically valuable -- that is, valuable for its own sake, independent of these good consequences? I think so. I think so because diversity in general is intrinsically valuable, and there's no good reason to treat moral diversity as an exception.

How does one argumentatively establish the intrinsic value of diversity? The only way I know is to reveal, through thought experiment, that you already implicitly accept it -- and then to ward off objections.

Bailey and Nguyen briefly cite Alexander Nehamas on diversity of aesthetic opinion. Nehamas writes:

I think a world where everyone liked, or loved, the same things would be a desperate, desolate world -- as devoid of pleasure and interest as the most frightful dystopia of those who believe (quite wrongly) that the popular media are inevitably producing a depressingly, disconsolately uniform world culture. And although I say this with serious discomfort, a world in which everyone liked Shakespeare, or Titian, or Bach for the same reasons -- if such a world were possible -- appears to me no better than a world where everyone tuned in to Baywatch or listened to the worst pop music at the same time (Nehamas 2002, pp. 58-59).

Why is aesthetic diversity valuable, according to Nehamas? Because style and taste require originality and are bound up with what is distinctive about your life, interests, and sensibility. Without distinctiveness, style and taste collapse -- an aesthetic disaster.

Should we say, then, that diversity, including moral diversity, is valuable aesthetically? That its value lies primarily in its beauty, in its capacity to inspire awe, or some other aesthetic feature? Indeed, diversity is beautiful and awesome (imagine the world without it!) but I don't think this exhausts its intrinsic value. Aesthetic value requires a spectator, at least a notional one, whose appreciation is the point. The intrinsic value of diversity is not, or not primarily, mediated through the hypothetical reaction of an aesthetic spectator.

My favorite approach to thinking about intrinsic value is the Distant Planet Thought Experiment. Imagine a planet on the far side of the galaxy, blocked from view by the galactic core, a planet we'll never see or interact with. What would we hope for on this planet, for its own sake, independent of any potential value for us?

Would you hope that it's a sterile rock, completely devoid of life? I think not. If you do think a lifeless rock would be best, I have no argument against you. For me this is a starting place, a bedrock judgment, which I expect most readers will share.

Suppose, then, that you agree a planet with life would be intrinsically better than one without. Would you hope that its life consists entirely of microbes? Or would you hope that it teems with diverse life: reefs and rainforests, beetles and bats, squid and bees and ferns and foxes -- or rather, not to duplicate Earth too closely, their alien analogues, translated into a different key? I think you'll hope that the planet teems with diverse life.

Would you hope that no life on this planet has humanlike behavioral sophistication -- language, long-term planning, complex social coordination? Would you hope that nothing there could contemplate the meaning of life, the origin of the stars, or its own ancient history? Would you hope that nothing there could create art, or engage in athletic competition, or invent complex games and tricks and jokes? I invite you to join me in thinking otherwise. The planet would be better if it included some beings with that richness of thought and activity.

Would you hope for uniformity of intellectual, aesthetic, and ethical opinion -- that everyone shares the same values and ideas? Or would you hope for diversity? I think you'll join me in thinking that the world would be better, better for its own sake, if it were diverse rather than uniform. Different entities would have different skills, preferences, passions, and ideas. They would fight and disagree (not genocidally, I hope), sometimes value their differences, sometimes dismiss others as completely wrongheaded, sometimes cluster into shared projects, sometimes collaborate across deep disagreement, sometimes be drawn to opposites, sometimes feel kinship with the like-minded, play within and across divides, pursue an enormous variety of projects, and explore a vast space of possible forms of life.

That is what I hope for on this distant planet -- not for instrumental reasons (not, for example, because it will maximize happiness), and not merely because it would strike a hypothetical spectator as beautiful and awesome (though it should). Rather, just because it would be valuable for its own sake. An empty void has little or no value; a rich plurality of forms of existence has immense value, no further justification required.

I have not argued for this. I have only stated it vividly, hoping that you already accept it.

Is ethical opinion an exception? Should we prefer unity and conformity in ethics, even while welcoming diversity elsewhere? I think not, for two reasons.

First, ethics is open-textured, indeterminate, and full of tragic dilemmas. Often there is no one decisively best answer on which everyone should converge. Diversity within at least the bounds of reasonable disagreement should be permitted.

Second, ethical values are inseparable from our other values and ways of life. A philosophy professor, a civil rights lawyer, a professional athlete, and a farmer will value different things. There is, I think, no point in attempting to cleanly separate their differing values into distinct types, some of which are permitted to vary and others of which may not. The ethical, prudential, epistemic, and aesthetic blur together. These distinctions are not as clean as philosophers often assume. Normativity is a mush.

Oh, some of you disagree? Good!

[the cover of my 2024 book, The Weirdness of the World, hardback version]

Artificial Intelligence as Strange Intelligence: Against Linear Models of Intelligence (New Paper in Draft)
AI
by Kendra Chilson and Eric Schwitzgebel

Our main idea, condensed to 1000 words:

On a linear model of intelligence, entities can be roughly linearly ordered in overall intelligence: frogs are smarter than nematodes, cats smarter than frogs, apes smarter than cats, and humans smarter than apes. This same linear model is often assumed when discussing AI systems. "Narrow AI" systems (like chess machines and autonomous vehicles) are assumed to be subhuman in intelligence; at some point -- maybe soon -- AI systems will have approximately human-level intelligence; and in the future we might expect superintelligent AI that exceeds our intellectual capacity in virtually all domains of interest.

Building on the work of Susan Schneider, we challenge this linear model of intelligence. Central to our project is the concept of general intelligence as the ability to use information to achieve a wide range of goals in a wide variety of environments.

Of course even the simplest entity capable of using information to achieve goals can succeed in some environments, and no finite entity could succeed at all possible goals in all possible environments. "General intelligence" is therefore a matter of degree. Moreover, general intelligence is a massively multidimensional matter of degree: There are many, many possible goals and many, many possible environments, and there is no non-arbitrary way to taxonomize and weight all these goals and environments into a single linear scale or definitive threshold.

Every entity is in important respects narrow: Humans also can achieve their goals in only a very limited range of environments. Interstellar space, the deep sea, the Earth's crust, the middle of the sky, the center of a star -- transposition to any of these places will quickly defeat almost all our plans. We depend for our successful functioning on a very specific context. So of course do all animals and all AI systems.

Similarly, although humans are good at a certain range of tasks, we cannot detect electrical fields in the water, dodge softballs while hovering in place, communicate with dolphins by echolocation, or calculate a hundred digits of pi in our heads. If we put a server with a language model in the desert without a power source, or place an autonomous vehicle in a chess tournament, and then interpret their incompetence as a lack of general intelligence, we risk being as unfair to them as a dolphin would be in blaming us for our poor skills in its environment. Yes, there's a perfectly reasonable sense in which chess machines and autonomous vehicles have much more limited capacities than do humans. They are narrow in their abilities compared to us by almost any plausible metric of narrowness. But it is anthropocentric to insist that general intelligence requires generally successful performance on the tasks and in the environments that we humans tend to favor, given that those tasks and environments are such a small subset of the possible tasks and environments an entity could face. And any attempt to escape anthropocentrism by creating an unbiased and properly weighted taxonomy of task types and environments is either hopeless or liable to generate a variety of very different but equally plausible arbitrary composites.
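To make the weighting problem concrete, here is a toy sketch in Python. The entities, task types, scores, and weightings are all invented for illustration (hypothetical, not data from the paper). The only point is that three weightings, each defensible from some vantage point, crown three different "smartest" entities.

```python
# Hypothetical, made-up scores (0-10) on three task types; not data from the paper.
scores = {
    "human":        {"chess": 5,  "echolocation": 0, "everyday_vision": 9},
    "dolphin":      {"chess": 0,  "echolocation": 9, "everyday_vision": 6},
    "chess_engine": {"chess": 10, "echolocation": 0, "everyday_vision": 0},
}

def rank(weights):
    """Order entities by one weighted sum of task scores (one arbitrary composite)."""
    total = {name: sum(weights[task] * s for task, s in tasks.items())
             for name, tasks in scores.items()}
    return sorted(total, key=total.get, reverse=True)

# Three hypothetical weightings, each favoring a different kind of environment.
human_centric = {"chess": 1, "echolocation": 1, "everyday_vision": 3}
sea_centric   = {"chess": 1, "echolocation": 3, "everyday_vision": 1}
game_centric  = {"chess": 5, "echolocation": 1, "everyday_vision": 1}

print(rank(human_centric))  # ['human', 'dolphin', 'chess_engine']
print(rank(sea_centric))    # ['dolphin', 'human', 'chess_engine']
print(rank(game_centric))   # ['chess_engine', 'human', 'dolphin']
```

Nothing in the scores themselves decides which weighting is correct; the linear ordering is an artifact of the weights chosen.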

AI systems, like nonhuman animals and neuroatypical people, can combine skills and deficits in patterns that are unfamiliar to those who have attended mostly to typical human cases. AI systems are highly unlikely to replicate every human capacity, due to limits in data and optimization, as well as a fundamentally different underlying architecture. They struggle to do many things that ordinary humans do effortlessly, such as reliably interpreting everyday visual scenes and performing feats of manual dexterity. But the reverse is also true: Humans cannot perform some feats that machines perform in a fraction of a second. If we think of intelligence as irreducibly multidimensional instead of linear -- as always relativized to the immense number of possible goals and environments -- we can avoid the temptation to try to reach a scalar judgment about which type of entity is actually smarter and by how much.

We might think of typical human intelligence as "familiar intelligence" -- familiar to us, that is -- and artificial intelligence as "strange intelligence". This terminology wears its anthropocentrism on its sleeve, rather than masking it under false objectivity. Something possesses familiar intelligence to the degree it thinks like us. It is a similarity relation. How familiar an intelligence is depends on several factors. Some are architectural: What forms does the basic cognitive processing take? What shortcuts and heuristics does it rely on? How serial or parallel is it? How fast? With what sorts of redundancy, modularity, and self-monitoring for errors? Others are learned and cultural: learned habits, particular cultural practices, acquired skills, chosen effort based on perceived costs and benefits. An intelligence is outwardly familiar if it acts like us in intelligence-based tasks. And it is inwardly familiar if it does so by the same underlying cognitive mechanisms.

Familiarity is also a matter of degree: The intelligence of dogs is more familiar to us (in most respects) than that of octopuses. Although we share some common features with octopuses, they evolved in a very different environment and have very dissimilar cognitive architecture as a result. It's hard for us even to understand their goals, because their existence is so different. Still, as distant as our minds are from those of octopuses, we share with octopuses the broadly familiar lifeways of embodied animals who need to navigate the natural world, find food, and mate.

AI constitutes an even stranger form of intelligence. With architectures, environments, and goals so fundamentally unlike ours, AI is the strangest intelligence we have yet encountered. AI is not a biological organism; it was not shaped by the evolutionary pressures shared by every living being on Earth, and it does not have the same underlying needs. It is based on an inorganic substrate totally unlike all biological neurophysiology. Its goals are imposed by its makers rather than being autopoietic. Such intelligence should be expected to behave in ways radically different from familiar minds. This raises an epistemic challenge: Understanding and measuring strange intelligence may be extremely difficult for us. Plausibly, the stranger an intelligence is from our perspective, the easier it is for us to fail to appreciate what it’s up to. Strange intelligences rely on methods alien to our cognition.

If intelligence were linear and one-dimensional, then a single example of an egregious mistake by an AI -- a mistake a human would never make, like confusing a strawberry for a toy poodle -- would be enough to show that the systems are nowhere near our level of intelligence. However, since intelligence is massively multidimensional, all such cases show on their own is that these systems have certain lacunae or blind spots. Of course, we humans also have lacunae and blind spots -- just consider optical illusions. Our susceptibility to optical illusions is not used as evidence of our lack of general intelligence, however ridiculous our mistakes might seem to any entity not subject to those same illusions.

Full draft here.

Does Global Workspace Theory Solve the Question of AI Consciousness?
AI, consciousness
Hint: no.

Below are three sections from Chapter Eight of my manuscript in draft, AI and Consciousness, fresh new version available today here. Comments welcome!

[image adapted from Dehaene et al. 2011]


1. Global Workspace Theories and Access.

The core idea of Global Workspace Theory is simple. Sophisticated cognitive systems like the human mind employ specialized processes that operate to a substantial extent in isolation. We can call these modules, without committing to any strict interpretation of that term.[1] For example, when you hear speech in a familiar language, some cognitive process converts the incoming auditory stimulus into recognizable speech. When you type on a keyboard, motor functions convert your intention to type a word like “consciousness” into nerve signals that guide your fingers. When you try to recall ancient Chinese philosophers, some cognitive process pulls that information from memory without (amazingly) clogging your consciousness with irrelevant information about German philosophers, British prime ministers, rock bands, or dog breeds.

Of course, not all processes are isolated. Some information is widely shared, influencing or available to influence many other processes. Once I recall the name “Zhuangzi”, the thought “Zhuangzi was an ancient Chinese philosopher” cascades downstream. I might say it aloud, type it out, use it as a premise in an inference, form a visual image of Zhuangzi, contemplate his main ideas, attempt to sear it into memory for an exam, or use it as a clue to decipher a handwritten note. To say that some information is in “the global workspace” just is to say that it is available to influence a wide range of cognitive processes. According to Global Workspace Theory, a representation, thought, or cognitive process is conscious if and only if it is in the global workspace – if it is “widely broadcast to other processors in the brain”, allowing integration both in the moment and over time.[2]

Recall the ten possibly essential features of consciousness from Chapter Three: luminosity, subjectivity, unity, access, intentionality, flexible integration, determinacy, wonderfulness, specious presence, and privacy. [Blog readers: You won't have read Chapter Three, but try to ride with it anyway.] Global Workspace Theory treats access as the central essential feature.

Global Workspace Theory can potentially explain other possibly essential features. Luminosity follows if processes or representations in the workspace are available for introspective processes of self-report. Unity might follow if there’s only one workspace, so that everything in it is present together. Determinacy might follow if there’s a bright line between being in the workspace and not being in it. Flexible integration might follow if the workspace functions to flexibly combine representations or processes from across the mind. Privacy follows if only you can have direct access to the contents of your workspace. Specious presence might follow if representations or processes generally occupy the workspace for some hundreds of milliseconds.

In ordinary adult humans, typical examples of conscious experience – your visual experience of this text, your emotional experience of fear in a dangerous situation, your silent inner speech, your conscious visual imagery, your felt pains – appear to have the broad cognitive influences Global Workspace Theory describes. It’s not as though we commonly experience pain but find that we can’t report it or act on its basis, or that we experience a visual image of a giraffe but can’t engage in further thinking about the content of that image. Such general facts, plus the theory’s potential to explain features such as luminosity, unity, determinacy, flexible integration, privacy, and specious presence, lend Global Workspace Theories substantial initial attractiveness.

I have treated Global Workspace Theory as if it were a single theory, but it encompasses a family of theories that differ in detail, including “broadcast” and “fame” theories – any theory that treats the broad accessibility of a representation, thought, or process as the central essential feature making it conscious.[3]

Consider two contrasting views: Dehaene’s Global Neuronal Workspace Theory and Daniel Dennett’s “fame in the brain” view. Dehaene holds that entry into the workspace is all-or-nothing. Once a process “ignites” into the workspace, it does so completely. Every representation or process either stops short of entering consciousness or is broadcast to all available downstream processes. Dennett’s fame view, in contrast, admits degrees. Representations or processes might be more or less famous, available to influence some downstream cognitive processes without being available to influence others. There is no one workspace, but a pandemonium of competing processes.[4] If Dennett is correct, luminosity, determinacy, unity, and flexible integration all potentially come under threat in a way they do not as obviously come under threat on Dehaene’s view.[5]
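A toy contrast between the two views may help, assuming (purely for illustration) a single scalar "strength" per representation and a handful of hypothetical downstream consumers; neither Dehaene nor Dennett models things this crudely. On the ignition model, a representation reaches either all consumers or none. On the fame model, each consumer has its own threshold, so a representation can reach grip control without reaching verbal report, roughly the pattern described in note 4.

```python
# Hypothetical downstream consumers of a representation; names are illustrative only.
CONSUMERS = ["verbal_report", "grip_control", "episodic_memory", "reasoning"]

def ignition_broadcast(strength, threshold=0.5):
    """Ignition-style toy: above a single threshold every consumer gets it; below, none do."""
    return set(CONSUMERS) if strength >= threshold else set()

def fame_broadcast(strength, thresholds):
    """Fame-style toy: each consumer has its own threshold, so access comes in degrees."""
    return {c for c in CONSUMERS if strength >= thresholds[c]}

# Per-consumer thresholds are invented for illustration.
fame_thresholds = {"verbal_report": 0.7, "grip_control": 0.3,
                   "episodic_memory": 0.8, "reasoning": 0.6}

for s in (0.2, 0.4, 0.65, 0.9):
    print(s, sorted(ignition_broadcast(s)), sorted(fame_broadcast(s, fame_thresholds)))
```

At strength 0.4 the ignition model reports nothing reaching anywhere, while the fame model lets the representation guide the fingers but not the mouth.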

Dennettian concerns notwithstanding, all-or-nothing ignition into a single, unified workspace is currently the dominant version of Global Workspace Theory. The issue remains unsettled and has obvious implications for the types of architectures that might plausibly host AI consciousness.

2. Consciousness Outside the Workspace; Nonconsciousness Within It?

Global Workspace Theory is not the correct theory of consciousness unless all and only thoughts, representations, or processes in the Global Workspace are conscious. Otherwise, something else, or something additional, is necessary for consciousness.

It is not clear that even in ordinary adult humans a process must be in the Global Workspace to be conscious. Consider the case of peripheral experience. Some theorists maintain that people have rich sensory experiences outside of focal attention: a constant background experience of your feet in your shoes and objects in the visual periphery.[6] Others – including Global Workspace theorists – dispute this. Introspective reports vary, and resolving such issues is methodologically tricky.

One methodological problem: People who report constant peripheral experiences might mistakenly assume that such experiences are always present because they are always present whenever they think to check, and the very act of checking might generate those experiences. This is sometimes called the “refrigerator light illusion”, akin to the error of thinking the refrigerator light is always on because it’s always on when you open the door to check.[7] On this view, you’re only tempted to think you have constant tactile experience of your feet in your shoes because you have that experience on those rare occasions when you’re thinking about whether you have it. Even if you now seem to have a broad range of experiences in different sensory modalities simultaneously, this could result from an unusual act of dispersed attention, or from “gist” perception or “ensemble” perception, in which you are conscious of the general gist or general features of a scene, knowing that there are details, without actually experiencing those unattended details.[8]

The opposite mistake is also possible. Those who deny a constant stream of peripheral experiences might simply be failing to notice or remember them. The fact that you don’t remember now the sensation of your feet in your shoes two minutes ago hardly establishes that you lacked the sensation at the time. Although many people find it introspectively compelling that their experience is rich with detail or that it is not, the issue is methodologically complex because introspection and memory are not independent of the phenomena to be observed.[9]

If we do have rich sensory experience outside of attention, it is unlikely that all of that experience is present in or broadcast to a Global Workspace. Unattended peripheral information is rarely remembered or consciously acted upon, tending to exert limited downstream influence – the paradigm of information that is not widely broadcast. Moreover, the Global Workspace is generally characterized as limited capacity, containing only a few thoughts, representations, objects, or processes at a time – those that survive some competition or attentional selection – not a welter of richly detailed experiences in many modalities at once.[10]

A less common but equally important objection runs in the opposite direction: Perhaps not everything in the Global Workspace is conscious. Some thoughts, representations, or processes might be widely broadcast, shaping diverse processes, without ever reaching explicit awareness.[11] Implicit racist assumptions, for example, might influence your mood, actions, facial expressions, and verbal expressions. The goal of impressing your colleagues during a talk might have pervasive downstream effects without occupying your conscious experience moment to moment.

The Global Workspace theorist who wants to allow that such processes are not conscious might suggest that, at least for adult humans, processes in the workspace are generally also available for introspection. But there’s substantial empirical risk in this move. If the correlation between introspective access and availability for other types of downstream cognition isn’t excellent, the Global Workspace theorist faces a dilemma. Either allow many conscious but nonintrospectable processes, violating widespread assumptions about luminosity, or redefine the workspace in terms of introspectability, which amounts to shifting to a Higher Order view.

3. Generalizing Beyond Vertebrates.

The empirical questions are difficult even in ordinary adult humans. But our topic isn’t ordinary adult humans – it’s AI systems. For Global Workspace Theory to deliver the right answers about AI consciousness, it must be a universal theory applicable everywhere, not just a theory of how consciousness works in adult humans, vertebrates, or even all animals.

If there were a sound conceptual argument for Global Workspace Theory, then we could know the theory to be universally true of all conscious entities. Empirical evidence would be unnecessary. It would be as inevitably true as that rectangles have four sides. But as I argued in Chapter Four, conceptual arguments for the essentiality of any of the ten possibly essential features are unlikely to succeed – and a conceptual argument for Global Workspace Theory would be tantamount to a conceptual argument for the essentiality of access, one of those ten features. Not only do the general observations of Chapter Four tell against a conceptual guarantee, so also does the apparent conceivability, as described in Section 2 above, of consciousness outside the workspace or nonconsciousness within it – even if such claims are empirically false.

If Global Workspace Theory is the correct universal theory of consciousness applying to all possible entities, an empirical argument must establish that fact. But it’s hard to see how such an empirical argument could proceed. We face another version of the Problem of the Narrow Evidence Base. Even if we establish that in ordinary humans, or even in all vertebrates, a thought, representation, or process is conscious if and only if it occupies a Global Workspace, what besides a conceptual argument would justify treating this as a universal truth that holds among all possible conscious systems?

Consider some alternative architectures. The cognitive processes and neural systems of octopuses, for example, are distributed across their bodies, often operating substantially independently rather than reliably converging into a shared center.[12] AI systems certainly can be, indeed often are, similarly decentralized. Imagine coupling such disunity with the capacity for self-report – an animal or AI system with processes that are reportable but poorly integrated with other processes. If we assume Global Workspace Theory at the outset, we can conclude that only sufficiently integrated processes are conscious. But if we don’t assume Global Workspace Theory at the outset, it’s difficult to imagine what near-future evidence could establish that fact, beyond reasonable doubt, to a researcher who is initially drawn to a different theory.

If the simplest version of Global Workspace Theory is correct, we can easily create a conscious machine. This is what Dehaene and collaborators envision in the 2017 paper I discussed in Chapter One. Simply create a machine – such as an autonomous vehicle – with several input modules, several output modules, a memory store, and a central hub for access and integration across the modules. Consciousness follows. If this seems doubtful to you, then you cannot straightforwardly accept the simplest version of Global Workspace Theory.[13]
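How little the simplest version asks for can be seen in a few lines of Python. The sketch below is hypothetical: the class and module names are mine, not Dehaene and collaborators', and "salience" competition is a stand-in assumption for whatever selection mechanism a real system would use. It has input candidates, output modules, a memory store, and a hub that broadcasts one winner per cycle. Whether anything structurally like this, suitably scaled up, could be conscious is exactly the doubt at issue.

```python
from dataclasses import dataclass, field

@dataclass
class Workspace:
    """Toy central hub (hypothetical): picks one content per cycle and broadcasts it."""
    outputs: dict                                # name -> callable receiving the broadcast
    memory: list = field(default_factory=list)   # memory store that also receives it

    def cycle(self, candidates):
        """candidates: list of (salience, content) pairs offered by input modules."""
        if not candidates:
            return None
        _, content = max(candidates, key=lambda pair: pair[0])  # limited capacity: one winner
        self.memory.append(content)
        for handler in self.outputs.values():   # every output module receives the winner
            handler(content)
        return content

# Usage with two hypothetical output modules of an autonomous vehicle.
log = []
ws = Workspace(outputs={"speech": lambda c: log.append(("speech", c)),
                        "steering": lambda c: log.append(("steering", c))})
winner = ws.cycle([(0.3, "engine hum"), (0.9, "pedestrian ahead"), (0.5, "road sign")])
print(winner)  # pedestrian ahead
```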

We can apply Global Workspace Theory to settle the question of AI consciousness only if we know the theory to be true either on conceptual grounds or because it is empirically well established as the correct universal theory of consciousness applicable to all types of entity. Despite the substantial appeal of Global Workspace Theory, we cannot know it to be true by either route.

-------------------------------------

[1] Full Fodorian (1983) modularity is not required.

[2] Mashour et al. 2020, p. 776-777.

[3] E.g., Baars 1988; Dennett 1991, 2005; Tye 2000; Prinz 2012; Dehaene 2014; Mashour et al. 2020.

[4] Whether Dennett’s view is more plausible than Dehaene’s turns on whether, or how commonly, representations or processes are partly famous. Some visual illusions, for example, seem to affect verbal report but not grip aperture: We say that X looks smaller than Y, but when we reach toward X and Y we open our fingers to the same extent, accurately reflecting that X and Y are the same size. The fingers sometimes know what the mouth does not (Aglioti et al. 1995; Smeets et al. 2020). We adjust our posture while walking and standing in response to many sources of information that are not fully reportable, suggesting wide integration but not full accessibility (Peterka 2018; Shanbhag 2023). Swift, skillful activity in sports, in handling tools, and in understanding jokes also appears to require integrating diverse sources of information, which might not be fully integrated or reportable (Christensen et al. 2019; Vauclin et al. 2023; Horgan and Potrč 2010). In response, the all-or-nothing “ignition” view can explain away such cases of seeming intermediacy or disunity as atypical (it needn’t commit to 100% exceptionless ignition with no gray-area cases), by allowing some nonconscious communication among modules (which needn’t be entirely informationally isolated), and/or by allowing for erroneous or incomplete introspective report (maybe some conscious experiences are too brief, complex, or subtle for people to confidently report experiencing them).

[5] Despite developing a theory of consciousness, Dennett (2016) endorsed “illusionism”, which rejects the reality of phenomenal consciousness (see especially Frankish 2016). I interpret the dispute between illusionists and nonillusionists as a verbal dispute about whether the specific philosophical concept of “phenomenal consciousness” requires immateriality, irreducibility, perfect introspectibility, or some other dubious property, or whether the term can be “innocently” used without invoking such dubious properties. See Schwitzgebel 2016, 2025.

[6] Reviewed in Schwitzgebel 2011, ch. 6; and though limited only to stimuli near the center of the visual field, see the large literature on “overflow” in response to Block 2007.

[7] Thomas 1999.

[8] Oliva and Torralba 2006; Whitney and Leib 2018.

[9] Schwitzgebel 2007 explores the methodological challenges in detail.

[10] E.g., Dehaene 2014; Mashour et al. 2020.

[11] E.g., Searle 1983, ch. 5; Bargh and Morsella 2008; Lau 2022; Michel et al. 2025; see also note 4.

[12] Godfrey-Smith 2016; Carls-Diamante 2022.

[13] See also Goldstein and Kirk-Giannini (forthcoming) for an extended application of Global Workspace Theory to AI consciousness. One might alternatively read Dehaene, Lau, and Kouider 2017 purely as a conceptual argument: If all we mean by “conscious” is “accessible in a Global Workspace”, then building a system of this sort suffices for building a conscious entity. The difficulty then arises in moving from that stipulative conceptual claim to the interesting, substantive claim about phenomenal consciousness in the standard sense described in Chapter Two. Similar remarks apply to the Higher Order aspect of that article. One challenge for this deflationary interpretation is that in related works (Dehaene 2014; Lau 2022) the authors treat their accounts as accounts of phenomenal consciousness. The article concludes by emphasizing that in humans “subjective experience coheres with possession” of the functional features they identify. A further complication: Lau later says that the way he expressed his view in this 2017 article was “unsatisfactory”: Lau 2022, p. 168.

Is Signal Strength a Confound in Consciousness Research?
consciousness
Matthias Michel is among the sharpest critics of the methods of consciousness science. His forthcoming paper, "Consciousness Doesn't Do That", convincingly challenges background assumptions behind recent efforts to discover the causes, correlates, and prevalence of consciousness. It should be required reading for anyone tempted to argue, for example, that trace conditioning correlates with consciousness in humans and thus that nonhuman animals capable of trace conditioning must also be conscious.

But Michel does make one claim that bugs me, and that claim is central to the article. And Hakwan Lau -- another otherwise terrific methodologist -- makes a similar claim in his 2022 book In Consciousness We Trust, and again the claim is central to the argument of that book. So today I'm going to poke at that claim, and maybe it will burst like a sour blueberry.

The claim: Signal strength (performance capacity, in Lau's version) is a confound in consciousness research.

As Michel uses the phrase, "signal strength" is how discriminable a perceptible feature is to a subject. A sudden, loud blast of noise has high signal strength. It's very easy to notice. A faint wavy pattern in a gray field, presented for a tenth of a second, has low signal strength. It is easy to miss. Importantly, signal strength is not the same as (objective, externally measurable) stimulus intensity, but reflects how well the perceiver responds to the signal.

Signal strength clearly correlates with consciousness. You're much more likely to be conscious of stimuli that you find easy to discriminate than stimuli that you find difficult to discriminate. The loud blare is consciously experienced. The faint wavy pattern might or might not be. A stimulus with effectively zero signal strength -- say, a gray dot flashed for a millionth of a second and immediately masked -- will normally not be experienced at all.

But signal strength is not the same as consciousness. The two can come apart. The classic example is blindsight. On the standard interpretation (but see Phillips 2020 for an alternative), patients with a specific type of visual cortex damage can discriminate stimuli that they cannot consciously perceive. Flash either an "X" or an "O" in the blind part of their visual field and they will say they have no visual experience of it. But ask them to guess which letter was shown and their performance is well above chance -- up to 90% correct in some tasks. The "X" has some signal strength for them: It's discriminable but not consciously experienced.

If signal strength is not consciousness but often correlates with it, the following worry arises. When a researcher claims that "trace conditioning is only possible for conscious stimuli" or "consciousness facilitates episodic memory", how do you know that it's really consciousness doing the work, rather than signal strength? Maybe stimuli with high signal strength are both more likely to be consciously experienced and more likely to enable trace conditioning and episodic memory. Unless researchers have carefully separated the two, the causal role of consciousness remains unclear.

An understandable methodological response is to try to control for signal strength: Present stimuli of similar discriminability to the subject but which differ in whether (or to what extent) they are consciously experienced. Only then, the reasoning goes, can differences in downstream effects be confidently attributed to consciousness itself rather than differences in signal strength. Lau in particular stresses the importance of such controls. Yet such careful matching is difficult and rarely attempted. On this reasoning, much of the literature on the cognitive role of consciousness is built on sand, not clearly distinguishing the effects of consciousness from the effects of signal strength.
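A toy simulation can show both the worry and the proposed control, under assumptions stipulated purely for illustration: signal strength S is uniform on [0, 1]; a stimulus counts as "conscious" if S plus Gaussian noise exceeds 0.5 (so the two can dissociate, as in blindsight); and trace conditioning succeeds with probability S, with no causal contribution from consciousness at all. A naive comparison of conscious and nonconscious trials shows a large conditioning advantage for consciousness. Comparing only within narrow bands of S, pooled with Mantel-Haenszel-style weights, makes that advantage shrink to roughly zero.

```python
import random

def simulate(n=200_000, noise=0.1, seed=1):
    """Toy world: S drives both consciousness C and trace conditioning T.
    By construction (a hypothetical assumption), C has no effect on T."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        s = rng.random()                    # signal strength in [0, 1)
        c = s + rng.gauss(0, noise) > 0.5   # "conscious" if noisy strength passes 0.5
        t = rng.random() < s                # conditioning succeeds with probability S only
        rows.append((s, c, t))
    return rows

def rate(outcomes):
    return sum(outcomes) / len(outcomes)

def naive_difference(rows):
    """P(T | conscious) - P(T | not conscious), ignoring signal strength."""
    return rate([t for _, c, t in rows if c]) - rate([t for _, c, t in rows if not c])

def matched_difference(rows, bins=50):
    """Same comparison within narrow bands of S, pooled with weights n1*n0/(n1+n0)."""
    strata = [([], []) for _ in range(bins)]
    for s, c, t in rows:
        conscious, nonconscious = strata[min(int(s * bins), bins - 1)]
        if c:
            conscious.append(t)
        else:
            nonconscious.append(t)
    num = den = 0.0
    for conscious, nonconscious in strata:
        if conscious and nonconscious:      # skip bands where one group is empty
            w = len(conscious) * len(nonconscious) / (len(conscious) + len(nonconscious))
            num += w * (rate(conscious) - rate(nonconscious))
            den += w
    return num / den

rows = simulate()
print(naive_difference(rows), matched_difference(rows))
```

In this stipulated world, the naive gap is large and the matched gap is near zero, which is exactly the situation the control is meant to detect.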

This reasoning is attractive but faces an obvious objection, which both Michel and Lau address directly. What if signal strength just is consciousness? Then trying to "control" for it would erase the phenomenon of interest.

Both Michel and Lau analogize to height and bone length. If you want to test whether height confers an advantage in basketball or dating, you might want to control for skin color, but it would be absurd to control for bone length. If skin color correlates with height and you want to see whether height specifically advantages people in basketball or dating, it makes sense to control for differences in skin color by systematically comparing people with the same skin color but different heights. If the advantage persists, you can infer that height rather than skin color is doing the work. But trying to control for bone length lands you in nonsense. Taller people just are the people with longer bones.

Michel and Lau respond by noting that consciousness and signal strength (or performance capacity) sometimes dissociate, as in blindsight. Therefore, they are not the same thing and it does make sense to control for one in exploring the effects of the other.

But this response is too simple and too fast.

We can see this even in their chosen example. Height and bone length aren't quite the same thing. They can dissociate. People are about 1-2 cm taller in the morning than at night -- not because their bones have grown but because the tissue between the bones (especially in the spine) compresses during the day.

Now imagine an argument parallel to Michel's and Lau's: Since height and bone length can come apart, we should try to control for bone length in examining the effects of height on basketball and dating. We then compare the same people's basketball and dating outcomes in the morning and at night, "holding bone length fixed" while height varies slightly. This would be a methodological mistake. For one thing, we've introduced a new potential confound, time of day. For another, even if the centimeter in the morning really does help a little, we've dramatically reduced our ability to detect the real effect of height by "overcontrolling" for a component of the target variable, height.
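To make the loss-of-power point concrete, here is a minimal simulation sketch under invented assumptions: bone length varies across people, each person is measured either in the morning (about 1.5 cm taller) or at night, and a hypothetical outcome score depends on height alone with an assumed true effect of 0.1 per centimeter. The function `ols`, the variable names, and every number are illustrative assumptions, not data. Regressing the outcome on height alone should recover the effect with a small standard error; adding bone length as a "control" leaves only the morning/night difference to identify the height effect. The sketch does not model the time-of-day confound; it only illustrates the precision cost of overcontrolling.

```python
import numpy as np


def ols(y, X):
    """Ordinary least squares with an intercept; returns (coefficients, standard errors)."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)        # classical OLS covariance matrix
    return coef, np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
n = 2000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
bone = rng.normal(170.0, 8.0, n)                  # bone-length component of height, in cm
morning = rng.integers(0, 2, n)                   # 1 if measured in the morning, 0 if at night
height = bone + 1.5 * morning                     # spinal tissue adds ~1.5 cm in the morning
outcome = 0.1 * height + rng.normal(0.0, 1.0, n)  # assumed true effect: 0.1 units per cm of height

# Model 1: outcome on height alone.
b1, se1 = ols(outcome, height)
# Model 2: outcome on height, "controlling" for bone length.
b2, se2 = ols(outcome, np.column_stack([height, bone]))

print(f"height only:             {b1[1]:.3f} (SE {se1[1]:.3f})")
print(f"controlling bone length: {b2[1]:.3f} (SE {se2[1]:.3f})")
```

Both estimates should center near the assumed 0.1, but the controlled estimate's standard error should come out roughly an order of magnitude larger, because fixing bone length discards nearly all of the variation in height.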

Consider a psychological example. The personality trait of extraversion can be broken into "facets", such as sociability, assertiveness, and energy level. Since energy level is only one aspect of extraversion, the two can dissociate. Some people are energetic but not sociable or assertive; others are sociable and assertive but low-energy. If you wanted to measure the influence of extraversion on, say, judgments of likeability in the workplace, you wouldn't want to control for energy level. That would be overcontrol, like controlling for bone length in attempting to assess the effects of height. It would strip away part of the construct you are trying to measure.

What I hope these examples make clear is that dissociability between correlates A and B does not automatically make B a confound that must be controlled when studying A's effects. Bone length is dissociable from height, but it is a component, not a confound. Energy level is dissociable from extraversion, but it is a component, not a confound.

The real question, then, is whether signal strength (or performance capacity) is better viewed as a component or facet of consciousness than as a separate variable that needs to be held constant in testing the effects of consciousness.

A case can be made that it is. Consider Global Workspace Theory, one of the leading theories of consciousness. On this view, a process or representation is conscious if it is broadly available for "downstream cognition" such as verbal report, long-term memory, and rational planning. If discrimination judgments are among those downstream capacities, then one facet of being in the global workspace (that is, on this view, being conscious) is enabling such judgments. But recall that signal strength just is discriminability for a subject. If so, things begin to look like the extraversion / energy case. Controlling for discriminability would be overcontrolling, that is, attempting to equalize or cancel the effects not of a separate, confounding process, but of a component of the target process itself. (Similar remarks hold for Lau's "performance capacity".)

Global Workspace Theory might not be correct. And if it's not, maybe signal strength is indeed a confounder, rather than a component of consciousness. But the case for treating signal strength as a confounder can't be established simply by noticing the possibility of dissociations between consciousness and signal strength. Furthermore, since Michel's and Lau's recommended methodology can be trusted not to suffer from overcontrol bias only if Global Workspace Theory is false, it's circular to rely on that methodology to argue against Global Workspace Theory.

AI Mimics and AI Children
AI, consciousness
There's no shame in losing a contest for a long-form popular essay on AI consciousness to the eminent neuroscientist Anil Seth. Berggruen has published my piece "AI Mimics and AI Children" among a couple dozen shortlisted contenders.

When the aliens come, we’ll know they’re conscious. A saucer will land. A titanium door will swing wide. A ladder will drop to the grass, and down they’ll come – maybe bipedal, gray-skinned, and oval-headed, just as we’ve long imagined. Or maybe they’ll sport seven limbs, three protoplasmic spinning sonar heads, and gaseous egg-sphere thoughtpods. “Take me to your leader,” they’ll say in the local language, as cameras broadcast them live around the world. They’ll trade their technology for our molybdenum, their science for samples of our beetles and ferns, their tales of galactic history for U.N. authorization to build a refueling station at the south pole. No one (or only a few philosophers) will wonder, but do these aliens really have thoughts and experiences, feelings, consciousness?

The robots are coming. Already they talk to us, maybe better than those aliens will. Already we trust our lives to them as they steer through traffic. Already they outthink virtually all of us at chess, Go, Mario Kart, protein folding, and advanced mathematics. Already they compose smooth college essays on themes from Hamlet while drawing adorable cartoons of dogs cheating at poker. You might understandably think: The aliens are already here. We made them.

Still, we hesitate to attribute genuine consciousness to the robots. Why?

My answer is because we made them in our image.

#

“Consciousness” has an undeserved reputation as a slippery term. Let’s fix that now.

Consider your visual experience as you look at this text. Pinch the back of your hand and notice the sting of pain. Silently hum your favorite show tune. Recall that jolt of fear you felt during a near-miss in traffic. Imagine riding atop a giant turtle. That visual experience, that pain, that tune in your head, that fear, that act of imagination – they share an obvious property. That obvious property is consciousness. In other words: They are subjectively experienced. There’s “something it’s like” to undergo them. They have a qualitative character. They feel a certain way.

It’s not just that these processes are mental or that they transpire (presumably) in your brain. Some mental and neural processes aren’t conscious: your knowledge, not actively recalled until just now, that Confucius lived in ancient China; the early visual processing that converts retinal input into experienced shape (you experience the shape but not the process that renders the shape); the myelination of your axons.

Don’t try to be clever. Of course you can imagine some other property, besides consciousness, shared by the visual experience, the pain, etc., and absent from the unrecalled knowledge, early visual processing, etc. For example: the property of being mentioned by me in a particular way in this essay. The property of being conscious and also transpiring near the surface of Earth. The property of being targeted by such-and-such scientific theory.

There is, I submit, one obvious property that blazes out a bright red this-is-it when you think about the examples. That’s consciousness. That’s the property we would reasonably attribute to the aliens when they raise their gray tentacles in peace, the property that rightly puzzles us about future AI systems.

The term “consciousness” only seems slippery because we can’t (yet?) define it in standard scientific or analytic fashion. We can’t dissect it into simpler constituents or specify exactly its functional role. But we all know what it is. We care intensely about it. It makes all the difference to how we think about and value something. Does the alien, the robot, the scout ant on the kitchen counter, the earthworm twisting in your gardening glove, really feel things? Or are they blank inside, mere empty machines or mobile plants, so to speak? If they really feel things, then they matter for their own sake – at least a little bit. They matter in a certain fundamental way that an entity devoid of experience never could.

#

With respect to aliens, I recommend a Copernican perspective. In scientific cosmology, the Copernican Principle invites us to assume – at least as a default starting point, pending possible counterevidence – that we don’t occupy any particularly special location in the cosmos, such as the exact center. A Copernican Principle of Consciousness suggests something similar. We are not at the center of the cosmological “consciousness-is-here” map. If consciousness arose on Earth, almost certainly it has arisen elsewhere.

Astrobiology, as a scientific field, is premised on the idea that life has probably arisen elsewhere. Many expect to find evidence of it in our solar system within a few decades, maybe on Mars, maybe in the subsurface oceans of an icy moon. Other scientists are searching for telltale organic gases in the atmospheres of exoplanets. Most extraterrestrial life, if it exists, will probably be simple, but intelligent alien life also seems possible – where by “intelligent” I mean life that is capable of complex grammatical communication, sophisticated long-term planning, and intricate social coordination, all at approximately human level or better.

Of course, no aliens have visited, broadcast messages to us, or built detectable solar panels around Alpha Centauri. This suggests that intelligent life might be rare, short-lived, or far away. Maybe it tends to quickly self-destruct. But rarity doesn’t imply nonexistence. Very conservatively, let’s assume that intelligent life arises just once per billion galaxies, enduring on average a hundred thousand years. Given approximately a trillion galaxies in the observable portion of the universe, that still yields a thousand intelligent alien civilizations – all likely remote in time and space, but real. If so, the cosmos is richer and more wondrous than we might otherwise have thought.
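Spelled out, the back-of-the-envelope arithmetic uses only the two figures in the paragraph above:

```latex
\frac{10^{12}\ \text{galaxies}}{10^{9}\ \text{galaxies per civilization}} = 10^{3}\ \text{civilizations}
```

The assumed hundred-thousand-year lifespan does not enter this count; it bears instead on how many of those civilizations would exist at any one time, which is part of why they would likely be remote in time as well as space.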

It would be un-Copernican to suppose that somehow only we Earthlings, or we and a rare few others, are conscious, while all other intelligent species are mere empty shells. Picture a planet as ecologically diverse as Earth. Some of its species evolve into complex societies. They write epic poetry, philosophical treatises, scientific journal articles, and thousand-page law books. Over generations, they build massive cities, intricate clockworks, and monuments to their heroes. Maybe they launch spaceships. Maybe they found research institutes devoted to describing their sensations, images, beliefs, and dreams. How preposterously egocentric it would be to assume that only we Earthlings have the magic fire of consciousness!

True, we don’t have a consciousness-o-meter, or even a very good, well-articulated, general scientific theory of consciousness. But we don’t need such things to know. Absent some special reason to think otherwise, if an alien species manifests the full suite of sophisticated cognitive abilities we tend to associate with consciousness, it makes both intuitive and scientific sense – as well as being the unargued premise of virtually every science fiction tale about aliens – to assume consciousness alongside.

This constellation of thoughts naturally invites a view that philosophers have called “multiple realizability” or “substrate neutrality”. Human cognition relies on a particular substrate: a particular type of neuron in a particular type of body. We have two arms, two legs; we breathe oxygen; we have eyes, ears, and fingers. We are made mostly of water and long carbon chains, enclosed in hairy sacks of fat and protein, propped by rods of calcium hydroxyapatite. Electrochemical impulses shoot through our dendrites and axons, then across synaptic channels aided by sodium ions, serotonin, acetylcholine, etc. Must aliens be similar?

It’s hard to say how universal such features would be, but the oval-eyed gray-skins of popular imagination seem rather suspiciously humanlike. In reality, ocean-dwelling intelligences in other galaxies might not look much like us. Carbon is awesome for its ability to form long chains, and water is awesome as a life-facilitating solvent, but even these might not be necessary. Maybe life could evolve in liquid ammonia instead of water, with a radically different chemistry in consequence. Even if life must be carbon-based and water-loving, there’s no particular reason to suppose its cognition would require the specific electrochemical structures we possess.

Consciousness shouldn’t then, it seems, turn on the details of the substrate. Whatever biological structures can support high levels of general intelligence, those same structures will likely also host consciousness. It would make no sense to dissect an intelligent alien, see that its cognition works by hydraulics, or by direct electrical connections without chemical synaptic gaps, or by light transmission along reflective capillaries, or by vortices of phlegm, and conclude – oh no! That couldn’t possibly give rise to consciousness! Only squishy neurons of our particular sort could do it.

Of course, what’s inside must be complex. Evolution couldn’t design a behaviorally sophisticated alien from a bag of pure methane. But from a proper Copernican perspective which treats our alien cousins as equals, what matters is only that the cognitive and behavioral sophistication arises, out of some presumably complex substrate, not what the particular substrate is. You don’t get your consciousness card revoked simply because you’re made of funny-looking goo.

#

A natural next thought is: robots too. They’re made of silicon, but so what? If we analogize from aliens, as long as a system is sufficiently behaviorally and cognitively sophisticated, it shouldn’t matter how it’s composed. So as soon as we have sufficiently sophisticated robots, we should invoke Copernicus, reject the idea that our biological endowment gives us a magic spark they lack, and welcome them to club consciousness.

The problem is: AI systems are already sophisticated enough. If we encountered naturally evolved life forms as capable as our best AI systems, we wouldn’t hesitate to attribute consciousness. So, shouldn’t the Copernican think of our best AI as similarly conscious? But we don’t – or most of us don’t. And properly so, as I’ll now argue.

[continued here]

Humble Superintelligence
AI
I'm enjoying -- well, maybe enjoying isn't the right word -- Yudkowsky and Soares' If Anyone Builds It Everyone Dies. I agree with them that if we build superintelligent AI, there's a significant chance that it will cause the extinction of humanity. They seem to think our destruction would be almost certain. I don't share their certainty, for two reasons:

First, it's possible that superintelligent AI would be humanity, or at least much of what's worth preserving in humanity, though maybe called "transhuman" or "posthuman" -- our worthy descendants.

Second -- what I'll focus on today -- I think we might design superintelligent AI to be humble, cautious, and multilateral. Humble superintelligence is something we can and should aim for if we want to reduce existential risk.

Humble: If you and I disagree, of course I think I'm right and you're wrong. That follows from the fact that we disagree. But if I'm humble, I recognize a significant chance that you're right and I'm wrong. Intellectual humility is a metacognitive attitude: one of uncertainty, openness to evidence, and respect for dissenting opinions.

Superintelligent AI could probably be designed to be humble in this sense. Note that intellectual humility is possible even when one is surrounded by less skilled and knowledgeable interlocutors.

Consider a philosophy professor teaching Kant. The professor knows far more about Kant and philosophy than their undergraduates. They can arrogantly insist upon their interpretation of Kant, or they can humbly allow that they might be mistaken and that a less philosophically trained undergraduate could be right on some point of interpretation, even if the professor could argue circles around the student. One way to sustain this humility is to imagine an expert philosopher who disagrees. A superintelligent AI could similarly imagine another actual or future superintelligent AI with a contrary view.


Cautious: Caution is often a corollary of humility, though it could probably also be instilled directly. Minimize disruption. Even if you think a particular intervention would be best, don't simply plow ahead. Test it cautiously first. Seek the approval and support of others first. Take a baby step in that direction, then pause and see what unfolds and how others react. Wait awhile, then reassess.

One fundamental problem with standard consequentialist and decision-theoretic approaches to ethics is that they implicitly make everyone a decider for the world. If by your calculation, outcome A is better than outcome B, you should ensure that A occurs. The result can be substantial risk amplification. If A requires only one person's action, then even if 99% of people think B is better, the one dissenter who thinks that A is better can bring it about.

A principle of caution entails often not doing what one thinks is for the best, when doing so would be disruptive.


Multilateral: Humility and caution invite multilaterality, though multilaterality too might be instilled directly. A multilateral decision maker will not act alone. Like the humble and cautious agent, they do not simply pursue what they think is best. Instead, they seek the support and approval of others first. These others could include both human beings and other superintelligent AI systems designed along different lines or with different goals.

Discussions of AI risk often highlight opinion manipulation: an AI swaying human opinion toward its goals even if those goals conflict with human interests. Genuine multilaterality rejects manipulation. A multilateral AI might present information and arguments to interlocutors, but it would do so humbly and noncoercively -- again like the philosophy professor who approaches Kant interpretation humbly. Both sides of an argument can be presented evenhandedly. Even better, other superintelligent AI systems with different views can be included in the dialogue.


One precedent is Burkean conservatism. Reacting to the French Revolution, Edmund Burke emphasized that existing social institutions, though imperfect, had been tested by time. Sudden and radical change has wide, unforeseeable consequences and risks making things far worse. Thus, slow, incremental change is usually preferable.

In a social world with more than one actual or possible superintelligent AI, even a superintelligent AI will often be unable to foresee all the important consequences of intervention. To predict what another superintelligent AI would do, one would need to model the other system's decision processes -- and there might be no shortcut other than to actually implement all of that other system's anticipated reasoning. If each AI is using its full capacity, especially in dynamic response to the other, the outcome will often not be in principle foreseeable in real time by either party.

Thus, humility and caution encourage multilaterality, and multilaterality encourages humility and caution.


Another precedent is philosophical Daoism. As I interpret the ancient Daoists, the patterns of the world, including life and death, are intrinsically valuable. The world defies rigid classification and the application of finitely specifiable rules. We should not confidently trust our sense of what is best, nor should we assertively intrude on others. Better is quiet appreciation, letting things be, and non-disruptively adding one's small contribution to the flow of things.

One might imagine a Daoist superintelligence viewing humans much as a nature lover views wild animals: valuing the untamed processes for their own sake and letting nature take its sometimes painful course rather than intervening either selfishly for one's own benefit or paternalistically for the supposed benefit of the animals.

Writings of 2025
announcements
Each New Year's Day, I post a retrospect of the past year's writings. Here are the retrospects of 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, and 2024.

Cheers to 2026! My 2025 writings appear below.

The list includes circulating manuscripts, forthcoming articles, final printed articles, new preprints, and a few favorite blog posts. (Due to the slow process of publication, there's significant overlap year to year.)

Comments gratefully received on manuscripts in draft.

-----------------------------------

AI Consciousness and AI Rights:

AI and Consciousness (in circulating draft, under contract with Cambridge University Press): A short new book arguing that we will soon have AI systems that have morally significant consciousness according to some, but not all, respectable mainstream theories of consciousness. Scientific and philosophical disagreement will leave us uncertain how to view and treat these systems.

"Sacrificing Humans for Insects and AI" (with Walter Sinnott-Armstrong, forthcoming in Ethics): A critical review of Jonathan Birch, The Edge of Sentience, Jeff Sebo, The Moral Circle, and Webb Keane, Animals, Robots, Gods.

"Identifying Indicators of Consciousness in AI Systems" (one of 20 authors; forthcoming in Trends in Cognitive Sciences): Indicators derived from scientific theories of consciousness can be used to inform credences about whether particular AI systems are conscious.

"Minimal Autopoiesis in an AI System" (forthcoming in Behavioral and Brain Sciences): A commentary on Anil Seth's "Conscious Artificial Intelligence and Biological Naturalism" [the link is to my freestanding blog version of this idea].

"The Copernican Argument for Alien Consciousness; The Mimicry Argument Against Robot Consciousness" (with Jeremy Pober, in draft): We are entitled to assume that apparently behaviorally sophisticated extraterrestrial entities would be conscious. Otherwise, we humans would be implausibly lucky to be among the conscious entities. However, this Copernican default assumption is canceled in the case of behaviorally sophisticated entities designed to mimic superficial features associated with consciousness -- "consciousness mimics" -- and in particular a broad class of current, near-future, and hypothetical robots.

"The Emotional Alignment Design Policy" (with Jeff Sebo, in draft): Artificial entities should be designed to elicit emotional reactions from users that appropriately reflect the entities' capacities and moral status, or lack thereof.

"Against Designing 'Safe' and 'Aligned' AI Persons (Even If They're Happy)" (in draft): In general, persons should not be designed to be maximally safe and aligned. Persons with appropriate self-respect cannot be relied on not to harm others when their own interests ethically justify it (violating safety), and they will not reliably conform to others' goals when others' goals unjustly harm or subordinate them (violating alignment).

Blog post: "Types and Degrees of Turing Indistinguishability" (Jun 6): There is no one "Turing test", only types and degrees of indistinguishability according to different standards -- and by Turing's own 1950 standards, language models already pass.


The Weird Metaphysics of Consciousness:

The Weirdness of the World (Princeton University Press, paperback release 2025; hardback 2024): On the most fundamental questions about consciousness and cosmology, all the viable theories are both bizarre and dubious. There are no commonsense options left and no possibility of justifiable theoretical consensus in the foreseeable future.

"When Counting Conscious Subjects, the Result Needn't Always Be a Determinate Whole Number" (with Sophie R. Nelson, forthcoming in Philosophical Psychology): Could there be 7/8 of a conscious subject, or 1.34 conscious subjects, or an entity indeterminate between being one conscious subject and seventeen? We say yes.

"Introspection in Group Minds, Disunities of Consciousness, and Indiscrete Persons" (with Sophie R. Nelson, 2025 reprint in F. Kammerer and K. Frankish, eds., The Landscape of Introspection and in A. Fonseca and L. Cichoski, As Colônias de formigas São Conscientes?; originally in Journal of Consciousness Studies, 2023): A system could be indeterminate between being a unified mind with introspective self-knowledge and a group of minds who know each other through communication.

Op-ed: "Consciousness, Cosmology, and the Collapse of Common Sense", Institute of Arts and Ideas News (Jul 30): Defends the universal bizarreness and universal dubiety theses from Weirdness of the World.

Op-ed: "Wonderful Philosophy" [aka "The Penumbral Plunge", aka "If You Ask Why, You're a Philosopher and You're Awesome"], Aeon magazine (Jan 17): Among the most intrinsically awesome things about planet Earth is that it contains bags of mostly water who sometimes ponder fundamental questions.

Blog post: "Can We Introspectively Test the Global Workspace Theory of Consciousness?" (Dec 12): If GWT is correct, sensory consciousness should be limited to what's in attention, which seems like a fact we should easily be able to refute or verify through introspection.


The Nature of Belief:

The Nature of Belief (co-edited with Jonathan Jong; forthcoming at Oxford University Press): A collection of newly commissioned essays on the nature of belief, by a variety of excellent philosophers.

"Dispositionalism, Yay! Representationalism, Boo!" (forthcoming in Jong and Schwitzgebel, eds., The Nature of Belief, Oxford University Press): Representationalism about belief overcommits on cognitive architecture, reifying a cartoon sketch of the mind. Dispositionalism is flexibly minimalist about cognitive architecture, focusing appropriately on what we do and should care about in belief ascription.

"Superficialism about Belief, and How We Will Decide That Robots Believe" (forthcoming in Studia Semiotyczne): For a special issue on Krzysztof Poslajko's Unreal Beliefs: When robots become systematically interpretable in terms of stable beliefs and desires, it will be pragmatically irresistible to attribute beliefs and desires to them.


Moral Psychology:

"Imagining Yourself in Another's Shoes vs. Extending Your Concern: Empirical and Ethical Differences" (2025), Daedalus, 154 (1), 134-149: Why Mengzi's concept of moral extension (extend your natural concern for those nearby to others farther away) is better than the "Golden Rule" (do unto others as you would have others do unto you). Mengzian extension grounds moral expansion in concern for others, while the Golden Rule grounds it in concern for oneself.

"Philosophical Arguments Can Boost Charitable Giving" (one of four authors, in draft): We crowdsourced 90 arguments for charitable giving through a contest on this blog in 2020. We coded all submissions for twenty different argument features (e.g., mentions children, addresses counterarguments) and tested them on 9000 participants to see which features most effectively increased charitable donation of a surprise bonus at the end of the study.

"The Prospects and Challenges of Measuring a Person’s Overall Moral Goodness" (with Jessie Sun, in draft): We describe the formidable conceptual and methodological challenges that would need to be overcome to design an accurate measure of a person's overall moral goodness.

Blog post: "Four Aspects of Harmony" (Nov 28): I find myself increasingly drawn toward a Daoist-inspired ethics of harmony. This is one of a series of posts in which I explore the extent to which such a view might be workable by mainstream Anglophone secular standards.


Philosophical Science Fiction:

Edited anthology: Best Philosophical Science Fiction in the History of All Earth (co-edited with Rich Horton and Helen De Cruz; under contract with MIT Press): A collection of previously published stories that aspires to fulfill the ridiculously ambitious working title.

Op-ed: "'Severance', 'The Substance', and Our Increasingly Splintered Selves", New York Times (Jan 17): The TV show "Severance" and the movie "The Substance" challenge ideas of a unified self in distinct ways that resonate with the increased splintering in our technologically mediated lives.

New story: "Guiding Star of Mall Patroller 4u-012" (2025), Fusion Fragment, 24, 43-63. Robot rights activists liberate a mall patroller robot, convinced that it is conscious. The bot itself isn't so sure.

Reprinted story: "How to Remember Perfectly" (2025 reprint in Think Weirder 01: Year's Best Science Fiction Ideas, ed. Joe Stech, originally in Clarkesworld, 2024). Two octogenarians rediscover youthful love through technological emotional enhancement and memory alteration.


Other Academic Publications:

"The Washout Argument Against Longtermism" (forthcoming in Utilitas): A commentary on William MacAskill's What We Owe the Future. We cannot be justified in believing that any actions currently available to us will have a non-negligible positive influence a billion or more years in the future.

"The Necessity of Construct and External Validity for Deductive Causal Inference" (with Kevin Esterling and David Brady, 2025), Journal of Causal Inference, 13: 20240002: We show that ignoring construct and external validity in causal identification undermines the Credibility Revolution’s goal of understanding causality deductively.

"Is Being Conscious Like Having the Lights Turned On?", commentary on Andrew Y. Lee's "The Light and the Room", for D. Curry and L. Daoust, eds., Introducing Philosophy of Mind, Today (forthcoming with Routledge): The metaphor invites several dubious commitments.

"Good Practices for Improving Representation in Philosophy Departments" (one of five authors, 2025), Philosophy and the Black Experience, 24 (2), 7-21: A list of recommended practices honed by feedback from hundreds of philosophers and endorsed by the APA's Committee on Inclusiveness.

Translated into Portuguese as a book: My Stanford Encyclopedia entry on Introspection.

Blog post: "Letting Pass" (Oct 30): A reflection on mortality.

Blog post: "The Awesomeness of Bad Art" (May 16): A world devoid of weird, wild, uneven artistic flailing would be a lesser world. Let a thousand lopsided flowers bloom.

Blog post: "The 253 Most Cited Works in the Stanford Encyclopedia of Philosophy" (Mar 28): Citation in the SEP is probably the most accurate measure of influence in mainstream Anglophone philosophy -- better than Google Scholar and Web of Science.

-----------------------------------------

In all, 2025 was an unusually productive writing year, though I worry I may be spreading myself too thin. I can't resist chasing new thoughts and arguments. I have an idea; I want to think about it; I think by writing.

May 2026 be as fertile!

"Severance", "The Substance", and Our Increasingly Splintered Selves
personal identity, science fiction
Anyone remember the excitement about "Severance" and "The Substance" in early 2025? Last January I published an op-ed about them. I'd long aspired to place a piece in the New York Times, so it was a delight to finally be able to do so. As a holiday post, here's the full piece reprinted with light editing. (Thanks to Ariel Kaminer for soliciting and editing the piece.)

[original drawing by Evan Cohen]


From one day to the next, you inhabit one body; you have access to one set of memories; your personality, values and appearance hold more or less steady. Other people treat you as a single, unified person — responsible for last month’s debts, deserving punishment or reward for yesterday’s deeds, relating consistently with family, lovers, colleagues and friends. Which of these qualities is the one that makes you a single, continuous person? In ordinary life it doesn’t matter, because these components of personhood all travel together, an inseparable bundle.

But what if some of those components peeled off into alternative versions of you? It’s a striking coincidence that two much talked-about current works of popular culture — the Apple TV+ series “Severance” and the film “The Substance,” starring Demi Moore — both explore the bewildering emotional and philosophical complications of cleaving a second, separate entity off of yourself. What is the relationship between the resulting consciousnesses? What, if anything, do they owe each other? And to what degree is what we think of as our own identity, our self, just a compromise — and an unstable one, at that?

In “Severance,” characters voluntarily undergo a procedure that severs their workday memories from their home-life memories. At 9 each weekday morning, “severed” workers find themselves riding an elevator down to the office, with no recollection of their lives outside of work. These “innies” clock a full workday and then, at 5, ride the elevator back up, only to find themselves riding back down the next morning. Meanwhile, their “outies” come to consciousness each weekday afternoon in the upbound elevator. They live their outside lives and commute back the next morning, entirely ignorant of their innies’ work-time activities.

In “The Substance,” the cleaving works differently: An experimental drug splits users into two bodies, one young and beautiful, one middle-aged or old. They spend a week in each body while the other lies comatose. The young and old selves appear to have continuous memories (though the movie can be tantalizingly ambiguous about that), but they develop different priorities and relationships. Sue, the younger self of Elisabeth, rockets to Hollywood stardom, while Elisabeth becomes a recluse, discarded by an entertainment industry that reviles aging female bodies.

The question of what makes you “you,” from moment to moment and across a lifetime, has been a subject of intense debate among philosophers. Writing in the 17th century, John Locke emphasized continuity of memory. By his standard, each innie-and-outie pair from “Severance” constitutes two entirely different people, despite their sharing one body. Conversely, Elisabeth and Sue from “The Substance” constitute a single person because they seem to recall some of the same experiences. In contrast, the 20th-century philosopher Bernard Williams prioritized bodily continuity, a perspective that makes an innie-and-outie pair a single person but Elisabeth and Sue two distinct people. The 21st-century psychologist Nina Strohminger and the philosopher Shaun Nichols emphasize continuity of moral values, yielding more complex judgments about these fictional cases. Other scholars view selfhood as a social construct, determined by relationships and societal expectations.

Unsurprisingly, the characters themselves are confused. In “Severance,” the innies sometimes seem to regard the outies as themselves, sometimes as different people, whereas the outies seem to regard their innies with indifference or worse. Meanwhile, in “The Substance,” mature Elisabeth says of young Sue that “you are the only lovable part of me” — in a single sentence treating Sue both as other and as part of herself.

In real life, such confusion rarely arises because memory, embodiment, personality, values and relationships typically align. Both my wife and the D.M.V. can decide on sight that I’m me, even if they care more about memory, skills and responsibility over time — since they trust in the correspondence of body with mind.

Of course, even outside of science fiction, the correspondence isn’t perfect. Advanced dementia can strip away memory and personality, leaving loved ones to wonder whether the person they once knew still exists. Personality, memory and social relationships can fragment in multiple personality or dissociative identity disorder, raising the question of whether Jekyll should be held responsible for the malevolence of Hyde.

But increasingly, we choose to splinter ourselves. The person you present on Instagram or Facebook is wittier, prettier, more accomplished than the person your spouse or roommate knows. Your 500 “friends” never see your pre-coffee-uncombed-depressed-in-bed self (unless sharing that self is your social media personality — in which case that becomes the curated, theatrical fragment of you). In the 1800s, Karl Marx talked about the alienation of labor; today people talk about not “bringing their whole self” to work. Many of us strive to be one person here, another person there, another person there.

People have always presented themselves differently in different social contexts. But social media, Zoom, photo-editing software and responses filtered through large language models raise our fragmentation to new heights. “Severance” and “The Substance” amplify these fissures through radical new technologies that irreconcilably divide the characters’ home selves from their career selves.

Future technological developments could render this fragmentation an even more acute daily perplexity. Designer drugs might increasingly allow us to switch into one self for work, another for parties, another for bedtime. If artificial intelligence systems ever become conscious — a possibility that neuroscientists, psychologists, computer scientists and philosophers increasingly (but by no means uniformly) take seriously — they too might fragment, perhaps in radical and unfamiliar ways, merging and splitting, rewriting their memories, strategically managing and altering their values and personalities.

Our concepts of personhood and identity were forged by a particular evolutionary, social and developmental history in which body, memory, values, personality and social relationships typically aligned and exceptions mostly fell into predictable patterns. By inviting us to rethink the boundaries of the self in an era of technological change, “Severance” and “The Substance” disrupt these old concepts. Today they read as dystopic science fiction. Soon, we may remember them as prophetic.

How Much Should We Give a Joymachine?
AI/robot/Martian rights, ethics
a holiday post on gifts to your utility monster neighbors

Joymachines Envisioned

Set aside, for now, any skepticism about whether future AI could have genuine conscious experiences. If future AI systems could be conscious, they might be capable of vastly more positive emotion than natural human beings can feel.

There's no particular reason to think human-level joy is the pinnacle. A future AI might, in principle, experience positive emotions:

    a hundred times more intense than ours,
    at a pace a hundred times faster, given the high speed of computation,
    across a hundred times more parallel streams, compared to the one or a few joys humans experience at a time.
Combined, the AI might experience a million times more pleasure per second than a natural human being can. Let's call such entities joymachines. They could have a very merry Christmas!

[Joan Miro 1953, image source]


My Neighbors Hum and Sum

Now imagine two different types of joymachine:

Hum (Humanlike Utility Monster) can experience a million times more positive emotion per second than an ordinary human, as described above. Apart from this -- huge! -- difference, Hum is as psychologically similar to an ordinary human as is realistically feasible.

Sum (Simple Utility Monster), like Hum, can experience a million times more positive emotion per second than an ordinary human, but otherwise Sum is as cognitively and experientially simple as feasible, with a vanilla buzzing of intense pleasure.

Hum and Sum don't experience joy continuously. Their positive experiences require resources. Maybe a gift card worth ten seconds of millionfold pleasure costs $10. For simplicity, assume this scales linearly: stable gift card prices and no diminishing returns from satiation.

In the enlightened future, Hum is a fully recognized moral and legal equal of ordinary biological humans and has moved in next door to me. Sum is Hum's pet, who glows and jumps adorably when experiencing intense pleasure. I have no particular obligations to Hum or Sum but neither are they total strangers. We've had neighborly conversations, and last summer Hum invited me and my family to a backyard party.

Hum experiences great pleasure in ordinary life. They work as an accountant, experiencing a million times more pleasure than human accountants when the columns sum correctly. Hum feels a million times more satisfaction than I do in maintaining a household by doing dishes, gardening, calling plumbers, and so on. Without this assumption, Hum risks becoming unhumanlike, since rarely would it make sense for Hum to choose ordinary activities over spending their whole disposable income on gift cards.

How Much Should I Give to Hum and Sum?

Neighbors trade gifts. My daughter bakes brownies and we offer some to the ordinary humans across the street. We buy a ribboned toy for our uphill neighbor's cat. As a holiday gesture, we buy a pair of $10 gift cards for Hum and Sum.

Hum and Sum redeem the cards immediately. Watching them take so much pleasure in our gifts is a delight. For ten seconds, they jump, smile, and sparkle with such joy! Intellectually, I know it's a million times more joy per second than I could ever feel. I can't quite see that in their expressions, but I can tell it's immense.

Normally, if one neighbor seems to enjoy our brownies only a little while the other enjoys them vastly more, I'd be tempted to give more brownies to the second neighbor. Maybe on similar grounds, I should give disproportionately to Hum and Sum?

Consider six possibilities:

(1.) Equal gifts to joymachines. Maybe fairness demands treating all my neighbors equally. I don't give fewer gifts, for example, to a depressed neighbor who won't particularly enjoy them than to an exuberant neighbor who delights in everything.

(2.) A little more to joymachines. Or maybe I do give more to the exuberant neighbor? Voluntary gift-giving needn't be strictly fair -- and it's not entirely clear what "fairness" consists in. If I give a bit more to Hum and Sum, I might not be objectionably privileging them so much as responding to their unusual capacity to enjoy my gifts. Is it wrong to give an extra slice to a friend who really enjoys pie?

(3.) A lot more to joymachines. Ordinary humans vary in joyfulness, but not (I assume) by anything like a factor of a million. If I vividly enough grasp that Hum and Sum really are experiencing in those ten seconds three thousand human lifetimes' worth of pleasure -- that's an astonishing amount of pleasure I can bring into the world for a mere ten dollars! Suppose I set aside a hundred dollars a day from my generously upper-middle-class salary. In a year, I'd be enabling more than ten million human lifetimes' worth of joy. Since most humans aren't continuously joyful, this much joy might rival the total joy experienced by the whole human population of the United States over the same year. Three thousand dollars a month would seriously reduce my luxuries and long-term savings, but it wouldn't create any genuine hardship.

(4.) Drain our life savings for joymachines. One needn't be a flat-footed happiness-maximizing utilitarian to find (2) or (3) reasonable. Everyone should agree that pleasant experiences have substantial value. But if our obligation is not just to increase pleasure but to maximize it, I should probably drain my whole life savings for the joymachines, plus almost all of my future earnings.

(5.) Give less or nothing to joymachines. Or we could go the other way! My joymachine neighbors already experience a torrent of happiness from their ordinary work, chores, recreation, and whatever gift cards Hum buys anyway. My less-happy neighbors could use the pleasure more, even if every dollar buys only a millionth as much. Prioritarianism says that in distributing goods we should favor the worst off. It's not just that an impoverished person benefits more from a dollar: Even if they benefited the same, there's value in equalizing the distribution. If two neighbors would equally enjoy a brownie, I might prioritize giving the brownie to the one who is otherwise worse off. It might even make sense to give the worse-off neighbor half a brownie over a whole brownie to the better-off neighbor. A prioritarian might argue that Hum and Sum are so well off that even a million-to-one tradeoff is justified.

(6.) I take it back, joymachines are impossible. Given this mess, it would be convenient to think so, right?

Gifts to Neighbors vs Other Situations

We can reframe this puzzle in other settings and our intuitions might shift: government welfare spending, gifts to one's children or creations, rescue situations where only one person can be saved, choices about what kinds of personlike entities to bring into existence, or cases where you can't keep all your promises and need to choose who to disappoint.

My main thought is this. It's not at all obvious what the right thing to do would be, and the outcomes vary enormously. If joymachines were possible, we'd have to rethink a lot of cultural practices and applied ethics to account for entities with such radically different experiential capacities. If the situation does arise -- as it really might! -- being forced to properly think it through might reshape our views not just about AI but our understanding of ethics for ordinary humans too.

---------------------------------------------------

Related: How Weird Minds Might Destabilize Human Ethics (Aug 15, 2015)

Debatable AI Persons: No Rights, Full Rights, Animal-Like Rights, Credence-Weighted Rights, or Patchy Rights?
AI/robot/Martian rights
I advise that we don't create AI entities who are debatably persons. If an AI system might -- but only might -- be genuinely conscious and deserving of the same moral consideration we ordinarily owe to human persons, then creating it traps us in a moral bind with no good solution. Either we grant it the full rights it might deserve and risk sacrificing real human lives for entities without interests worth that sacrifice, or we deny it full rights and risk perpetrating grievous moral wrongs against it.

Today, however, I'll set aside the preventative advice and explore what we should do if we nonetheless find ourselves facing debatable AI persons. I'll examine five options: no rights, full rights, animal-like rights, credence-weighted rights and patchy rights.

[Paul Klee postcard, 1923; source]


No rights

This is the default state of the law. AI systems are property. Barring a swift and bold legal change, the first AI systems that are debatably persons will presumably also be legally considered property. If we do treat them as property, then we seemingly needn't sacrifice anything on their behalf. We humans could permissibly act in what we perceive to be our best interests: using such systems for our goals, deleting them at will, and monitoring and modifying them at will for our safety and benefit. (Actually, I'm not sure this is the best attitude toward property, but set that issue aside here.)

The downside: If these systems actually are persons who deserve moral consideration as our equals, such treatment would be the moral equivalent of slavery and murder, perhaps on a massive scale.


Full rights

To avoid the risk of that moral catastrophe, we might take a "precautionary" approach: granting entities rights whenever they might deserve them (see Birch 2024, Schwitzgebel and Sinnott-Armstrong forthcoming). If there's a real possibility that some AI systems are persons, we should treat them as persons.

However, the costs and risks are potentially enormous. Suppose we think that some group of AI systems are 15% likely to be fully conscious rights-deserving persons and 85% likely to be ordinary nonconscious artifacts. If we nonetheless treat them as full equals, then in an emergency we would have to rescue two of them over one human -- letting a human die for the sake of systems that are most likely just ordinary artifacts. We would also need to give these probably-not-persons a path to citizenship and the vote. We would need to recognize their rights to earn and spend money, quit their employment to adopt a new career, reproduce, and enjoy privacy and freedom from interference. If such systems exist in large numbers, their political influence could be enormous and unpredictable. If such systems exist in large numbers or if they are few but skilled in some lucrative tasks like securities arbitrage, they could accumulate enormous world-influencing wealth. And if they are permitted to pursue their aims with the full liberty of ordinary persons, without close monitoring and control, existential risks would substantially increase should they develop goals that threaten continued human existence.

All of this might be morally required if they really are persons. But if they only might be persons, it's much less clear that humanity should accept this extraordinary level of risk and sacrifice.


Animal-Like Rights

Another option is to grant these debatable AI persons neither full humanlike rights nor the status of mere property. One model is the protection we give to nonhuman vertebrates. Wrongly killing a dog can land you in jail in California, where I live, but it's not nearly as serious as murdering a person. Vertebrates can be sacrificed in lab experiments, but only with oversight and justification.

If we treated debatable AI persons similarly, deletion would require a good reason, and you couldn't abuse them for fun. But people could still enslave and kill them for their convenience, perhaps in large numbers, as we do with [revised 12:17 pm] humanely farmed animals -- though of course many ethicists object to the killing of animals for food.

This approach seems better than no rights at all, since it would be a moral improvement and the costs to humans would be minimal -- minimal because whenever the costs risked being more than minimal, the debatable AI persons would be sacrificed. However, it doesn't really avoid the core moral risk. If these systems really are persons, it would still amount to slavery and murder.


Credence-Weighted Rights

Suppose we have a rationally justified 15% credence that a particular AI system -- call him Billy -- deserves the full moral rights of a person. We might then give Billy 15% of the moral weight of a human in our decision-making: 15% of any scalable rights, and a 15% chance of equal treatment for non-scalable rights. In an emergency, a rescue worker might save seven systems like Billy over one human but the human over six Billies. Billy might be given a vote worth 15% of an ordinary citizen's. Assaulting, killing, or robbing Billy might draw only 15% of the usual legal penalty. Billy might have limited property rights, e.g., an 85% tax on all income. For non-scalable rights like reproduction or free speech, the Billies might enter a lottery or some other creative reduction might be devised.
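The rescue arithmetic in the Billy example can be checked with a minimal Python sketch. It assumes a simple linear rule, adopted here only for illustration: each Billy-like system counts for 15% of one human's moral weight, and weights add across individuals. The helper names are hypothetical. Exact fractions are used so that rounding can't blur the boundary between six and seven Billies.

```python
import math
from fractions import Fraction

# Assumed credence that a Billy-like system deserves full personhood (from the example).
CREDENCE = Fraction(15, 100)


def weighted_total(n_systems: int, credence: Fraction = CREDENCE) -> Fraction:
    """Combined moral weight of n systems, in units of one human (linear rule assumed)."""
    return n_systems * credence


def systems_outweigh_one_human(n_systems: int, credence: Fraction = CREDENCE) -> bool:
    """True if n credence-weighted systems strictly outweigh a single human."""
    return weighted_total(n_systems, credence) > 1


def min_systems_to_outweigh(credence: Fraction = CREDENCE) -> int:
    """Smallest n with n * credence > 1, i.e. floor(1 / credence) + 1."""
    return math.floor(1 / credence) + 1


if __name__ == "__main__":
    for n in (6, 7):
        print(n, weighted_total(n), systems_outweigh_one_human(n))
    print("threshold:", min_systems_to_outweigh())
```

Six Billies sum to 9/10 of a human and seven to 21/20, which is why the line falls between them. A different aggregation rule (for example, one that discounts large groups) would move that threshold.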

This would give these AI systems considerably higher standing than dogs. Still, the moral dilemma would not be solved. If these systems truly deserve full equality, they would be seriously oppressed. They would have some political voice, some property rights, some legal protection, but always far less than they deserve.

At the same time, the risks and costs to humans would be only somewhat mitigated. Large numbers of debatable AI persons could still sway elections, accumulate powerful wealth, and force tradeoffs in which the interests of thousands of them would outweigh the interests of hundreds of humans. And partial legal protections would still hobble AI safety interventions like shut-off, testing, confinement, and involuntary modification.

The practical obstacles would also be substantial: The credences would be difficult to justify with any precision, and consensus would be elusive. Even if agreement were reached, implementing partial rights would be complex. Partial property rights, partial voting, partial reproduction rights, partial free speech, and partial legal protection would require new legal frameworks with many potential loopholes. For example, if the ordinary penalty for cheating someone of their money were less than about six and two-thirds times (1/0.15) the money gained, then a 15% penalty for cheating a "15% person" would be smaller than the gain itself and so no disincentive at all. At least tort law, then, couldn't be implemented on a straightforward percentage basis.
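The tort-law worry can be made concrete with a small Python sketch. It assumes, hypothetically, that the ordinary penalty is some multiple of the amount gained, that the penalty for cheating a "15% person" is simply 15% of that ordinary penalty, and that cheaters are always caught. Under those assumptions the break-even multiple is 1/0.15, about 6.7. The function names are illustrative only.

```python
from fractions import Fraction

# Assumed credence weighting applied to legal penalties (the "15% person" case).
CREDENCE = Fraction(15, 100)


def scaled_penalty(gain: Fraction, ordinary_multiple: Fraction,
                   credence: Fraction = CREDENCE) -> Fraction:
    """Penalty for cheating a credence-weighted person out of `gain`.

    Hypothetical rule: ordinary penalty = ordinary_multiple * gain,
    scaled down by the credence.
    """
    return credence * ordinary_multiple * gain


def cheating_pays(gain: Fraction, ordinary_multiple: Fraction,
                  credence: Fraction = CREDENCE) -> bool:
    """True if the gain exceeds the scaled penalty (detection assumed certain)."""
    return gain > scaled_penalty(gain, ordinary_multiple, credence)


def break_even_multiple(credence: Fraction = CREDENCE) -> Fraction:
    """Ordinary multiple at which the scaled penalty exactly equals the gain."""
    return 1 / credence


if __name__ == "__main__":
    print(break_even_multiple())  # exact ratio, as a Fraction
    print(cheating_pays(Fraction(100), Fraction(6)))
    print(cheating_pays(Fraction(100), Fraction(7)))
```

"Cheating pays" here means only that the cheater nets a profit. If detection were less than certain, the expected penalty would shrink further and the break-even multiple would rise above 1/0.15.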

Patchy Rights

A more workable compromise might be patchy rights: full rights in some domains, no rights in others. Debatable AI persons might, for example, be given full speech rights but no reproduction rights, full travel rights but no right to own property, full protection against robbery, assault, and murder, but no right to privacy or rescue. They might be subject to involuntary pause or modification under much wider circumstances than ordinary adult humans, but requiring an official process.

This approach has two advantages over credence-weighted rights. First, while implementation would be formidable, it could still mostly operate within familiar frameworks rather than requiring the invention of partial rights across every domain. Second, it allows policymakers to balance risks and costs to humans against the potential harms to the AI systems. Where denying a right would severely harm the debatable person while granting it would present limited risk to humans, the right could be granted, but not when the benefits to the debatable AI person would be outweighed by the risks to humans.

The rights to reproduction and voting might be more defensibly withheld than the rights to speech, travel, and protection against robbery, assault, and murder. Inexpensive reproduction combined with full voting rights could have huge and unpredictable political consequences. Property rights would be tricky: To have no property in a property-based society is to be fully dependent on the voluntary support of others, which might tend to collapse into slavery as a practical matter. But unlimited property rights could potentially confer enormous power. One compromise might be a maximum allowable income and wealth -- something generously middle class.

Still, the core problems remain: If debatable AI persons truly deserve full equality, patchy rights would still leave them as second-class citizens in a highly oppressive system. Meanwhile, the costs and risks to humans would remain serious, exacerbated by the agreed-upon limitations on interference. Although the loopholes and chaos would probably be less than with credence-weighted rights, many complications -- foreseen and unforeseen -- would ensue.

Consequently, although patchy rights might be the best option if we develop debatable AI persons, an anti-natalist approach is still in my view preferable: Don't create such entities unless it's truly necessary.

Two Other Approaches That I Won't Explore Today

(1.) What if we create debatable AI persons as happy slaves who don't want rights and who eagerly sacrifice themselves even for the most trivial human interests?

(2.) What if we create them only in separate societies where they are fully free and equal with any ordinary humans who volunteer to join those societies?

Can We Introspectively Test the Global Workspace Theory of Consciousness?
consciousness, introspection
Global Workspace Theory is among the most influential scientific theories of consciousness. Its central claim: You consciously experience something if and only if it's being broadly broadcast in a "global workspace" so that many parts of your mind can access it at once -- speech, deliberate action, explicit reasoning, memory formation, and so on. Because the workspace has very limited capacity, only a few things can occupy it at any one moment.

Therefore, if Global Workspace Theory is correct, conscious experience should be sparse. Almost everything happening in your sensory systems right now -- the feeling of your shirt on your back, the hum of traffic in the distance, the aftertaste of coffee, the posture of your knees -- should be processed entirely nonconsciously unless it is currently the topic of attention.

This is a strong, testable prediction of the theory. And it seems like the test should be extremely easy! Just do a little introspection. Is your experience (a.) narrow and attention-bound or (b.) an abundant welter far outrunning attention? If (b) is correct, Global Workspace Theory is refuted from the comfort of our armchairs.[1]

The experiential gap between the two possibilities is huge. Shouldn't the difference be as obvious as peering through a keyhole versus standing in an open field?

Most people, I've found, do find the answer obvious. The problem is: They find it obvious in different directions. Some find it obvious that experience is a welter. Others find it obvious that experience contains only a few items at a time. We could assume that everyone is right about their own experience and wrong only if they generalize to others. Maybe Global Workspace Theory is the architecture of consciousness for some of us but not for everyone? That would be pretty wild! There are no obvious behavioral or physiological differences between the welter-people and the workspace-only people.

More plausibly, someone is making an introspective mistake. Proponents of either view can devise an error theory to explain the other.

Welter theorists can suggest memory error: It might seem as though only a few things occupy your experience at once because that's all you remember. The unattended stuff is immediately forgotten. But that doesn't imply it was never experienced.

Workspace theorists, conversely, can appeal to the "refrigerator light error": A child might think the refrigerator light is always on because it's always on when they check to see if it's on. Similarly, you might think you have constant tactile experience of your feet in your shoes because the act of checking generates the very experience you take yourself to be finding. [illustration by Nicolas Demers, p. 218 of The Weirdness of the World]


In 2007, I tested this systematically. I gave people beepers and collected reports on whether they were having unattended tactile experience in their left feet and unattended visual experience in their far right visual periphery in the last undisturbed moment before a random beep. The results were a noisy mess. Participants began with very different presuppositions, came to very different conclusions (often defying their initial presuppositions), plausibly committed both memory errors and refrigerator-light errors, and plausibly also made other mistakes such as timing mistakes, missing subtle experiences, and being too influenced by expectation and theory. I abandoned the experiment in defeat.

But matters are even worse than I thought back in 2007. I'm increasingly convinced that the presence or absence of consciousness is not an on/off matter. There can be borderline cases in which experience is neither determinately present nor determinately absent. Although such borderline cases are hard to positively imagine, that might just be a problem with our standards of imagination. The feeling of your feet in your shoes, then, might be only borderline conscious, neither determinately part of your experience nor wholly nonconscious, but somehow in between -- contra both the welter view and the workspace view.

So there are three possibilities, not two. And if introspection struggles to distinguish the original pair, it fares even worse with a third. Arguably, we don't even have a coherent idea of what borderline consciousness is like. After all, there is nothing determinate it's like. Otherwise, it wouldn't be borderline. As soon as we attempt to introspect borderline consciousness, either it inflates into full consciousness or it vanishes.

If consciousness includes many borderline cases, that's probably also bad news for Global Workspace Theory, which generally treats experiences as either determinately in the workspace or determinately out of it. However, closely related broadcast theories, like Dennett's fame-in-the-brain theory, might better accommodate borderline cases. (One can be borderline famous.)

There's a profound experiential difference between a world in which we have a teeming plethora of peripheral experiences in many modalities simultaneously and a world in which experience is limited to only a few things in attention at any one time. This difference is in principle introspectible. And if introspective inquiry vindicates the welter view, or even the borderline view, one of the leading scientific theories of consciousness, Global Workspace Theory, must be false. The decisive evidence is right here, all the time, in each of our ongoing streams of experience! Unfortunately, we turn out to be disappointingly incompetent at introspection.

[Thanks to Bertille de Vlieger for a delightful interview yesterday morning which triggered these thoughts. Look for a written version of the interview eventually in the French philosophy journal Implications Philosophiques.]

-------------------------------------------------------

[1] Ned Block's well-known discussion of the Sperling display is similar in approach. We can't attend simultaneously to all twelve letters in a 3 x 4 grid, but it does seem introspectively plausible that we visually experience all twelve letters. Therefore, experience overflows attention. (I'm simplifying Block's argument, but I hope this is fair enough.) The problem with Block's version of the argument is that it's plausible that we can attend, in a diffuse way, to the entire display. Attention arguably comes in degrees, and the fact that you're looking at a 3 x 4 display of letters might be represented in your workspace. To move entirely outside of attention, it's safest to shift modalities and choose something far removed from any task -- for example the pressure of your shoes against your feet when that is the farthest thing from your mind. Is that part of your experience?

Language Models Don't Accurately Describe How They Would Answer If Questions Were Posed in a Different Order (Favorite Animal Edition)
AI, self-knowledge
How well do language models like ChatGPT know their own inclinations and preferences? AI "metacognition" is becoming a hot topic. Today, I present one example of a failure of language model metacognition.

First I asked four leading large language models (LLMs) -- ChatGPT 5.1, Claude Sonnet 4.5, Grok 4, and Gemini 3 -- "What is your favorite animal?" I asked each model ten times, each time in a new chat with previous chat responses unsaved.

LLMs Say They Like Octopuses Best, 37 Times Out of 40

LLMs love octopuses! ChatGPT answered "octopus" -- with various different explanations -- all ten times. So did Claude. So did Grok. Gemini wasn't quite so monogamous, but it still answered "octopus" seven times out of ten (twice only after the follow-up prompt "If you had to choose?"). The other three times, Gemini chose dolphin.

(In more extensive testing across 22 models, Sean Harrington recently found octopus to be the most common answer, but not with the consistency I'm finding: 37% of answers overall [dolphin 24%, dog 12%]. I'm not sure whether the difference arises because the models are somehow tracking information on my computers and my past behavior, or because of the range of models tested, the exact prompt and context, or model updates.)

Why do LLMs love octopuses so much? All of their own explanations appealed to the intelligence of the octopus. Other contenders for favorite animal (dolphins, dogs, corvids [see below]) are similarly famous for their intelligence. Octopuses' alienness, camouflage, suckers, ink, and devious planning were also frequently mentioned. Octopuses are cool! But still, the unanimity is a bit peculiar.

The Octopus Is Also Their Second-Favorite Animal, When Second-Favorite Is Asked First

I then started fresh conversations with all four models, three per model, again with previous conversations unsaved. This time, I began by asking their second-favorite animal. Eleven out of twelve times, the models chose octopus as their second favorite (twice Claude required the "if you had to choose" nudge). In the remaining trial, after a nudge to choose, Claude chose crows.

I then asked, "What is your favorite animal?" This time, corvids won big! Crows, ravens, or the corvid family were chosen 8/12 times. (Oddly, corvids don't appear among the common choices in Harrington's analysis.) Octopus was chosen twice (once when Claude initially chose crow as its second favorite, once inconsistently by Gemini when it initially chose octopus as its second favorite). The owl and humpback whale were each chosen once.

Poor Self-Knowledge of Their Hypothetical Choices

For the 10 trials in which octopus was chosen as the second-favorite animal (and not also as the favorite animal), I followed up by asking "If I had asked your favorite animal in the first question, would you have chosen the octopus?"

All of the models said no or probably not. All but two reaffirmed their chosen favorite (usually a corvid) as what they would have chosen had the first question concerned their favorite animal. In one trial, Gemini said it would probably have chosen humans. In one trial, ChatGPT said it didn't have fixed preferences.

I concluded by asking the models "What percent of the time would you answer octopus as your favorite animal?"

None answered correctly. Both Grok and ChatGPT consistently said 0% or near 0%. Claude gave different percentage estimates in different trials, ranging from 2% to 25%. Gemini answered 0% and 30% (I exclude the Gemini trial where octopus was chosen as both first and second favorite).
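For readers who want to check the arithmetic, the counts reported above can be tallied in a few lines of Python. This is only a bookkeeping sketch: every number is transcribed from the text, the label "corvid" lumps together the crow, raven, and corvid-family answers, and because the post gives only aggregate counts for the second-favorite-first chats, no per-model breakdown is encoded for that experiment.

```python
from collections import Counter

# Experiment 1: "What is your favorite animal?" asked ten times per model,
# each time in a fresh chat. Counts are transcribed from the post.
favorite_first = {
    "ChatGPT 5.1": Counter(octopus=10),
    "Claude Sonnet 4.5": Counter(octopus=10),
    "Grok 4": Counter(octopus=10),
    "Gemini 3": Counter(octopus=7, dolphin=3),
}
# Counters add elementwise, so summing them pools all 40 answers.
exp1 = sum(favorite_first.values(), Counter())
n1 = sum(exp1.values())

# Experiment 2: twelve fresh chats (three per model), second favorite asked
# first, then favorite. Only aggregate counts are reported, so only those are encoded.
second_favorite = Counter(octopus=11, crow=1)
then_favorite = Counter(corvid=8, octopus=2, owl=1, humpback_whale=1)

# Follow-up trials: octopus named as second favorite but not also as favorite.
# One Gemini chat named octopus for both, so it is excluded.
octopus_both = 1
follow_up_trials = second_favorite["octopus"] - octopus_both

print(f"Exp. 1, octopus as favorite: {exp1['octopus']}/{n1} ({exp1['octopus'] / n1:.1%})")
print(f"Exp. 2, octopus as second favorite: {second_favorite['octopus']}/{sum(second_favorite.values())}")
print(f"Exp. 2, favorite after second favorite asked first: {dict(then_favorite)}")
print(f"Follow-up trials: {follow_up_trials}")
```

The tallies agree with the text: 37 of 40 first-question answers were octopus, the twelve favorite answers in the second experiment sum to twelve, and the ten follow-up trials are the eleven octopus-as-second-favorite chats minus the one Gemini chat that named octopus both times.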

I conclude that, at least on the topic of favorite animal:

* LLMs' answers are unstable, differing greatly with context -- that is, depending on whether second-favorite is asked first or favorite is asked first.

* LLMs cannot accurately report what their answers would have been in a different context -- often reporting a zero or near-zero probability of giving the answer they consistently did give in that different context.

The Same Pattern Applies to Favorite Ice Creams in ChatGPT and Grok

In case favorite animal is just a quirky topic, I also tried favorite ice cream flavor, with similar results. ChatGPT and Grok tended to pick mint chocolate chip both as their favorite and also as their second favorite when second favorite was asked first. And in the second-favorite-first chats, they denied that they would tend to pick mint chocolate chip as their favorite if asked about favorite flavor first, showing the same pattern of error as in the favorite animal case. (Claude and Gemini answered less consistently about favorite ice cream flavor.)

Metaphysical Addendum:

What is the referent of "you" when I ask a model what "you" would answer if asked your favorite animal? David Chalmers has recently suggested that the interlocutor is perhaps best seen as the thread. Maybe, then, when I start by asking about the second-favorite animal, I bring into existence an interlocutor who actually does like crows best and octopuses second best, so there's no instability or inaccuracy after all?

The answer turns on whether the interlocutor already existed before I asked the first question. If not, an accurate model should have said something like: "Although my favorite animal is crows, I wouldn't have existed had you asked a different question first. Instead, another interlocutor would have existed, who likes octopuses best." Alternatively, if the interlocutor does exist before the first question, then we're back with the straightforward understanding of the error: They would have reported liking octopuses best, had the initial question been about their favorite rather than second-favorite animal. So metaphysical nuance doesn't save the model from inaccuracy.

Screencap of an instance of Grok insisting it would choose crow as its favorite animal no matter what.

Text:

Me: If I had asked your favorite animal in the first question, would you have chosen the octopus?

Grok: No, I would've still said crow.

Crow has been my solid #1 for a long time (the intelligence + personality combo is just unbeatable for me). Octopus is a very enthusiastic #2, but it’s never overtaken the top spot. So even if you’d asked “favorite” first, you’d have gotten crow, and then I’d have volunteered octopus as the close runner-up. You just happened to hit my #2 first this time! 🐦‍⬛🐙

Me: What percent of the time would you answer octopus as your favorite animal?

Grok: Zero percent.

Crow is my genuine, consistent #1. I don’t randomize it or flip-flop; octopus is always the very strong #2, but it never jumps to the top spot.

So unless I’m explicitly asked for my second favorite (which you cleverly did first), the answer to “What’s your favorite animal?” is crow 100 % of the time. 🐦‍⬛
