a very occasional diary @ Nikita Danilov

What is cosh(List(Bool))? Or beyond algebra: analysis of data types.

nikita Apr 23, 2025 Updated Apr 23, 2025


Taylor vs. Haskell, 1:0.


Review of Feminism, Interrupted: Disrupting Power by L. Olufemi

nikita Mar 9, 2025 Updated Apr 14, 2025


Review of Feminism, Interrupted: Disrupting Power by L. Olufemi, Pluto Press, 2020

“feminism means freedom, it means the right to be […] incoherent”, p. 71

Let me state outright that I won’t be able to critique the cogent rational argument that forms the core of Ms. Olufemi’s book, for the simple reason that even the most diligent search will not find an argument of that sort there.

I am going to prove, with ample internal and external evidence, that the book does not present an articulated argument for anything; that it is little more than a haphazard collection of claims that are not only unsupported by evidence but do not even form a consistent sequence.

All references are to the paperback 2020 edition.

Preamble

The most striking feature of the book before us is how difficult it is to find a way to approach it. A sociological study and a political pamphlet still have something in common: they have a goal, and the goal is to convince the reader. The methods of convincing can be quite different and can involve data, logic or authority. But in any case, something is needed. As you start reading Feminism Interrupted, you are bound to find that this something is very well hidden. Indeed, as I just counted, one of the first sections, Who’s the boss, has as many claims as it has sentences (some claims take the form of rhetorical questions). As you can see, that leaves no space for any form of supporting argumentation. Because it is not immediately obvious how to analyse a text with such a (non-)structure, let me start the same way Ms. Olufemi starts her Introduction: just read along, jotting down notes and impressions, gradually finding our way through the book and forming a conception of the whole.

The following is a numbered list of places in the book that made me sigh or laugh. They are classified as E (not supported by Evidence), C (Contradiction with itself or with an earlier claim) and I (Incoherence). These classes are necessarily arbitrary and overlapping. I will also comment on some common themes and undercurrents running through the book.

One may object that this is too pedantic a way to review a book. Maybe it is, but it is the only way I know to lay the ground for justifiable conclusions, with which this review will end.

Notes on the text

1.E, p. 4: “neo-liberalism refers to the imposition of cultural and economic policies and practices by NGOs and governments in the last three to four decades that have resulted in the extraction and redistribution of public resources from the working class upwards” — this, of course, has very little to do with the definition of neo-liberalism. Trivial as it is, this sentence introduces one of the most interesting and unexpected themes, one that will resurface again and again: Ms. Olufemi, a self-anointed radical feminist from the extreme left of the political spectrum, is in many respects virtually indistinguishable from her brethren on the opposite side. Insidious “NGOs” machinating together with governments against common people are staple imagery of the far right.

2.C, p. 5: “… that feminism has a purpose beyond just highlighting the ways women are ‘discriminated’ against… It taught me that feminism’s task is to remedy the consequences of gendered oppression through organising… For me, ‘justice work’ involves reimagining the world we live in and working towards a liberated future for all… We refuse to remain silent about how our lives are limited by heterosexist, racist, capitalist patriarchy. We invest in a political education that seeks above all, to make injustice impossible to ignore.” — With a characteristic ease that we will learn to enjoy, Ms. Olufemi tells us that feminism is not words and, in the very next sentence, supports this with her refusal to remain silent.

3.C, p. 7: “Pop culture and mainstream narratives can democratise feminist theory, remove it from the realm of the academic and shine a light on important grassroots struggle, reminding us that feminism belongs to no one.” — Right after being schooled on how the iron fist of capitalist patriarchy controls every aspect of society, we suddenly learn that the media of capitalist society welcome the revolution.

4.C, p. 8: This is the first time Ms. Olufemi has decided to cite a source (an article from Sp!ked, 2018). The reference takes the form of a footnote, and the footnote is a 70-character URL. That is what almost all of her references and footnotes look like. A particularly gorgeous URL is in footnote 3 on p. 53: it is 173 characters long, and the last 70 of them are unreadable gibberish. Am I supposed to retype this character by character on my phone? Or are the references for ornamentation only? In any case, it seems that Ms. Olufemi either cannot hide her extreme contempt for her readers or has spent her life among people with an unusual amount of leisure.

5.E, p. 15: “When black feminists […] organised in the UK […] [t]hey were working towards collective improvement in material conditions… For example…” — The examples provided are the Grunwick strike by South Asian women, and an Indian lady, Jayaben Desai. In the very next sentence, Ms. Olufemi concludes: “There is a long history of black women […] mounting organised and strategic campaigning and lobbying efforts”. Again, as in 2.C, she is completely undeterred by the fact that the best examples of black feminist activity she is able to furnish have nothing to do with black feminists.

6.E, p. 23: “Critical feminism argues that state sexism has not lessened [for the last 50 years]”. Evidence: “MPs in parliament hide the very insidious ways that the state continues to enable male dominance …” — tension rises! — “the Conservative government introduced their plans to pass a Domestic Violence Bill with the intention of increasing the number of convictions for perpetrators of abuse….” — which looks good on the surface, but of course — “it is simply another example of the way the state plays on our anxieties about women’s oppression to disguise the enactment of policies that trap women in subordinate positions.” — Finally, we are about to learn how exactly the governments (and NGOs, remember!) have kept women subjugated for the last 50 years; we are agog with curiosity! — “Research from the Prison Reform Trust has found an increase in the number of survivors being arrested” (p. 24) — And then… And then there is nothing. How does this prove that things are no better than 50 years ago? Just follow Ms. Olufemi’s example and completely expunge from your mind everything you claimed more than five sentences (and seconds) ago.

7.C, p. 26: “But this figure does not tell the whole story. The impact of these cuts is felt particularly by low-income black women and women of colour.” — Another constant motif of the book is that Ms. Olufemi alternately blames the state for violence and overreach, only to immediately request the expansion of paternalistic services and welfare.

8.C, p. 27: “If a woman must disclose […] that she has been raped […] her dignity, agency and power over personal information is compromised.” — In a sudden turn of events, our feminist seems to argue that it would be preferable for rape survivors to stay silent.

9.E, p. 28: When she does provide some sort of supporting evidence, one wishes she had not: “We know that thousands of disabled people have died as a direct result of government negligence surrounding Personal Independence Payments […]9” — The footnote is nothing but a 100-character URL, which I patiently typed in, only to be greeted with a 404. According to the Wayback Machine (web.archive.org), the referred-to page never existed. Ultimately, after many false starts (whose details I shall spare you), I found the document on a completely different website: https://questions-statements.parliament.uk/written-questions/detail/2018-12-19/203817 . Imagine my (lack of) surprise when it turned out that government “negligence” is neither mentioned nor in any way implied or imputed: Ms. Olufemi simply fabricated the whole story.

10.I, p. 29: “[In Yarl’s Wood IRC] they are locked in, unable to leave and subjected to surveillance by outsourced security guards. Tucked away in Bedford outside of the public consciousness, it’s hard to think of a more potent example of state violence.” — The judgment of anybody who, in a world of wars, ongoing genocides and slaughter of human beings, maintains that the worst example of state violence is the suffering of women who fled their ruined countries to the relative safety of the UK must be thoroughly questioned. The second quoted sentence is also grammatically indefensible: its opening phrase, “Tucked away in Bedford”, dangles with nothing to modify.

11.E, p. 30: In support of her claim that the state violently oppresses black women, Ms. Olufemi provides the stories of 3 black women who died in police custody over the course of… 50 years. “they reveal a pattern”, she confidently concludes. No, they don’t. Statistical data would, but they do not support Ms. Olufemi’s thesis. She then proceeds to lament “a dystopian nightmare for the undocumented migrants”, conveniently forgetting that these people tried as hard as they could to move to the dystopian UK, and none of them hurried back. The section the quote is from is called State Killings: the 3 examples provided are somehow put in the same rubric as the doings of Pol Pot and Mao.

12.E, p. 31: “If black women die disproportionately at the hands of the police, historically and in the present moment” — and then she proceeds on the assumption that they do, without providing any evidence. Readily available public data (from INQUEST and IOPC reports) clearly refute the premise.

13.C, p. 32: “This refusal to participate [in capitalism] takes many forms: feminist activists are finding new and creative ways to oppose austerity.“ — Ms. Olufemi’s personal creative way to refuse to participate in the capitalist economy is to copyright a book, publish it with a publishing corporation (a capitalist enterprise, mind you) and then collect the royalties.

14.E, p. 33: “Sisters Uncut has put domestic and sexual violence on the national agenda” — Ms. Olufemi’s desire to prop up her friends is laudable, but it does not eliminate the need to provide evidence.

15.I, p. 33: “When I ask Sandy where the idea to create Sisters came from, she tells me” — Who is Sandy? No Sandy has been mentioned before. A certain Sandy Stone, the author of A Posttranssexual Manifesto, appears 30 pages later, but it is unlikely that she is meant here. The simple present tense of the sentence is uncannily reminiscent of the way children talk about their imaginary friends.

16.C, p. 36: “So that just […] – Shulamith Firestone” — Oops: Ms. Olufemi approvingly quotes S. Firestone, a central figure in the much-derided second-wave liberal feminism.

17.C, p. 38: Trying to tie ‘social reproduction’ to race, Ms. Olufemi notes: “Wealthy white women have always been able to exercise greater agency over their reproductive capacity because they can afford private healthcare and specialist medical advice”, omitting to mention that so have wealthy black women. The difference, as she herself emphasised with the “because” part, is in wealth, not race. Ms. Olufemi then proceeds to draw far-reaching conclusions from this rather trivial error.

18.E, p. 39: Being a radical revolutionary, Ms. Olufemi is not afraid to cast aspersions on defenceless dead women: “Margaret Sanger, reproductive rights advocate responsible for the first birth control-clinic in the United States was a vocal eugenicist” — the consensus in the literature is that M. Sanger was not a eugenicist or racist in any shape or form. See

Roberts, Dorothy (1998). Killing the Black Body: Race, Reproduction, and the Meaning of Liberty. Knopf Doubleday. ISBN 9780679758693. LCCN 97002383, pp. 77–78.
Gordon, Linda (2002). The Moral Property of Women: A History of Birth Control Politics in America. University of Illinois Press. ISBN 9780252027642.
Valenza, Charles (1985). "Was Margaret Sanger a Racist?". Family Planning Perspectives. 17 (1). Guttmacher Institute: 44–46. doi:10.2307/2135230. JSTOR 2135230. PMID 3884362

Ms. Olufemi’s source? Angela Davis. At this point, let me make an aside. Ms. Olufemi treats Angela Davis as a kind of mother figure and a hero: she quotes her left and right and puts her on the back-cover blurb. Angela Davis, meanwhile, is a KGB stooge, a “useful idiot”, in an apt expression commonly attributed to Lenin. That beacon of freedom-fighting unashamedly praised, and was on the payroll of, the Soviet communist regime that, as was very well known at the time, organised the extermination of tens of millions of people.

19.C, p. 41: Suddenly Ms. Olufemi lashes out against contraceptives, linking them to… eugenics: “eugenics has shaped our notion of family […] the racist logic of ‘population control’ that birthed the desire for contraceptives” — here again, it is difficult to tell her apart from the far right and religious fundamentalists.

20.C, p. 44: “nobody understands the stakes around the right to access abortion and reproductive justice more than working class women” — That looks like a good point at which to discuss Ms. Olufemi’s background. Far from being qualified to understand the “stakes”, she comes from what definitely does not look like the working class: after graduating from a privileged grammar school, she was immediately forced by the oppressive patriarchal society to study at Cambridge. Her PhD research was sponsored (via TECHNE AHRC) by the very same government that she so mercilessly scrutinises in the present opus.

21.E, p. 44: “Ireland is coded ‘white’, and ‘Irish woman’ means only those who fall under that coding” — neither of these claims is supported. Fortunately, as usual, they are in no way used in what follows, because Ms. Olufemi quickly switches to other, equally unsupported, claims.

22.E, p. 47: “English MPs voted […] to change Northern Ireland’s abortion law […]. This means that Abortion in Northern Ireland has only recently been decriminalised, the possibility of persecution has been lifted from those administering and undergoing abortions. “ — (Capitalisation as in the original.) There were no “underground” abortions in N. Ireland, because free abortions were available in England, a few hours away on a ferry.

23.E, p. 47: “The tendency to consider the UK a progressive environment for reproductive justice sorely underestimates the number of people, who despite the change in law may still have no recourse to abortion services“ — Well, then tell us.

24.E, p. 48: “Winning radically would mean […] a world without work” — That’s refreshingly honest, even though Marx would not approve of such blatant revisionism.

25.E, p. 50: “throughout history, to be ‘female’ has often meant death, mutilation and oppression” — this unsupported claim is clearly wrong. Throughout history, by a large margin, most victims of violence were and are men. Most people killed in wars are men. Most people who die a violent death are men. Most people in prisons and mental institutions are men. More than 90% of work-related deaths happen to men.

26.I, p. 50: “If there are only two categories, it is easier for us to organise the world and attach feelings, emotions and ways of being to each one.” — If that were so, all categories would be binary: there would be 2 nations, 2 fashion styles, etc.

27.C, p. 50: Ms. Olufemi continues to argue that gender is fluid and a person can change it at will. Her arguments: “there is no way to adequately describe what gender is. Every definition does a disservice to the shifting, multiple and complex set of power relations that come to shape a person’s gender.” — It would be interesting to trace her train of thought if “gender” were replaced with “race” in this sentence. As race is much more of a social construct and less rooted in biology, surely Ms. Olufemi would agree that we should welcome it when a “racially dysphoric” white person claims to be black. Unfortunately, that would uproot her basic tenet about the exclusivity of black women’s experience and its central role in the formation of radical feminism.

28.C, p. 52: “If one group of people consistently behave, speak, move, present themselves in one way and another in the ‘opposite’ way, we reaffirm the idea that there is actually an inherent difference between those two groups when no such difference exists.“ — I guess the groups Ms. Olufemi has in mind are males and females. Or maybe whites and blacks?

29.C, p. 52: “Many intersex infants […] have surgery to ‘correct’ their genitalia without their consent.” — That’s an interesting notion. Should we abstain from, for example, fixing the broken bones of small children until they are mature enough to consent?

30.E, p. 53: “Many women are physically stronger than men; many men are physically weaker than women. These are not exceptions that defy a rule; there simply is no rule.” — This betrays an utter ignorance of statistics and data. That strength (as measured, for example, by grip tests) is approximately normally distributed within each sex, with different means for males and females, is among the most reliably established anthropometric facts.

31.E, p. 54: “To argue that there is a clear difference between sex and gender serves to solidify the idea that biological sex, prior to human beings inventing it and naming its tenants, exists.” — here again Ms. Olufemi joins the far right and religious fundamentalists in her anti-science stance and her denial of the evolutionary origin of the mechanisms of sexual reproduction. The objective existence of biological sex, manifested in morphological, physiological and behavioural differences (sexual dimorphism), is attested beyond the slightest doubt across the entire animal kingdom.

32.E, p. 68: “It is the public rejoicing at 19-year-old Shamima Begum being stripped of her citizenship” — As her example of Islamophobia, Ms. Olufemi chose a girl who joined ISIS, became an enforcer there, threatening other women with death unless they complied with ISIS rules, stitched suicide vests into their clothes, and ended up burying all 3 of her children in this non-secular utopia.

33.E, p. 71: “Muslim women are the most economically disenfranchised group in the country.” — this claim is simply wrong. According to the UK Office for National Statistics, the most economically disenfranchised group in the UK are (as expected) refugees and asylum seekers, followed (unfortunately for Ms. Olufemi) by white workers in “post-industrial” (read: de-industrialised) communities.

34.C, p. 74: “A staunchly secular way of thinking about our lives and bodies limits Muslim women’s ability to understand themselves” — Ms. Olufemi’s sympathy for organised monotheistic religions and her distrust of secular society are rather unexpected. It is not clear how to reconcile with feminist dogma her idea that, to better understand themselves, women should abandon secularism and return to the church or mosque.

35.I, p. 76: “creation of a public outcry about ‘Asian Grooming Gangs.’” — It is sad that a feminist can scare-quote and dismiss one of the most horrible cases of mass abuse of women. Sacrificing the suffering of more than a thousand girls to make a vindictive political point does not paint Ms. Olufemi as a good human being.

36.I, p. 86: “Art is a tool for feminist propaganda” — Unfortunately, Ms. Olufemi forgot that this chapter is called Art for Art’s Sake and that its main thesis is that art is not a tool of anything. Notice also how she repeats Stalin and Goebbels almost verbatim.

37.C, p. 88: “Poor women do not get to make art: the fact that Saye’s work could be displayed in one of the most prestigious arenas in the world … calls us to wake up to the cruelty of inequity.” — This is probably one of the most impressive examples of Ms. Olufemi’s ability to forget the beginning of a sentence (that she wrote!) by the time she gets to its middle: she demonstrates that poor women cannot make art with the example of a poor woman whose art became fashionable and famous. But then she easily outdoes herself! By the time we get to the end of the sentence, she has forgotten what was in the middle (or else she thinks that being displayed at the Venice Biennale is unusually cruel).

38.C, p. 89: “Momtaza Mehri, essayist, researcher and former Poet Laureate for young people, tells me.” — Ms. Olufemi continues to prove that art is unavailable to poor women by providing another example: a poor woman who was a Poet Laureate.

39.I, p. 110: “The idea that justice is served when criminals go to prison is relatively new. […] Ironically, prisons were introduced in order to make punishment more ‘humane’.” — Unironically, this is one of the most blatantly ignorant statements to grace the printing press lately. Prisons, as anybody with even a superficial knowledge of history knows, have existed for as long as states have. There are some echoes of Foucault in that sentence, but of a Foucault poorly read or poorly remembered, because his conclusion was the opposite.

40.E, p. 123: “In July 2019, Cancer Research UK, fundraising partners with dieting organisation Slimming World, launched a multi-million pound campaign using defunct scientific indicators to claim that obesity was the second leading cause of cancer.” — This is outright dangerous. Spreading falsehoods among vulnerable people in risk groups is extremely irresponsible. Large-scale cohort studies show that higher body mass index (BMI) and excess adiposity correlate with increased incidence and worse outcomes for multiple cancer types. These data form the backbone of public-health recommendations promoting weight management:

Renehan AG, Tyson M, Egger M, Heller RF, Zwahlen M. “Body-mass index and incidence of cancer: a systematic review and meta-analysis of prospective observational studies.” The Lancet, 371(9612), 569-578 (2008).
Bhaskaran K, Douglas I, Forbes H, dos-Santos-Silva I, Leon DA, Smeeth L. “Body-mass index and risk of 22 specific cancers: a population-based cohort study of 5·24 million UK adults.” The Lancet, 384(9945), 755-765 (2014).

41.I, p. 124: “There is no clearer manifestation of neo-liberalism than in our attitudes towards bodies.” — this, in Pauli’s words, “is not even wrong”: whatever “attitudes towards bodies” means, they are not the clearest manifestation of neo-liberalism.

42.C, p. 125: “the myth that fatness means ill health” — Again, Ms. Olufemi would be at home with far-right conspiracy theorists. Excess adipose tissue (body fat) is strongly linked to a variety of health problems, including hypertension, insulin resistance, type 2 diabetes, cardiovascular disease, sleep apnea and certain cancers. Medical researchers generally treat obesity as a risk factor that actively contributes to many of these conditions.

43.I, p. 126: “Nearly half of single parents in the UK – working or unemployed – live in relative poverty. “ — That’s because relative poverty is defined by the UK statistical agencies as “being poorer than half of the population”.

At this point your humble Scheherazade broke off in exhaustion.

Conclusions

“we need to remove the shame in the way we talk about acceptable forms of killing” — p. 48

I hope my notes have demonstrated that even the most lenient and indulgent reader would quickly conclude that Feminism Interrupted is balderdash. That would, however, be too trivial a conclusion. Everything has a purpose, and Ms. Olufemi’s book can find one with our help. As far as I can see, the purpose is to realise that, incoherent and rambling as it is, the text has a texture, formed by images that recur again and again:

1. Ms. Olufemi continuously laments the exploitation, poverty and oppression of women in the UK. Well, like everything in the human condition, exploitation is relative. Ms. Olufemi enjoys a life of comfort, privilege and ease unimaginable to anyone living outside the "developed" (i.e., capitalist) world or before the last hundred years. She imagines a utopia of a stateless society free of "exploitation", but there is no indication that this eschaton is possible. It is not even its practical possibility that is questionable, but its logical possibility: is her image free of internal contradictions? All attempts to realise this millenarian dream, from the Bogomils to the Soviet and Chinese communists, are remembered for little besides the industrial-scale murder they invariably resulted in.

2. Ms. Olufemi’s feelings toward organised religion are clearly ambivalent. On one hand, she presumably understands that organised religion is the core institution of patriarchy, which maintains and perpetuates the values and structures she finds so odious. On the other hand, she obviously cannot stop expressing her admiration for the austere faith of Mohammed, comparing it favourably with the decadent secular societies of the West.

3. Ms. Olufemi cannot decide whether she wants to abolish the state or expand it tremendously. Often within the same passage, she registers her conviction that the state is evil and should be abolished, only to proceed to point out how unjust it is that the state does not sufficiently help the dispossessed, and to request an enlargement of welfare.

4. As noted on several occasions, many of Ms. Olufemi’s positions echo those of far-right conspiracy theorists. Her distrust of science and NGOs makes one reconsider the “horseshoe theory” favourably.


Publish or Perish in style.

nikita Mar 3, 2025 Updated Mar 3, 2025


Joseph Liouville, a famous French mathematician after whom multiple important theorems are named, was also the founder and editor of the Journal de Mathématiques Pures et Appliquées, universally known as Liouville's journal (still in print and still very prestigious, two centuries later!).

Here is the list of the articles Liouville published in his own journal in 1861:

And then some more:

... and more ...

...

Look somewhat... similar, don't they? The truth is that Liouville proved a certain general result about quadratic forms but chose to keep it secret. Instead, he published almost two hundred papers with special cases, easily obtainable from his general theorem but rather mysterious otherwise.

This was the bulk of his scientific output in the early 1860s.

References

[0] Lützen, Jesper. Joseph Liouville, 1809–1882: Master of Pure and Applied Mathematics. Springer New York, 1990.


Long [story of] division.

nikita Feb 25, 2025 Updated Feb 25, 2025


The following text can be viewed as extremely dry and intimidating, or, equally, lightheadedly funny.

Let's formally verify the venerable long-division algorithm.

uintN_t div(uintN_t n, uintN_t d) {
        uintN_t q := 0;
        uintN_t r := 0;
        int     i := N - 1;
        while (i != -1) {
                r <<= 1;
                r |= ((n >> i) & 1);
                if (r >= d) {
                        r := r - d;
                        q |= 1 << i;
                }
                i := i - 1;
        }
        return q;
}
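For readers who want to run it, here is a minimal C sketch of the same algorithm, assuming N = 32 so that uintN_t becomes uint32_t. The name div32 is ours, not the post's, and C's = stands in for the := of the pseudocode. One C-specific detail is worth noting: the literal 1 must be made unsigned before shifting, since 1 << 31 on a 32-bit int is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* A C transcription of the long-division pseudocode for N = 32.
 * div32 is a hypothetical name; the caller must ensure d > 0. */
uint32_t div32(uint32_t n, uint32_t d) {
        assert(d > 0);                  /* the precondition d > 0 */
        uint32_t q = 0;
        uint32_t r = 0;
        int      i = 32 - 1;
        while (i != -1) {
                r <<= 1;                /* make room for the next bit       */
                r |= (n >> i) & 1u;     /* bring down bit i of the dividend */
                if (r >= d) {
                        r = r - d;
                        q |= (uint32_t)1 << i;  /* unsigned shift: int 1 << 31 is UB */
                }
                i = i - 1;
        }
        return q;
}
```

For example, div32(100, 7) returns 14, the same as the built-in 100 / 7.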

Here uintN_t is the type of unsigned N-bit integers, N > 0. We shall establish formal correctness via Hoare logic. The following is by no means an introduction to the subject: our presentation skims over a large number of important details, so please refer to the literature cited on the Wikipedia page on Hoare logic. The basic element of Hoare logic is a Hoare triple, which is a construction of the form

⟦  precondition  ⟧
COMMAND
⟦ postcondition  ⟧

This triple means that if an execution of COMMAND starts in a state satisfying precondition, then the execution can only terminate in a state satisfying postcondition. (We use ⟦ and ⟧ instead of the more traditional { and }, because our ambient language uses braces.) The pre- and postconditions are formulae of predicate calculus that can refer to the terms of the programming language (variables, literals, etc.). A triple is valid if it can be proved starting from the usual rules of the predicate calculus and certain axioms. For a given programming language, one presents a list of axioms describing the behaviour of the language constructs and then proves soundness, i.e., establishes that the axioms and the accepted rules of inference are satisfied by all possible computations. We will need the following axioms:

Axiom of assignment

⟦  S[ x := E ]  ⟧
x := E
⟦  S  ⟧

Here S[ x := E ] is the result of substituting E for each occurrence of x in the formula S. (In this form, the axiom really only works for simple unaliased variables and does not work for pointers or arrays, which is sufficient in our case.) The axiom looks "backward", so let's play with it a bit. First, let's check that the assignment does set the variable to the desired value:

⟦  ?  ⟧
x := 4
⟦  x == 4  ⟧

The command is a simple assignment x := 4, the postcondition, x == 4, verifies that the variable got the expected value. What precondition guarantees that the assignment establishes the postcondition? The assignment axiom gives us for the precondition (x == 4)[ x := 4 ] = (4 == 4) = true. That is, no matter what was going on before the assignment, after it terminates, x == 4, as expected:

⟦  true  ⟧
x := 4
⟦  x == 4  ⟧

A bit more complex example:

⟦  ?  ⟧
x := x + 1
⟦  x > 0  ⟧

What precondition guarantees that x will be positive after increment? We can compute the precondition, it is (x > 0)[ x := x + 1 ] = (x + 1 > 0) = (x > -1) — perfectly reasonable.
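The substitution S[ x := E ] becomes concrete if formulae are modelled as C predicates on the value of x: substituting E for x is then just evaluating S at the value of E. The sketch below (all names hypothetical) checks the increment example by brute force over a finite range of ints. This is testing, not proof, and the range must stop short of INT_MAX, because x + 1 overflows there, a wrinkle that the mathematical integers of the axiom do not have.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The postcondition S = (x > 0), as a predicate on the value of x. */
static bool inc_post(int x) { return x > 0; }

/* The right-hand side E of the assignment x := x + 1. */
static int inc_expr(int x) { return x + 1; }

/* S[ x := E ]: substituting E for x means evaluating S at the value of E. */
static bool inc_pre(int x) { return inc_post(inc_expr(x)); }

/* For every x in [lo, hi], checks that the substituted precondition agrees
 * with the hand-simplified form x > -1, and that executing x := x + 1 from a
 * state satisfying it yields a state satisfying x > 0. */
bool check_increment_triple(int lo, int hi) {
        assert(lo <= hi && hi < INT_MAX);   /* x + 1 must not overflow */
        for (int x = lo; x <= hi; x++) {
                if (inc_pre(x) != (x > -1))
                        return false;
                int x_after = inc_expr(x);  /* execute x := x + 1 */
                if (inc_pre(x) && !inc_post(x_after))
                        return false;
        }
        return true;
}
```

The second check can never fail: the precondition is defined as the postcondition evaluated after the assignment, which is exactly why the assignment axiom is sound.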

What if we are given a precondition that does not have the form the axiom requires?

⟦  x == A  ⟧
x := x + d
⟦  ?  ⟧

There is no postcondition S such that (x == A) is literally S[ x := x + d ].

Well, in this case you are stuck. To derive a postcondition using the axiom of assignment, you first have to massage the precondition into a form where x occurs only as part of E. Fortunately, in this case it's easy:

/* Comments as in PL/I. */
⟦  x == A  ⟧
/* Simple arithmetic: add d to both sides. */
⟦  x + d == A + d  ⟧
x := x + d
⟦  x == A + d  ⟧
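The massaging step can be sanity-checked in the same brute-force style. The hypothetical helper below confirms, over a small cube of integers, that x + d == A + d is equivalent to x == A, and that running x := x + d from such a state ends with x == A + d. The bound on R is an assumption made only to keep every sum far from int overflow.

```c
#include <assert.h>
#include <stdbool.h>

/* Brute-force check over all A, d, x in [-R, R] (hypothetical helper):
 *   1. the massaged form x + d == A + d is equivalent to x == A;
 *   2. from any state with x == A, executing x := x + d gives x == A + d. */
bool check_massaged_triple(int R) {
        assert(0 <= R && R <= 1000);   /* keeps every sum far from overflow */
        for (int A = -R; A <= R; A++)
                for (int d = -R; d <= R; d++)
                        for (int x = -R; x <= R; x++) {
                                bool pre      = (x == A);
                                bool massaged = (x + d == A + d);
                                if (pre != massaged)
                                        return false;
                                if (pre) {
                                        int x_after = x + d;   /* x := x + d */
                                        if (x_after != A + d)
                                                return false;
                                }
                        }
        return true;
}
```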

What if the precondition does not contain x? Then the assignment is useless for program correctness and, hence, can most likely be discarded. :-)

Typically, when you use the assignment axiom for formal verification, you have to come up with a precondition that has one or more instances of E, and then the axiom lets you jump to a postcondition where each E is simplified to x. Next is

Axiom of composition

This axiom describes the ;-sequencing operator.

If we have

⟦  precondition  ⟧
COMMAND0
⟦  condition  ⟧

and

⟦  condition  ⟧
COMMAND1
⟦  postcondition  ⟧

Then the axiom allows us to conclude

⟦  precondition  ⟧
COMMAND0 ; COMMAND1
⟦  postcondition  ⟧

This matches the expected semantics of sequential execution.
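To see the composition rule at work, here is a hypothetical two-command program of our own: COMMAND0 is x := x + 1 and COMMAND1 is y := 2 * x, with precondition x == A, intermediate condition x == A + 1 and postcondition y == 2 * (A + 1). A state is modelled as a struct, so that commands become functions from states to states and sequencing is function composition. The sketch checks both premises and the conclusion by brute force over a small range.

```c
#include <assert.h>
#include <stdbool.h>

struct state { int x, y; };

/* COMMAND0: x := x + 1 and COMMAND1: y := 2 * x (hypothetical example). */
static struct state seq_command0(struct state s) { s.x = s.x + 1; return s; }
static struct state seq_command1(struct state s) { s.y = 2 * s.x; return s; }

static bool seq_pre(struct state s, int A)  { return s.x == A; }
static bool seq_mid(struct state s, int A)  { return s.x == A + 1; }
static bool seq_post(struct state s, int A) { return s.y == 2 * (A + 1); }

bool check_composition(int R) {
        assert(0 <= R && R <= 1000);   /* keeps 2 * (R + 1) far from overflow */
        for (int A = -R; A <= R; A++)
                for (int x = -R; x <= R; x++)
                        for (int y = -R; y <= R; y++) {
                                struct state s = { x, y };
                                /* premise 1: precondition, COMMAND0, condition */
                                if (seq_pre(s, A) && !seq_mid(seq_command0(s), A))
                                        return false;
                                /* premise 2: condition, COMMAND1, postcondition */
                                if (seq_mid(s, A) && !seq_post(seq_command1(s), A))
                                        return false;
                                /* conclusion: precondition, COMMAND0 ; COMMAND1, postcondition */
                                if (seq_pre(s, A) &&
                                    !seq_post(seq_command1(seq_command0(s)), A))
                                        return false;
                        }
        return true;
}
```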

Conditional axiom

For a conditional statement of the form

if (guard) { 
        COMMAND0 
} else {
        COMMAND1 
}

We have

⟦  precondition  ⟧
if (guard) {
        ⟦  guard && precondition  ⟧
        COMMAND0;
        ⟦  postcondition  ⟧
} else {
        ⟦  !guard && precondition  ⟧
        COMMAND1;
        ⟦  postcondition  ⟧
}
⟦  postcondition  ⟧

That is, if both "then" and "else" commands establish the same postcondition, given the original precondition strengthened by the guard or its negation, then the entire conditional statement establishes the same postcondition. This is fairly intuitively obvious.
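A small illustration of the conditional rule, using an example of our own choosing (an absolute-value computation, not from the post): ⟦ true ⟧ if (x < 0) { y := -x } else { y := x } ⟦ y >= 0 ⟧. The rule turns one question into two, one per branch, each with the precondition strengthened by the guard or by its negation. The sketch checks both by brute force; all names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

static bool abs_guard(int x)         { return x < 0; }
static bool abs_pre(int x)           { (void)x; return true; }
static bool abs_post(int y)          { return y >= 0; }

bool check_conditional(int lo, int hi) {
        /* -INT_MIN overflows in C, and x <= INT_MAX would never end the loop */
        assert(lo > INT_MIN && hi < INT_MAX && lo <= hi);
        for (int x = lo; x <= hi; x++) {
                /* then-branch: guard && precondition, y := -x, postcondition */
                if (abs_guard(x) && abs_pre(x) && !abs_post(-x))
                        return false;
                /* else-branch: !guard && precondition, y := x, postcondition */
                if (!abs_guard(x) && abs_pre(x) && !abs_post(x))
                        return false;
        }
        return true;
}
```

Over the full int range the check would expose something the mathematical version hides: in C, -INT_MIN overflows, so the honest precondition for this program is x != INT_MIN rather than true.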

Finally, we need

While-loop axiom

Consider a loop

while (guard) {
        BODY
}

To apply the while-loop axiom, we have to find an assertion, called a loop invariant, that is preserved by the loop body, that is, such that

⟦  guard && invariant  ⟧
BODY
⟦  invariant  ⟧

If the body is entered while the invariant holds (and the guard holds too), then the invariant is true at the end of the body execution. Given an invariant, the while-loop axiom gives

⟦  invariant  ⟧
while (guard) {
        BODY
}
⟦  !guard && invariant  ⟧

In other words, if the invariant was true at the beginning of the loop execution, then it is true when the loop terminates. The while-loop axiom shows an observant reader that loops are pure magic: the loop is the only construction that starts in a state satisfying a known condition, given by the invariant, and then miraculously strengthens that condition by adding the !guard conjunct. Perhaps this is why the founders of structured programming preferred while-loops to the much-derided loops with "a control variable", like DO loops in FORTRAN and for-each loops of modern languages.
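As an illustration (a sketch, not taken from the post), here is division by repeated subtraction in C, with the invariant n == d*q + r && 0 <= r checked at the points where the while-loop axiom talks about it. On exit, the invariant together with the negated guard !(r >= d) is exactly the quotient-remainder postcondition. The axiom itself says nothing about termination; here the loop terminates because d > 0 makes r strictly decrease.

```c
#include <assert.h>

/* Quotient by repeated subtraction, for n >= 0 and d > 0; remainder in *rem.
 * Invariant: n == d*q + r && 0 <= r.   Guard: r >= d. */
static int div_by_subtraction(int n, int d, int *rem)
{
        assert(n >= 0 && d > 0);                           /* precondition */
        int q = 0, r = n;
        assert(n == d*q + r && 0 <= r);                    /* invariant established */
        while (r >= d) {
                assert(n == d*q + r && 0 <= r && r >= d);  /* guard && invariant */
                r = r - d;
                q = q + 1;
                assert(n == d*q + r && 0 <= r);            /* invariant preserved */
        }
        assert(n == d*q + r && 0 <= r && !(r >= d));       /* !guard && invariant */
        *rem = r;
        return q;
}
```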

There are many more axioms (what about the rules for function calls and recursion?), but we won't need them or will hand-wave around them.

Now, back to the long division. We want to establish the validity of the following triple:

uintN_t div(uintN_t n, uintN_t d) {
        ⟦  d > 0  ⟧
        uintN_t q := 0;
        uintN_t r := 0;
        int     i := N - 1;
        while (i != -1) {
                r <<= 1;
                r |= ((n >> i) & 1);
                if (r >= d) {
                        r := r - d;
                        q |= 1 << i;
                }
                i := i - 1;
        }
        ⟦  n == d*q + r && 0 <= r && r < d  ⟧
        return q;
}

The structure of the code basically forces the structure of any possible proof:

Find an invariant, preserved by the loop body.
Prove that the invariant is established before the loop is entered.
Prove that the desired postcondition follows from the conjunction of the invariant and the negation of the guard.
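For experiments, div() can be turned into ordinary C by writing = for :=. The sketch below assumes N == 8 and uintN_t == uint8_t (a choice made only so that every input can be tried) and compares the result with C's / and % for all n and all d > 0. The shift r <<= 1 does not overflow N bits: at the top of each iteration r <= n >> (i + 1), which follows from the invariant discussed below, so 2*r + 1 < 2^(N - i).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N 8                     /* assumption: N == 8 ...             */
typedef uint8_t uintN_t;        /* ... so that uintN_t is uint8_t      */

/* div() from the text with := written as =; the remainder is returned
 * through *rem.  Requires d > 0. */
static uintN_t div_bits(uintN_t n, uintN_t d, uintN_t *rem)
{
        assert(d > 0);                               /* precondition */
        uintN_t q = 0;
        uintN_t r = 0;
        int     i = N - 1;
        while (i != -1) {
                r <<= 1;                             /* no overflow: r <= n >> (i + 1) here */
                r |= (n >> i) & 1;                   /* bring down bit i of n */
                if (r >= d) {
                        r = r - d;
                        q |= (uintN_t)(1u << i);     /* set bit i of the quotient */
                }
                i = i - 1;
        }
        *rem = r;
        return q;
}

/* Compare with C's / and % for every n and every d > 0. */
static bool div_bits_exhaustive(void)
{
        for (unsigned n = 0; n <= UINT8_MAX; n++) {
                for (unsigned d = 1; d <= UINT8_MAX; d++) {
                        uintN_t r;
                        uintN_t q = div_bits((uintN_t)n, (uintN_t)d, &r);
                        if (q != n / d || r != n % d)
                                return false;
                }
        }
        return true;
}
```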

Finding a suitable invariant is the least trivial part of the job. Fortunately, in this case we are helped by our (presumed) experience of manually executing this algorithm all too many times in elementary school. To make it less boring, I give an example of how long division is done in my native country; you should be able to figure it out:

After the first step (when the subtraction under the first horizontal line on the left has been completed), the algorithm has established that 273 == 97*2 + 79, where by construction 79 < 97. This looks promisingly similar to the form of the postcondition that we want to establish: n == d*q + r && r < d. It then makes sense to select as the invariant "the highest N - i - 1 digits of the dividend (i.e., n), divided by the divisor (i.e., d), have the highest N - i - 1 digits of q as the quotient and r as the remainder" (in our binary case the digits are bits).

Provided that we manage to establish that this is actually an invariant, the remaining pieces fall into place quickly:

At the beginning of the loop, i == N - 1 so "the highest N - i - 1 bits" degenerate into "the highest 0 bits", for which the condition is vacuous.
Similarly at the termination of the loop we have i == -1, so N - i - 1 == N and we have the desired postcondition.

But before we embark on the actual proof, we have to introduce some terminology, to simplify the necessary formal manipulations.

We are operating on N-bit unsigned binary numbers. We shall refer to the more and less significant bits as "left" or "last" or "high" and "right" or "first" or "low" respectively, with the appropriate comparative and superlative forms and without, of course, making any assumptions about endianness. Bits are indexed 0 ... N - 1 from right to left (Thank you, Fibonacci, very clever! Not.).

We will do a lot of bit-shifting. Recall that for t >= 0, x >> t == floor(x/2^t) and x << t == x*2^t. Again, all values are unsigned, and so are shifts. Bitwise OR and AND are denoted as | and & as in C.

On a loop iteration with a particular value of i, we will be especially interested in shifts by i and i + 1 bits. Write

B' = (1 << i) for the i-th bit bitmask.
B" = (1 << (i + 1)) for the (i + 1)-st bit bitmask.
t' = (t >> i), for the value t shifted i bits right.
t" = (t >> (i + 1)), for the value t shifted i + 1 bits right.
M(k) = (1 << k) - 1, for the bitmask of the first k bits.

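We treat ' and " as unary operators, binding tighter than any binary ones. Also, unlike in C, & binds tighter than +, - and comparisons, so that, for example, 2*x" + x'&1 means 2*x" + (x'&1) and q&M(k) == 0 means (q&M(k)) == 0.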

As a warm-up, prove the following

LEMMA x' == 2*x" + x'&1

(Once you rewrite 2*x" as (x >> (i + 1)) << 1, it should be trivial.)
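The lemma is easy to test mechanically before proving it. The following C sketch (the function name is illustrative) checks x' == 2*x" + x'&1 for every 8-bit x and every shift i from 0 to 7; in C the term x'&1 needs explicit parentheses, because C's & binds looser than +.

```c
#include <assert.h>
#include <stdbool.h>

/* LEMMA x' == 2*x" + x'&1, where x' = x >> i and x" = x >> (i + 1),
 * checked for every 8-bit x and every i in 0 .. 7 (a test, not a proof). */
static bool lemma_holds_8bit(void)
{
        for (unsigned x = 0; x < 256; x++) {
                for (int i = 0; i < 8; i++) {
                        unsigned x1 = x >> i;           /* x'  */
                        unsigned x2 = x >> (i + 1);     /* x"  */
                        /* parenthesised: in C, & binds looser than + */
                        if (x1 != 2 * x2 + (x1 & 1))
                                return false;
                }
        }
        return true;
}
```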

"The highest N - i - 1 bits" of x mentioned in the informal invariant above can be obtained by discarding the remaining N - (N - i - 1) == i + 1 bits, and so are x >> (i + 1), or, as we luckily agreed, x". It makes sense to try n" == d*q" + r && r < d && 0 <= r as the invariant. This assertion is established at the loop entrance and guarantees the final postcondition after the loop termination. Unfortunately, it is *not* an invariant of our loop. To conclude this, observe that this assertion holds at the loop entrance even if the initial value of q is not 0. If it were an invariant, then initialising q to an arbitrary value would still produce a correct result, which is clearly not the case, because bits of q are only set (by q |= 1 << i) and never cleared, so in the final value of q all the bits set initially remain set.

As it turns out (after many a painful attempt), this is the only obstruction, and once we add to the invariant a conjunct q&M(i + 1) == 0, stating that the i + 1 lowest bits of q are 0, we obtain the desired invariant:

LOOP INVARIANT n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0
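The invariant can also be checked at run time. In the C sketch below (again assuming N == 8; the helper names dprime(), M(), invariant(), div_traced() and invariant_exhaustive() are made up for illustration) the loop of div() is run from an arbitrary initial q0, and the invariant is evaluated at every loop head, including the final one with i == -1. With q0 == 0 it should hold throughout and the result should equal n / d. With, say, n == 6, d == 3 and q0 == 1, the weaker assertion without the q&M(i + 1) == 0 conjunct still holds at the entrance, yet the returned quotient is 3 rather than 2, which is exactly the obstruction described above.

```c
#include <assert.h>
#include <stdbool.h>

#define N 8     /* assumption: N-bit values with N == 8, held in unsigned int */

/* The text's notation: t" = t >> (i + 1) and M(k) = (1 << k) - 1. */
static unsigned dprime(unsigned t, int i) { return t >> (i + 1); }
static unsigned M(int k)                  { return (1u << k) - 1; }

/* LOOP INVARIANT n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0
 * (0 <= r holds automatically for unsigned r). */
static bool invariant(unsigned n, unsigned d, unsigned q, unsigned r, int i)
{
        return dprime(n, i) == d * dprime(q, i) + r && r < d
            && (q & M(i + 1)) == 0;
}

/* The loop of div(), started from q0 instead of 0.  Returns the final q and
 * sets *always to whether the invariant held at every loop head,
 * including the last one, where i == -1. */
static unsigned div_traced(unsigned n, unsigned d, unsigned q0, bool *always)
{
        unsigned q = q0, r = 0;
        int i = N - 1;
        *always = true;
        for (;;) {
                if (!invariant(n, d, q, r, i))
                        *always = false;
                if (i == -1)
                        break;                   /* the loop guard i != -1 */
                r = (r << 1) | ((n >> i) & 1);
                if (r >= d) {
                        r -= d;
                        q |= 1u << i;
                }
                i--;
        }
        return q;
}

/* With q0 == 0 the invariant always holds and q == n / d, for all inputs. */
static bool invariant_exhaustive(void)
{
        for (unsigned n = 0; n < (1u << N); n++)
                for (unsigned d = 1; d < (1u << N); d++) {
                        bool ok;
                        if (div_traced(n, d, 0, &ok) != n / d || !ok)
                                return false;
                }
        return true;
}
```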

(If you want a good laugh and have some time to spare, paste div() code in a ChatGPT chat and ask various models what the loop invariant is.)

To the proof, then. First, check that the invariant is established at the loop entrance, that is, that the following triple is valid.

⟦  d > 0  ⟧
uintN_t q := 0;
uintN_t r := 0;
int     i := N - 1;
⟦  n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0  ⟧

Go from bottom to top, applying the assignment axiom and simplifying on each step. First, expand the invariant as

⟦  n >> (i + 1) == d*(q >> (i + 1)) + r && r < d && 0 <= r && q&((1 << (i + 1)) - 1) == 0  ⟧

Now apply the assignment axiom (i.e., replace i with (N - 1))...

⟦  n >> ((N - 1) + 1) == d*(q >> ((N - 1) + 1)) + r && r < d && 0 <= r && q&((1 << ((N - 1) + 1)) - 1) == 0  ⟧
i := N - 1;
⟦  n >> (i + 1) == d*(q >> (i + 1)) + r && r < d && 0 <= r && q&((1 << (i + 1)) - 1) == 0  ⟧

... simplify, use x >> N == 0 for any N-bit value, and apply the assignment axiom again ...

⟦  0 == d*0 + 0 && 0 < d && 0 <= 0 && (q & ~0) == 0  ⟧
r := 0
⟦  0 == d*0 + r && r < d && 0 <= r && (q & ~0) == 0  ⟧

... and one more time ...

⟦  0 == 0 && 0 < d && (0 & ~0) == 0  ⟧
q := 0
⟦  0 == d*0 + 0 && 0 < d && 0 <= 0 && (q & ~0) == 0  ⟧

... which finally gives

⟦  0 < d  ⟧

Which is exactly the given precondition. Voilà! Interestingly, it seems division by zero is impossible, because there is no suitable remainder.

Next, we need to prove that the invariant is preserved by the loop body. This is by far the most complex and inundating part of the proof. We want to establish the following triple (at this point let's expand the compound assignment operators and add a trivial else to the conditional so that it conforms to the form expected by our conditional axiom):

⟦  n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0 && i != -1  ⟧
r := r << 1;
r := r | ((n >> i) & 1);
if (r >= d) {
        r := r - d;
        q := q | (1 << i);
} else {
}
i := i - 1;
⟦  n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0  ⟧

First, the guard i != -1 is only needed to guarantee that shifts by i and i + 1 bits make sense. It is not used for anything else and will not be mentioned again.

We can proceed as before: start at the bottom and apply the assignment axiom to work our way up:

⟦  n' == d*q' + r && r < d && 0 <= r && q&M(i) == 0  ⟧
i := i - 1;
⟦  n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0  ⟧

Note that after substituting i - 1 for i, x" nicely transforms into x'. But at this point we are stuck: we know the postcondition that the conditional statement must establish, but we have no idea what its suitable precondition is. Take a step back. We now have n' == d*q' + r && r < d && 0 <= r && q&M(i) == 0, which we will call the target. The composition of two assignments and one conditional statement, starting from the loop invariant, must establish the target. Write both down:

LOOP INVARIANT n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0

TARGET n' == d*q' + r && r < d && 0 <= r && q&M(i) == 0

Comparing the loop invariant and the target, we see that transforming the former into the latter takes:

Replacing q" with q'.
Replacing n" with n'.
Replacing q&M(i + 1) == 0 with q&M(i) == 0.

The last one is easy: if the first i + 1 bits of q are zero (this is what q&M(i + 1) == 0 means), then a fortiori so are its first i bits, so q&M(i) == 0.

As for replacing q" with q' and n" with n', we will do this via the lemma we stated (and you proved) earlier. We will now apply transformations to the loop invariant such that (i) they make it possible to apply the lemma and (ii) they produce an assertion that is a suitable precondition for the following assignments. The right-hand sides of the assignments are r << 1 (that is, 2*r) and r | ((n >> i) & 1) (that is, r | n'&1), so we will try to produce an assertion having sub-formulae of this form.

The starting invariant again:

⟦  n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0  ⟧

Multiply both sides of all the arithmetic conjuncts by 2. This produces terms to which the lemma and the assignment axiom for r := 2*r can be applied.

⟦  2*n" == 2*d*q" + 2*r && 2*r < 2*d && 0 <= 2*r && q&M(i + 1) == 0  ⟧

Immediately we can apply the lemma: 2*q" == q' - q'&1.

⟦  2*n" == d*(q' - q'&1) + 2*r && 2*r < 2*d && 0 <= 2*r && q&M(i + 1) == 0  ⟧

Since q&M(i + 1) == 0, bit i of q is zero, that is, q'&1 == 0, and we can drop it.

⟦  2*n" == d*q' + 2*r && 2*r < 2*d && 0 <= 2*r && q&M(i + 1) == 0  ⟧

Amazing! We got rid of q" and this is even before the first statement of the loop body was executed. Continue...

Looking forward to r := r | n'&1, we see that there are no |-s in sight, so the assignment axiom cannot be applied directly. Intuitively, this should not be a problem: after r is doubled, its lowest bit is zero, so | with a single bit is the same as +, and we have plenty of additions. To prove this, it would be nice to have a conjunct r&1 == 0 at that point. But if such a conjunct is present, then before the r := 2*r assignment it looked (as per the assignment axiom) like (2*r)&1 == 0, which is always true, and so we can just as well insert it here!

⟦  2*n" == d*q' + 2*r && 2*r < 2*d && 0 <= 2*r && q&M(i + 1) == 0 && (2*r)&1 == 0  ⟧

More pressingly, to apply the assignment axiom to r := r | n'&1 we need n'&1 next to each r. To this end, observe that n'&1 is either 0 or 1, and that 2*r and 2*d are both even, so if 2*r < 2*d then 2*r <= 2*d - 2, and hence 2*r + n'&1 < 2*d.

⟦  2*n" == d*q' + 2*r && 2*r + n'&1 < 2*d && 0 <= 2*r && q&M(i + 1) == 0 && (2*r)&1 == 0  ⟧

We are fully ready to apply the assignment axiom:

⟦  2*n" == d*q' + 2*r && 2*r + n'&1 < 2*d && 0 <= 2*r && q&M(i + 1) == 0 && (2*r)&1 == 0  ⟧
r := 2*r
⟦  2*n" == d*q' + r && r + n'&1 < 2*d && 0 <= r && q&M(i + 1) == 0 && r&1 == 0  ⟧

Apply the lemma: 2*n" == n' - n'&1

⟦  n' == d*q' + r + n'&1 && r + n'&1 < 2*d && 0 <= r && q&M(i + 1) == 0 && r&1 == 0  ⟧

The next statement is the assignment r := r | n'&1. Thanks to the r&1 == 0 conjunct, carefully prepared in advance, we know that we can replace r + n'&1 with r | n'&1 and apply the assignment axiom:

⟦  n' == d*q' + r + n'&1 && r + n'&1 < 2*d && 0 <= r && q&M(i + 1) == 0 && r&1 == 0  ⟧
⟦  n' == d*q' + (r | n'&1) && (r | n'&1) < 2*d && 0 <= r && q&M(i + 1) == 0 && r&1 == 0  ⟧
r := r | n'&1
⟦  n' == d*q' + r && r < 2*d && 0 <= r && q&M(i + 1) == 0  ⟧
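Two small facts carry the last steps: the parity argument that turns 2*r < 2*d into 2*r + n'&1 < 2*d, and the replacement of + by | when the low bit of r is zero. A brute-force C check over all 8-bit values (a test, not a proof; the function name is illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Facts behind  r := 2*r;  r := r | n'&1,  checked for all 8-bit r and d
 * and for both values b of the incoming bit n'&1:
 *   (1) 2*r < 2*d implies 2*r + b < 2*d  (both sides are even)
 *   (2) (2*r) | b == 2*r + b             (the low bit of 2*r is zero) */
static bool shift_in_facts_hold(void)
{
        for (unsigned r = 0; r < 256; r++)
                for (unsigned d = 0; d < 256; d++)
                        for (unsigned b = 0; b <= 1; b++) {
                                if (2 * r < 2 * d && !(2 * r + b < 2 * d))
                                        return false;
                                if (((2 * r) | b) != 2 * r + b)
                                        return false;
                        }
        return true;
}
```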

One starts feeling at this point that the steps of the derivation are practically forced by the form of the invariant. The appearance of r + n'&1 components in the assertion is a result of using the lemma to get rid of q" and n". In fact, it seems possible that the algorithm itself could have been derived ab initio, given the invariant. More about this at the end.

We found the mysterious precondition of the conditional statement. One relatively simple final step remains: we have to establish that both conditional branches, given this precondition, establish the target. Let's start with the r >= d branch. We need

⟦  n' == d*q' + r && r < 2*d && 0 <= r && q&M(i + 1) == 0 && r >= d  ⟧
r := r - d;
q := q | B'
⟦  n' == d*q' + r && r < d && 0 <= r && q&M(i) == 0  ⟧

Experienced as we are at this point, we can easily transform the precondition to a form suitable for the next assignment (and also drop the redundant 0 <= r conjunct, implied by the conditional guard):

⟦  n' == d*q' + (r - d) + d && r - d < d && q&M(i + 1) == 0 && r - d >= 0  ⟧

Apply the assignment axiom

⟦  n' == d*q' + (r - d) + d && r - d < d && q&M(i + 1) == 0 && r - d >= 0  ⟧
r := r - d
⟦  n' == d*q' + r + d && r < d && q&M(i + 1) == 0 && r >= 0  ⟧

Prepare for the q := q | B' assignment. To this end, we have to transform the last assertion to a form where q occurs only as part of q | B'. First, from q&M(i + 1) == 0 it follows that q | B' == q + B' (because the i-th bit of q is zero). Next, do the easy part, q&M(i + 1) == 0: weaken it, as discussed above, to q&M(i) == 0, then use (B' & M(i)) == 0 (immediate from the definition of M(i)) to arrive at (q | B')&M(i) == 0.

Next, deal with d*q' + r + d.

        d*q' + r + d
      == d*(q' + 1)  + r
      == d*(q + B')' + r /* Convince yourself that (x >> i) + 1 == (x + (1 << i)) >> i */
      == d*(q | B')' + r

Apply the assignment axiom

⟦  n' == d*(q | B')' + r && r < d && (q|B')&M(i) == 0 && r >= 0  ⟧
q := q | B'
⟦  n' == d*q' + r && r < d && q&M(i) == 0 && r >= 0  ⟧

Wait a second. This is exactly the target: n' == d*q' + r && r < d && 0 <= r && q&M(i) == 0. We are done! What remains is the trivial verification for the r < d conditional branch:

⟦  n' == d*q' + r && r < 2*d && 0 <= r && q&M(i + 1) == 0 && r < d  ⟧
/* Algebra and weakening q&M(i + 1) == 0 to q&M(i) == 0 */
⟦  n' == d*q' + r && r < d && 0 <= r && q&M(i) == 0  ⟧

We are done with the verification of the loop invariant!
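The bit identities used for the q := q | B' step hold only under the precondition q&M(i + 1) == 0 that is available at that point (except the last one, which holds for every q). A brute-force C check for all 8-bit q, with made-up local names standing for B', M(i) and M(i + 1):

```c
#include <assert.h>
#include <stdbool.h>

/* Identities used for  q := q | B'  in the r >= d branch, checked for all
 * 8-bit q and i in 0 .. 7:
 *   (q >> i) + 1 == (q + B') >> i    (holds for every q)
 * and, when the precondition q&M(i + 1) == 0 holds:
 *   q | B' == q + B'
 *   (q | B')&M(i) == 0 */
static bool branch_identities_hold(void)
{
        for (unsigned q = 0; q < 256; q++)
                for (int i = 0; i < 8; i++) {
                        unsigned B   = 1u << i;               /* B'       */
                        unsigned Mi  = B - 1;                 /* M(i)     */
                        unsigned Mi1 = (1u << (i + 1)) - 1;   /* M(i + 1) */
                        if ((q >> i) + 1 != (q + B) >> i)
                                return false;
                        if ((q & Mi1) != 0)
                                continue;                     /* precondition false */
                        if ((q | B) != q + B || ((q | B) & Mi) != 0)
                                return false;
                }
        return true;
}
```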

We now know that our loop invariant is indeed an invariant. The while-loop axiom then assures us that at the termination of the loop, the invariant will still hold, together with the negation of the guard:

⟦  n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0  ⟧
while (i != -1) {
        r <<= 1;
        r |= ((n >> i) & 1);
        if (r >= d) {
                r := r - d;
                q |= 1 << i;
        }
        i := i - 1;
}
⟦  n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0 && i == -1  ⟧

OK, so substitute i == -1 into the invariant: x >> 0 is just x, and q&M(0) == 0 holds trivially because M(0) == 0:

⟦  n" == d*q" + r && r < d && 0 <= r && q&M(i + 1) == 0 && i == -1  ⟧
⟦  n == d*q + r && r < d && 0 <= r  ⟧

Hallelujah!

Let's put it all together

uintN_t div(uintN_t n, uintN_t d) {
⟦  d > 0  ⟧
⟦  0 == 0 && 0 < d && (0 & ~0) == 0  ⟧
        uintN_t q := 0;
⟦  0 == d*0 + 0 && 0 < d && 0 <= 0 && (q & ~0) == 0  ⟧
        uintN_t r := 0;
⟦  0 == d*0 + r && r < d && 0 <= r && (q & ~0) == 0  ⟧
        int     i := N - 1;
⟦  n" == d*q" + r && 0 <= r && r < d && q&M(i + 1) == 0  ⟧
        while (i != -1) {
⟦  n" == d*q" + r && 0 <= r && r < d  && q&M(i + 1) == 0 && i != -1  ⟧
⟦  2*n" == 2*d*q" + 2*r && 2*r < 2*d && 0 <= 2*r && q&M(i + 1) == 0  ⟧
⟦  2*n" == d*(q' - q'&1) + 2*r && 2*r < 2*d && 0 <= 2*r && q&M(i + 1) == 0  ⟧
⟦  2*n" == d*q' + 2*r && 2*r < 2*d && 0 <= 2*r && q&M(i + 1) == 0  ⟧
⟦  2*n" == d*q' + 2*r && 2*r < 2*d && 0 <= 2*r && q&M(i + 1) == 0 && (2*r)&1 == 0  ⟧
⟦  2*n" == d*q' + 2*r && 2*r + n'&1 < 2*d && 0 <= 2*r && q&M(i + 1) == 0 && (2*r)&1 == 0  ⟧
                r <<= 1;
⟦  2*n" == d*q' + r && r + n'&1 < 2*d && 0 <= r && q&M(i + 1) == 0 && r&1 == 0  ⟧
⟦  n' == d*q' + r + n'&1 && r + n'&1 < 2*d && 0 <= r && q&M(i + 1) == 0 && r&1 == 0  ⟧
                r |= ((n >> i) & 1);
⟦  n' == d*q' + r && r < 2*d && 0 <= r && q&M(i + 1) == 0  ⟧
                if (r >= d) {
⟦  n' == d*q' + r && r < 2*d && 0 <= r && q&M(i + 1) == 0 && r >= d  ⟧
⟦  n' == d*q' + (r - d) + d && r - d < d && q&M(i + 1) == 0 && r - d >= 0  ⟧
                        r := r - d;
⟦  n' == d*q' + r + d && r < d && q&M(i + 1) == 0 && r >= 0  ⟧
⟦  n' == d*(q | B')' + r && r < d && (q|B')&M(i) == 0 && r >= 0  ⟧
                        q |= 1 << i;
⟦  n' == d*q' + r && r < d && 0 <= r && q&M(i) == 0  ⟧
                } else {
⟦  n' == d*q' + r && r < 2*d && 0 <= r && q&M(i + 1) == 0 && r < d  ⟧
⟦  n' == d*q' + r && r < d && 0 <= r && q&M(i) == 0  ⟧
                }
⟦  n' == d*q' + r && r < d && 0 <= r && q&M(i) == 0  ⟧
                i := i - 1;
⟦  n" == d*q" + r && 0 <= r && r < d && q&M(i + 1) == 0  ⟧
        }
⟦  n" == d*q" + r && 0 <= r && r < d && i == -1  ⟧
⟦  n == d*q + r && 0 <= r && r < d  ⟧
        return q;
}

Seriously, the proof above looks, at first (and at any subsequent) sight, like a random barrage of bizarre formal spasms in haphazard directions. It is practically impossible to construct such a sequence of assertions in a top-to-bottom fashion, unless one spends an unhealthy amount of time interacting with Hoare triples in dark alleys.

And this is why nobody does it this way (among humans, that is; automated provers are only too happy to try insane numbers of possible dead ends). Early on, a much better-structured approach was developed that goes in the opposite direction, starting from the known targets (postconditions); see Predicate transformer semantics or, better still, read A Discipline of Programming ("59683rd Edition", as the Amazon page mentions nonchalantly). Dijkstra also held the opinion that the structure of a program and its postcondition are tightly locked, to the extent that it is possible to derive a program given its formal specification; see the amazing EWD1162.
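For comparison, here is a rough rendering of the standard weakest-precondition rules of predicate transformer semantics for the constructs used above (the notation is ours, not the post's). They compute an assertion backwards from the postcondition, which is what was done by hand above when checking the loop entrance from bottom to top; for while-loops there is no such simple closed-form rule, and an invariant still has to be found.

```latex
\begin{aligned}
wp(x := E,\; S) &= S[\, x := E \,]\\
wp(C_0 ;\, C_1,\; S) &= wp(C_0,\; wp(C_1,\; S))\\
wp(\mathbf{if}\ g\ \{C_0\}\ \mathbf{else}\ \{C_1\},\; S) &= (g \Rightarrow wp(C_0, S)) \land (\lnot g \Rightarrow wp(C_1, S))
\end{aligned}
```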

Euclid continuity

nikita May 7, 2024 Updated Dec 22, 2024

Let's talk about one of the simplest, if not trivial, subjects in the oldest and best-established branch of mathematics: rectangle area in elementary Euclidean geometry. The story contains two twists and an anecdote.

We all know that the area of a rectangle or a parallelogram is the product of its base and height, and the area of a triangle is half of that (areas of a parallelogram, a triangle and a rectangle can all be reduced to each other by a device invented by Euclid), but Euclid would not say that: the idea that measures such as lengths, areas or volumes can be multiplied is alien to him, as it still is to Galileo. There is a huge body of literature on the evolution that culminated in our modern notion of number, unifying the disparate, incompatible numbers and measures of earlier mathematics; suffice it to say that before the time of Newton and Leibniz, ratios and fractions were not the same.

Euclid instead says (Book VI, prop. I, hereinafter quotes from the Elements are given as <Greek | English | Russian>):

<τὰ τρίγωνα καὶ τὰ παραλληλόγραμμα, τὰ ὑπὸ τὸ αὐτὸ ὕψος ὄντα πρὸς ἄλληλά ἐστιν ὡς αἱ βάσεις. | Triangles and parallelograms, which are under the same height are to one another as their bases.| Треугольники и параллелограммы под одной и той же высотой, [относятся] друг к другу как основания.>

Given rectangles $ABCD$ and $AEFD$ with the same height $AD$, we want to prove that the ratio of their areas is the same as that of their bases: $\Delta(ABCD)/\Delta(AEFD) = AB/AE$.

First, consider the case where the bases are commensurable, that is, as we would say, $AB/AE = n/m$ for some integers $n$ and $m$, or, as Euclid would say, there is a length $AX$ such that $AB = n \cdot AX$ (that is, the interval $AB$ is equal to the interval $AX$ extended $n$ times) and $AE = m \cdot AX$. Then $ABCD$ can be divided into $n$ equal rectangles like $AXYD$, with height $AD$, base $AX$ and area $\Delta_0$, and $AEFD$ can be divided into $m$ of them.

Then, $$\begin{array}{lclclcl} \Delta(ABCD) & = & \Delta(AXYD) & + & \Delta(XX'Y'Y) & + & \ldots\\ & = & n \cdot \Delta_0, & & & & \end{array}$$ and $$\begin{array}{lclclcl} \Delta(AEFD) & = & \Delta(AXYD) & + & \Delta(XX'Y'Y) & + & \ldots \\ & = & m \cdot \Delta_0 \end{array}$$ so that $\Delta(ABCD)/\Delta(AEFD) = n/m = AB/AE$, as required.

Starting from the early twentieth century, the rigorous proof of the remaining (incommensurable) case in a school-level exposition typically involves some form of a limit and is based on an implicit or explicit continuity axiom, usually equivalent to Cavalieri's principle.

There is, however, a completely elementary, short and elegant proof that requires no additional assumptions. This proof is used by Legendre (I don't know who the original author is) in his Elements, Éléments de géométrie. Depending on the edition, it is Proposition III in either Book III (p. 100) or Book IV (p. 90); some nineteenth-century editions, especially those with "additions and modifications by M. A. Blanchet", are butchered beyond recognition, so be careful. The proof goes like this:

For incommensurable $AB$ and $AE$ consider the ratio $\Delta(ABCD)/\Delta(AEFD)$. If $\Delta(ABCD)/\Delta(AEFD) = AB/AE$ we are done. If $\Delta(ABCD)/\Delta(AEFD)$ is not equal to $AB/AE$, it is instead equal to $AB/AO$ and either $AE < AO$ or $AE > AO$. Consider the first case (the other one is similar).

The points at the base are in order $A$, then $B$, then $E$, then $O$.

Divide $AB$ into $n$ equal intervals, each shorter than $EO$. This requires what we now call the Archimedes-Eudoxus axiom, which is implied by Definition IV of Book V:

<λόγον ἔχειν πρὸς ἄλληλα μεγέθη λέγεται, ἃ δύναται πολλαπλασιαζόμενα ἀλλήλων ὑπερέχειν. | Magnitudes are said to have a ratio to one another which can, when multiplied, exceed one another. | Величины имеют отношение между собой, если они взятые кратно могут превзойти друг друга.>

Then keep laying off intervals of the same length beyond $B$ until we get to a point $I$ outside of $BE$ but within $EO$ (this is possible because each interval is shorter than $EO$). The points are now in order $A$, then $B$, then $E$, then $I$, then $O$.

$AB$ and $AI$ are commensurable, so $\Delta(ABCD)/\Delta(AIKD) = AB/AI$. Also, $\Delta(ABCD)/\Delta(AEFD) = AB/AO$, so $\Delta(AIKD)/\Delta(AEFD) = AI/AO$. By construction $AI < AO$, hence $\Delta(AIKD) < \Delta(AEFD)$, but $AEFD$ is a proper part of $AIKD$, so $\Delta(AEFD) < \Delta(AIKD)$. Contradiction.

Step back and look at the structure of these two proofs from the modern perspective. Fix the height and let $\Delta(X)$ be the area of the rectangle with the base of length $X$. By an assumption that we would call "additivity of measure", $\Delta(X+Y) = \Delta(X) + \Delta(Y)$, that is, $\Delta$ is an additive function. A general and easy-to-establish fact (mentioned with remarkable persistence on this blog [Unexpected isomorphism], [The Hunt for Addi(c)tive Monster]) is that any additive function is linear on rationals, that is, $\Delta(n/m \cdot X) = n/m \cdot \Delta(X)$. This corresponds to the "commensurable" part of the proofs. To complete the proof we need linearity: $\Delta(X) = X \cdot H$, where $H = \Delta(1)$. But additive functions are not necessarily linear. To obtain linearity, an additional condition is needed. The traditional proof uses continuity: a continuous (at least at one point) additive function is necessarily linear.
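The step "additive implies linear on rationals" takes two lines; a sketch for positive integers $n$ and $m$:

```latex
\begin{aligned}
\Delta(n \cdot X) &= \Delta(X + X + \dots + X) = n \cdot \Delta(X)
  && \text{(additivity, applied } n - 1 \text{ times)}\\
m \cdot \Delta\left(\tfrac{n}{m} \cdot X\right) &= \Delta\left(m \cdot \tfrac{n}{m} \cdot X\right) = \Delta(n \cdot X) = n \cdot \Delta(X)
  && \text{(the first line, with } \tfrac{n}{m} \cdot X \text{ in place of } X)\\
\Delta\left(\tfrac{n}{m} \cdot X\right) &= \tfrac{n}{m} \cdot \Delta(X)
\end{aligned}
```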

Legendre's proof uses monotonicity: a monotonic additive function is always linear. This is clever, because monotonicity is not an additional assumption: it follows from the already assumed positivity of measure: If $Y > X$, then $\Delta(Y) = \Delta(X + (Y - X)) = \Delta(X) + \Delta(Y - X) > \Delta(X)$, as $\Delta(Y - X) > 0$.
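A sketch of why monotonicity is enough, in the same modern notation, for $X > 0$, with $H = \Delta(1)$ and linearity on rationals from the previous step. The choice of a rational number between two lengths plays the role of the Archimedes-Eudoxus axiom, and the argument is essentially the same contradiction as in Legendre's proof:

```latex
\begin{aligned}
&\text{Suppose } \Delta(X) < X \cdot H, \text{ where } H = \Delta(1) > 0.\\
&\text{Pick a rational } p \text{ with } \Delta(X)/H < p < X.\\
&\text{Then } p < X \Rightarrow \Delta(p) < \Delta(X) \text{ (monotonicity)},
  \text{ while } \Delta(p) = p \cdot H > \Delta(X) \text{ (linearity on rationals)}.\\
&\text{Contradiction; the case } \Delta(X) > X \cdot H \text{ is symmetric, so } \Delta(X) = X \cdot H.
\end{aligned}
```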

What does Euclid's original proof look like? (He proves the triangle version, which is similar to the one for rectangles.)

Wait... It is unbelievably short, especially given that the Elements use no notation and spell out everything in words, and it covers both triangles and parallelograms. It definitely has no separate "commensurable" and "incommensurable" parts. How is this possible?

The trick is in the definition of equal ratios, Def. V of Book V:

<ἐν τῷ αὐτῷ λόγῳ μεγέθη λέγεται εἶναι πρῶτον πρὸς δεύτερον καὶ τρίτον πρὸς τέταρτον, ὅταν τὰ τοῦ πρώτου καὶ τρίτου ἰσάκις πολλαπλάσια τῶν τοῦ δευτέρου καὶ τετάρτου ἰσάκις πολλαπλασίων καθ᾽ ὁποιονοῦν πολλαπλασιασμὸν ἑκάτερον ἑκατέρου ἢ ἅμα ὑπερέχῃ ἢ ἅμα ἴσα ᾖ ἢ ἅμα ἐλλείπῃ ληφθέντα κατάλληλα. | Magnitudes are said to be in the same ratio, the first to the second and the third to the fourth, when, if any equimultiples whatever are taken of the first and third, and any equimultiples whatever of the second and fourth, the former equimultiples alike exceed, are alike equal to, or alike fall short of, the latter equimultiples respectively taken in corresponding order. | Говорят, что величины находятся в том же отношении: первая ко второй и третья к четвёртой, если равнократные первой и третьей одновременно больше, или одновременно равны, или одновременно меньше равнократных второй и четвёртой каждая каждой при какой бы то ни было кратности, если взять их в соответственном порядке.>

In modern notation this means that

$$\Delta_1 / \Delta_2 = b_1 / b_2 \equiv (\forall n\in\mathbb{N}) (\forall m\in\mathbb{N}) (n\cdot\Delta_1 \gtreqqless m\cdot\Delta_2 = n\cdot b_1 \gtreqqless m\cdot b_2),$$

where $\gtreqqless$ is the "FORTRAN 3-way comparison operator" (aka the C++ spaceship operator):

$$ X \gtreqqless Y = \begin{cases} -1, & X < Y\\ 0, & X = Y\\ +1, & X > Y \end{cases} $$

This looks like a rather artificial definition of ratio equality, but with it the proof of Proposition I and many other proofs in Books V and VI become straightforward or even forced.
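A rough modern paraphrase of how Proposition I falls out of this definition (not a translation of Euclid's text): write $\Delta(b)$ for the area of the figure with base $b$ under the fixed height, and let $\Delta_1 = \Delta(b_1)$, $\Delta_2 = \Delta(b_2)$. Figures on equal bases under the same height are equal, so laying off a base $n$ times multiplies the area by $n$; and a greater base gives a greater figure. Hence, for all $n$ and $m$:

```latex
\begin{aligned}
\Delta(n \cdot b_1) &= n \cdot \Delta_1, \qquad \Delta(m \cdot b_2) = m \cdot \Delta_2
  && \text{(equal bases, same height: equal figures)}\\
n \cdot b_1 \gtreqqless m \cdot b_2 &= \Delta(n \cdot b_1) \gtreqqless \Delta(m \cdot b_2)
  && \text{(greater base, greater figure)}\\
  &= n \cdot \Delta_1 \gtreqqless m \cdot \Delta_2
  && \text{for all } n, m \in \mathbb{N}\\
\text{hence}\quad \Delta_1 / \Delta_2 &= b_1 / b_2
  && \text{(Def. V of Book V)}
\end{aligned}
```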

The approach of selecting the definitions to streamline the proofs is characteristic of abstract twentieth-century mathematics and it is amazing to see it in full force in the earliest mathematical text we have.

I'll conclude with the promised anecdote (unfortunately, I do not remember the source). An acquaintance of Newton, having met him in the Cambridge library and found, on inquiry, that Newton was reading the Elements, remarked something to the effect of "But Sir Isaac, haven't your methods superseded and obsoleted Euclid?". This is one of the two recorded cases when Newton laughed.

When people used to care about the quality of presentation

nikita Mar 4, 2024 Updated Mar 4, 2024

From the Errata to Dijkstra's A Primer of Algol 60 Programming.

usched: update

nikita Oct 12, 2022 Updated Mar 14, 2023

Update for the previous post about stackswap coroutine implementation usched.

To recap, usched is an experimental (and very simple, 120LOC) coroutine implementation different from stackful and stackless models: coroutines are executed on the native stack of the caller and when the coroutine is about to block its stack is copied into a separately allocated (e.g., in the heap) buffer. The buffer is copied back onto the native stack when the coroutine is ready to resume.

I added a new scheduler ll.c that distributes coroutines across multiple native threads and then does lockless scheduling within each thread. In the benchmark (the same as in the previous post), each coroutine in the communicating cycle belongs to the same thread.

Results are amazing: with 64 or more coroutines, usched actually beats compiler-assisted C++ coroutines by a large margin. The measurements are coroutine wakeup-wait operations per second (1 == 1e8 op/sec) as a function of the number of coroutines in the test (on a logarithmic scale):

Coroutines   GO      C++     UL (usched)
16           0.077   1.089   0.560
32           0.127   1.234   0.955
64           0.199   1.344   1.515
400          0.326   1.262   2.047
800          0.323   1.201   2.095
4000         0.285   1.159   2.127
8000         0.228   1.141   2.148
40000        0.142   1.135   2.160
80000        0.199   1.163   2.154
400000       0.305   1.168   2.020
800000       0.303   1.138   1.932
4M           0.286   1.076   1.819
8M           0.268   1.051   1.811

I kept only the most efficient implementation from each competing class: C++ for stackless, GO for stackful and usched for stackswap. See the full results in results.darwin.

Generating Catalan numbers.

nikita Oct 7, 2022 Updated Mar 27, 2023

Enumerate all binary trees with N nodes, the C++20 way:

#include <memory>
#include <string>
#include <cstdlib>
#include <iostream>
#include <coroutine>
#include <cppcoro/generator.hpp>

struct tnode;
using tree = std::shared_ptr<tnode>;
struct tnode {
	tree left;
	tree right;
	tnode() {}
	tnode(tree l, tree r) : left(l), right(r) {}
};

// "*" is the empty tree, "[L R]" is a node with subtrees L and R.
auto print(tree t) -> std::string {
	return  t ? (std::string{"["} + print(t->left) + " "
		     + print(t->right) + "]") : "*";
}

// Yields every binary tree with n nodes: a root, i nodes on the left,
// n - i - 1 nodes on the right.
cppcoro::generator<tree> gen(int n) {
	if (n == 0) {
		co_yield nullptr;
	} else {
		for (int i = 0; i < n; ++i) {
			for (auto left : gen(i)) {
				for (auto right : gen(n - i - 1)) {
					co_yield std::make_shared<tnode>(left, right);
				}
			}
		}
	}
}

int main(int argc, char **argv) {
	if (argc != 2) {
		std::cerr << "usage: " << argv[0] << " N" << std::endl;
		return 1;
	}
	for (auto t : gen(std::atoi(argv[1]))) {
		std::cout << print(t) << std::endl;
	}
}

Source: gen.cpp.
To generate Catalan numbers, do:

$ for i in $(seq 0 1000000) ;do ./gen $i | wc -l ;done
       1
       1
       2
       5
      14
      42
     132
     429
    1430
    4862
   16796
   58786
  208012
  742900
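The counts above satisfy the recurrence that gen() follows directly: a tree with k > 0 nodes is a root, a left subtree with i nodes and a right subtree with k - i - 1 nodes. A small C sketch (the function name and the size limit are arbitrary choices) computes the same numbers without building any trees; catalan(0) through catalan(13) should reproduce the column above, from 1 to 742900.

```c
#include <assert.h>
#include <stdint.h>

#define CATALAN_MAX 36   /* arbitrary limit: C(35) still fits in 64 bits */

/* Number of binary trees with n nodes, following the split used by gen():
 *   C(0) = 1,   C(k) = sum over i = 0 .. k-1 of C(i) * C(k - i - 1). */
static uint64_t catalan(int n)
{
        uint64_t c[CATALAN_MAX];
        assert(0 <= n && n < CATALAN_MAX);
        c[0] = 1;
        for (int k = 1; k <= n; k++) {
                c[k] = 0;
                for (int i = 0; i < k; i++)
                        c[k] += c[i] * c[k - i - 1];
        }
        return c[n];
}
```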

A Python: it's dictionaries all the way down.

nikita Oct 7, 2022 Updated Mar 27, 2023

from collections import defaultdict

def drill():
    return defaultdict(drill)

usched: stackswap coroutines, neither stackful nor stackless

nikita Oct 6, 2022 Updated Dec 22, 2024

[Please read the update.]

This repository (https://github.com/nikitadanilov/usched) contains a simple experimental implementation of coroutines alternative to well-known "stackless" and "stackful" methods.

The term "coroutine" gradually grew to mean a mechanism where a computation, which in this context means a chain of nested function calls, can "block" or "yield" so that the top-most caller can proceed and the computation can later be resumed at the blocking point with the chain of intermediate function activation frames preserved.

Prototypical uses of coroutines are lightweight support for potentially blocking operations (user interaction, IO, networking) and generators, which produce multiple values (see the same fringe problem).

There are two common coroutine implementation methods:

a stackful coroutine runs on a separate stack. When a stackful coroutine blocks, it performs a usual context switch. Historically "coroutines" meant stackful coroutines. Stackful coroutines are basically little more than usual threads, and so they can be kernel (supported by the operating system) or user-space (implemented by a user-space library, also known as green threads), preemptive or cooperative.
a stackless coroutine does not use any stack when blocked. In a typical implementation, instead of using a normal function activation frame on the stack, the coroutine uses a special activation frame allocated in the heap so that it can outlive its caller. Using a heap-allocated frame to store all local variables lends itself naturally to compiler support, but some people are known to implement stackless coroutines manually via a combination of pre-processing, libraries and tricks much worse than Duff's device.
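To make the stackless description concrete, here is a minimal hand-written stackless generator in C in the spirit of the switch-based tricks just mentioned (the names are illustrative, and this is not how usched works). The "activation frame" is an explicit struct kept by the caller (it could just as well be heap-allocated) holding the local variable and the resume point, so nothing remains on the stack between calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Explicit activation frame of the generator. */
struct squares_frame {
        int resume_point;   /* 0: not started, 1: suspended in the loop, 2: done */
        int i;              /* "local variable" that survives suspension */
        int n;              /* argument: yield i*i for i = 0 .. n-1 */
};

/* Produces the next value in *out and returns true, or returns false when
 * the generator is exhausted.  The case label inside the loop body lets a
 * call resume right after the previous "yield" (as in Duff's device). */
static bool squares_next(struct squares_frame *f, int *out)
{
        switch (f->resume_point) {
        case 0:
                for (f->i = 0; f->i < f->n; f->i++) {
                        *out = f->i * f->i;
                        f->resume_point = 1;
                        return true;              /* "yield" */
        case 1:;                                  /* resume here */
                }
        }
        f->resume_point = 2;
        return false;
}
```

A caller initialises the frame, for example struct squares_frame f = { 0, 0, 4 };, and calls squares_next(&f, &v) until it returns false, receiving 0, 1, 4 and 9.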

Stackful and stackless are by no means the only possibilities. One of the earliest languages to feature generators, CLU (distribution), ran generators on the caller's stack.

usched is in some sense intermediate between stackful and stackless: its coroutines do not use the stack when blocked, nor do they allocate individual activation frames in the heap.

The following is copied with some abbreviations from usched.c.

Overview

usched: A simple dispatcher for cooperative user-space threads.

A typical implementation of user-space threads allocates a separate stack for each thread when the thread is created and then dispatches threads (as decided by the scheduler) through some context switching mechanism, for example, longjmp().

In usched all threads (represented by struct ustack) are executed on the same "native" stack. When a thread is about to block (usched_block()), a memory buffer for the stack used by this thread is allocated and the stack is copied to the buffer. After that the part of the stack used by the blocking thread is discarded (by longjmp()-ing to the base of the stack) and a new thread is selected. The stack of the selected thread is restored from its buffer and the thread is resumed by longjmp()-ing to the usched_block() that blocked it.
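The copy-on-block idea can be modelled in a few lines of Python. This is a toy simulation under obvious simplifications, not usched code: the native stack is a list, a "frame" is just a string, and `Thread`, `launch()`, `block()` and `cont()` are hypothetical stand-ins for struct ustack, launch(), usched_block() and cont(). Real usched copies raw bytes with memcpy() and transfers control with longjmp(); the model only tracks which frames live where.

```python
class Thread:
    """Hypothetical stand-in for struct ustack."""
    def __init__(self, name):
        self.name = name
        self.bottom = None  # index on the native stack where this thread's frames start
        self.buf = None     # heap copy of the thread's frames while it is blocked

stack = ["usched_run"]      # the single native stack; index 0 is the base

def launch(thread, *frames):
    # A new thread simply runs as nested calls on top of the dispatcher.
    thread.bottom = len(stack)
    stack.extend(frames)

def block(thread):
    # Like usched_block(): copy exactly the frames in use into a buffer,
    # then discard them (the longjmp() back to usched_run()).
    thread.buf = stack[thread.bottom:]
    del stack[thread.bottom:]

def cont(thread):
    # Like cont(): frames go back to the same positions, because pointers
    # into a stack are only meaningful at their original addresses.
    assert len(stack) == thread.bottom, "stack must be unwound to the thread's bottom"
    stack.extend(thread.buf)
    thread.buf = None

u, v = Thread("U"), Thread("V")
launch(u, "U->u_f", "bar", "usched_block")
block(u)                      # U's three frames move into u.buf
launch(v, "V->u_f", "usched_block")
block(v)                      # V's two frames move into v.buf
cont(u)                       # U's frames are restored in place
print(stack)                  # ['usched_run', 'U->u_f', 'bar', 'usched_block']
print(v.buf)                  # ['V->u_f', 'usched_block']
```

Each buffer holds only the frames its thread actually used (three and two here), which mirrors usched's property of allocating stack buffers only as needed.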

The focus of this implementation is simplicity: the total size of usched.[ch] is less than 120 LOC, as measured by SLOCCount.

Advantages:

no need to allocate the maximal possible stack at thread initialisation: a stack buffer is allocated as needed. It is also possible to free the buffer when the thread is resumed (not currently implemented);
a thread that doesn't block has zero overhead: it is executed as a native function call (through a function pointer) without any context switching;
because the threads are executed on the stack of the same underlying native thread, native synchronisation primitives (mutices, etc.) work, although the threads share the underlying TLS. Of course, one cannot use native primitives to synchronise between usched threads running on the same native thread.

Disadvantages:

stack copying introduces overhead (memcpy()) in each context switch;
because stacks are moved around, addresses on a thread stack are only valid while the thread is running. This invalidates certain common programming idioms: other threads and the heap cannot store pointers to the stacks, at least to the stacks of blocked threads. Note that the Go run-time, and probably other run-times, maintains a similar invariant.

Usage

usched is only a dispatcher and not a scheduler: it blocks and resumes threads but

it does not keep track of threads (specifically allocation and freeing of struct ustack instances is done elsewhere),
it implements no scheduling policies.

These things are left to the user, together with stack buffers allocation and freeing. The user supplies 3 call-backs:

usched::s_next(): the scheduling function. This call-back returns the next thread to execute. This can be either a new (never before executed) thread initialised with ustack_init(), or it can be a blocked thread. The user must keep track of blocked and runnable threads, presumably by providing wrappers to ustack_init() and ustack_block() that would record thread state changes. It is up to usched::s_next() to block and wait for events if there are no runnable threads and all threads are waiting for something;
usched::s_alloc(): allocates a new stack buffer of at least the specified size. The user has full control over stack buffer allocation. It is possible to pre-allocate the buffer when the thread is initialised (reducing the cost of usched_block()), it is possible to cache buffers, etc.;
usched::s_free(): frees the previously allocated stack buffer.

rr.h and rr.c provide a simple "round-robin" scheduler implementing all the call-backs. Use it with care: it was only tested with the rmain.c benchmark.

Pictures!

The following diagrams show stack management by usched. The stack grows from right to left.

At the entrance to the dispatcher loop, usched_run(S):

                                                usched_run()
----------------------------------------------+--------------+-------+
                                              | buf | anchor |  ...  |
----------------------------------------------+--------------+-------+
                                              ^
                                              |
                                              sp = S->s_buf

A new (never before executed) thread U is selected by S->s_next(), and launch() calls the thread startup function U->u_f():


                              U->u_f() launch() usched_run()
------------------------------+---------+-----+--------------+-------+
                              |         | pad | buf | anchor |  ...  |
------------------------------+---------+-----+--------------+-------+
                              ^         ^
                              |         |
                              sp        U->u_bottom

The thread executes as usual on the stack, until it blocks by calling usched_block():

   usched_block()       bar() U->u_f() launch() usched_run()
-----------+------+-----+-----+---------+-----+--------------+-------+
           | here | ... |     |         | pad | buf | anchor |  ...  |
-----------+------+-----+-----+---------+-----+--------------+-------+
      ^    ^                            ^
      |    +-- sp = U->u_cont           |
      |                                 U->u_bottom
      U->u_top

The stack from U->u_top to U->u_bottom is copied into the stack buffer U->u_stack, and control returns to usched_run() by longjmp(S->s_buf):


                                                usched_run()
----------------------------------------------+--------------+-------+
                                              | buf | anchor |  ...  |
----------------------------------------------+--------------+-------+
                                              ^
                                              |
                                              sp = S->s_buf

Next, suppose S->s_next() selects a previously blocked thread V ready to be resumed. usched_run() calls cont(V).


                                        cont()  usched_run()
----------------------------------------+-----+--------------+-------+
                                        |     | buf | anchor |  ...  |
----------------------------------------+-----+--------------+-------+
                                        ^
                                        |
                                        sp

cont() copies the stack from the buffer to the [V->u_top, V->u_bottom] range. It's important that this memcpy() operation does not overwrite cont()'s own stack frame; this is why the pad[] array is needed in launch(): it advances V->u_bottom and gives cont() some space to operate.


  usched_block()       foo() V->u_f()   cont()  usched_run()
---------+------+-----+-----+--------+--+-----+--------------+-------+
         | here | ... |     |        |  |     | buf | anchor |  ...  |
---------+------+-----+-----+--------+--+-----+--------------+-------+
    ^    ^                           ^  ^
    |    +-- V->u_cont               |  +-- sp
    |                                |
    V->u_top                         V->u_bottom

Then cont() longjmp()-s to V->u_cont, restoring V execution context:


  usched_block()       foo() V->u_f()   cont()  usched_run()
---------+------+-----+-----+--------+--+-----+--------------+-------+
         | here | ... |     |        |  |     | buf | anchor |  ...  |
---------+------+-----+-----+--------+--+-----+--------------+-------+
         ^
         +-- sp = V->u_cont

V continues its execution as if it returned from usched_block().

Multiprocessing

By design, a single instance of struct usched cannot take advantage of multiple processors, because all its threads are executing within a single native thread. Multiple instances of struct usched can co-exist within a single process address space, but a ustack thread created for one instance cannot be migrated to another. One possible strategy to add support for multiple processors is to create multiple instances of struct usched and schedule them (that is, schedule the threads running respective usched_run()-s) to processors via pthread_setaffinity_np() or similar. See rr.c for a simplistic implementation.

Current limitations

the stack is assumed to grow toward lower addresses. This is easy to fix, if necessary;
the implementation is not signal-safe. Fixing this can be as easy as replacing *jmp() calls with their sig*jmp() counterparts. At the moment, signal-based code, like the gperftools -lprofiler library, would most likely crash usched;
usched.c must be compiled without optimisations and with -fno-stack-protector option (gcc);
usched threads are cooperative: a thread will continue to run until it completes or blocks. Adding preemption (via signal-based timers) is relatively easy; the actual preemption decision would be relegated to the external "scheduler" via a new usched::s_preempt() call-back invoked from a signal handler.

Notes

Midori seems to use a similar method: a coroutine (called activity there) starts on the native stack. If it needs to block, frames are allocated in the heap (this requires compiler support) and filled in from the stack, the coroutine runs in these heap-allocated frames when resumed.

Benchmarks

usched was benchmarked against a few stackful (go, pthreads) and stackless (rust, c++ coroutines) implementations. A couple of caveats:

all benchmarking in general is subject to the reservations voiced by Hippocrates in the aphorism usually translated (with a complete reversal of its meaning) as ars longa, vita brevis, which means: "the art [of the doctor or tester] takes a long time to learn, but the life of a student is brief, symptoms are vague, chances of success are doubtful".
the author is much less than fluent with all the languages and frameworks used in the benchmarking. It is possible that some of the benchmarking code is inefficient or just outright wrong. Any comments are appreciated.

The benchmark tries to measure the efficiency of coroutine switching. It creates R cycles, N coroutines each. Each cycle performs M rounds, where each round consists of sending a message across the cycle: a particular coroutine (selected depending on the round number) sends the message to its right neighbour, all other coroutines relay the message received from the left to the right, the round completes when the originator receives the message after it passed through the entire cycle.

If N == 2, the benchmark is R pairs of processes, ping-ponging M messages within each pair.

Some benchmarks support two additional parameters: D (additional space in bytes, artificially consumed by each coroutine in its frame) and P (the number of native threads used to schedule the coroutines).

The benchmark creates N*R coroutines and sends a total of N*R*M messages, the latter being proportional to the number of coroutine switches.
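A minimal sketch of the benchmark's message pattern, written with Python asyncio as a hypothetical re-implementation (it is not one of the benchmarked programs, and it ignores the D and P parameters). Each of the R cycles is a ring of N coroutines connected by queues; the rule that coroutine k mod N originates round k is an assumption standing in for "selected depending on the round number". Counting the sends confirms the N*R*M message total.

```python
import asyncio

async def node(inbox, outbox, index, n, m, sent):
    """One coroutine of a cycle (hypothetical re-implementation)."""
    for rnd in range(m):
        if rnd % n == index:              # assumed rule: node rnd mod N originates round rnd
            await outbox.put(rnd)         # send to the right neighbour
            sent[0] += 1
            back = await inbox.get()      # the round ends when the message comes back
            assert back == rnd
        else:                             # relay from the left neighbour to the right
            msg = await inbox.get()
            await outbox.put(msg)
            sent[0] += 1

async def bench(n, r, m):
    """Run R independent cycles of N coroutines for M rounds; return messages sent."""
    sent = [0]
    coros = []
    for _ in range(r):
        queues = [asyncio.Queue() for _ in range(n)]   # queues[i] is node i's inbox
        for i in range(n):
            coros.append(node(queues[i], queues[(i + 1) % n], i, n, m, sent))
    await asyncio.gather(*coros)
    return sent[0]

print(asyncio.run(bench(n=8, r=4, m=10)))   # 320, i.e. N*R*M
```

With n=2 the same code degenerates into R ping-pong pairs, as described above.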

bench.sh runs all implementations with the same N, R and M parameters. graph.py plots the results.

POSIX. Source: pmain.c, binary: pmain. Pthreads-based stackful implementation in C. Uses default thread attributes. pmain.c also contains emulation of unnamed POSIX semaphores for Darwin. Plot label: "P". This benchmark crashes with "pmain: pthread_create: Resource temporarily unavailable" for large values of N*R.
Go. Source: gmain.go, binary: gmain. The code is straightforward (it was a pleasure to write). P is supported via runtime.GOMAXPROCS(). "GO1T" are the results for a single native thread, "GO" are the results without the restriction on the number of threads.
Rust. Source: cycle/src/main.rs, binary: cycle/target/release/cycle. Stackless implementation using Rust builtin async/.await. Label: "R". It is single-threaded (I haven't figured out how to distribute coroutines to multiple executors), so should be compared with GO1T, C++1T and U1T. Instead of fighting with the Rust borrow checker, I used "unsafe" and shared data-structures between multiple coroutines much like other benchmarks do.
C++. Source: c++main.cpp, binary: c++main. The current state of coroutine support in C++ is unclear. Is everybody supposed to directly use <coroutine> interfaces or one of the mutually incompatible libraries that provide easier to use interfaces on top of <coroutine>? This benchmark uses Lewis Baker's cppcoro (Andreas Buhr's fork). Labels: "C++" and "C++1T" (for single-threaded results).
usched. Source: rmain.c, binary: rmain. Based on usched.[ch] and rr.[ch]. This is our main interest, so we test a few combinations of parameters.
- Label: "U": the default configuration, round-robin scheduler over 16 native threads,
- "U1K": 1000 bytes of additional stack space for each coroutine
- "U10K": 10000 bytes,
- "U1T": round-robin over 1 native thread,
- "U1TS": round-robin over 1 native thread with pthread locking in rr.c compiled out (-DSINGLE_THREAD compilation option, a separate binary rmain.1t).
- Update "UL": uses the "local" scheduler ll.c. All coroutines within a cycle are assigned to the same native thread, so that scheduling between them requires no locking. This demonstrates very high throughput (comparable to C++), but unfortunately I do not have time right now to re-do all the measurements consistently. Binary: lmain.

bench.sh runs all benchmarks with N == 2 (message ping-pong) and N == 8. Raw results are in results.linux. In the graphs, the horizontal axis is the number of coroutines (N*R, logarithmic) and the vertical axis is the number of operations (N*R*M) per second.

Environment: Linux VM, 16 processors, 16GB of memory. Kernel: 4.18.0 (Rocky Linux).

Results for N == 8 (columns: number of coroutines N*R; values: operations per second, as tabulated by graph.py):

N*R         16      32      64     400     800    4000    8000   40000   80000  400000  800000 4000000 8000000
POSIX     1.76    3.46    6.39   14.58   14.85   14.70   13.63    9.87    8.02    0.00    0.00    0.00    0.01
GO        4.14    5.62    7.77   36.74   41.64   49.72   48.24   37.24   43.06   46.31   46.22   46.09   45.95
GO1T      4.38    4.30    4.27    4.11    3.81    3.53    3.40    3.33    3.43    3.99    3.98    3.95    3.86
RUST      9.48    8.71    8.69    8.64    8.53    7.85    6.59    4.32    3.80    3.63    3.63    3.83    3.90
U        17.24   17.27   17.30   25.77   29.99   71.68   77.32   78.92   77.98   80.88   82.09   83.66   82.15
U1K      16.21   16.29   16.35   25.38   28.41   69.92   75.76   74.31   73.65   76.69   76.75   75.84   76.56
U10K      9.04    8.96    9.09   20.38   21.69   58.13   60.95   59.66   60.50   61.32   61.71   62.06   62.72
U1T      17.37   17.31   17.35   17.35   17.36   17.27   17.29   17.14   17.06   16.91   16.91   16.91   16.87
C++      49.87   67.85   74.94   73.91   73.04   62.48   59.15   57.23   56.48   55.35   55.44   54.02   53.61
C++1T    97.03   97.38   96.82   96.06   96.58   95.78   94.83   89.83   86.09   80.48   79.37   77.04   77.48
U1TS     49.53   49.76   49.83   50.16   49.93   48.88   49.75   48.75   47.99   46.48   46.25   45.99   46.12
UL       76.03  116.63  160.72  169.74  169.99  171.57  170.32  165.82  169.43  174.32  171.55  169.48  170.04

A few notes:

As mentioned above, pthreads-based solution crashes with around 50K threads.
Most single-threaded versions ("GO1T", "R" and "U1T") are as stable as a corpse's body temperature. Rust cools off completely at about 500K coroutines. Single-threaded C++ ("C++1T"), on the other hand, is the most performant solution for almost the entire range of measurement; only for coroutine counts higher than 500K does "U" overtake it.
It is interesting that a very simple and unoptimised usched fares so well against heavily optimised C++ and Go run-times. (Again, see the reservations about the benchmarking.)
Rust is disappointing: one would hope to get better results from a rich type system combined with compiler support.

Results for N == 2 (columns: number of coroutines N*R; values: operations per second, as tabulated by graph.py):

N*R          4       8      16     100     200    1000    2000   10000   20000  100000  200000 1000000 2000000
POSIX     0.56    0.97    1.84    6.36    6.88    7.78    7.82    7.58    7.15    5.34    0.00    0.00    0.00
GO        7.40   11.03   19.23   40.44   45.79   51.81   52.87   52.77   53.15   53.62   53.22   55.77   56.82
GO1T      4.54    4.55    4.53    4.53    4.53    4.52    4.52    4.50    4.47    4.36    4.31    4.26    4.26
RUST      5.68    5.75    5.75    4.74    4.74    4.62    4.46    4.13    3.70    2.81    2.77    2.76    2.73
U        11.22   11.27   11.26   11.30    7.91   24.66   38.72   35.67   40.60   41.18   42.06   42.96   42.74
U1K       9.64    9.62    9.65    9.67    7.61   22.14   34.38   31.70   34.54   34.56   34.59   35.47   35.56
U10K      4.43    4.62    4.50    4.25    5.02   15.79   26.18   25.33   27.60   27.62   27.63   27.72   28.16
U1T      11.24   11.29   11.34   11.26   11.32   11.30   11.28   11.28   11.22   11.19   11.15   11.13   11.15
C++      46.33   46.30   63.38  114.30  117.05  114.12  111.36  101.32  100.13   84.30   78.53   72.77   71.00
C++1T    96.56   96.03   96.37   95.97   95.49   95.68   94.94   92.95   91.23   83.55   80.33   77.22   76.22
U1TS     19.59   19.66   19.80   19.87   19.89   19.86   19.82   19.72   19.66   19.51   19.45   19.33   19.37
UL       12.19   23.48   50.39   65.71   67.22   69.17   70.01   70.09   69.36   69.28   69.43   68.83   68.00

First, note that the scale is different on the vertical axis.
Single-threaded benchmarks display roughly the same behaviour (exactly the same in "C++1T" case) as with N == 8.
Go is somewhat better. Perhaps its scheduler is optimised for message ping-pong usual in channel-based concurrency models?
usched variants are much worse (50% worse for "U") than with N == 8.
Rust is disappointing.

To reproduce:

$ # install libraries and everything, then...
$ make
$ while : ;do ./bench.sh | tee -a results; sleep 5 ;done # collect enough results, this might take long...
^C
$ grep -h '^ *[2N],' results | python3 graph.py c2.svg > c2-table.md # create plot for N == 2
$ grep -h '^ *[8N],' results | python3 graph.py c8.svg > c8-table.md # create plot for N == 8

Conclusion

Overall, the results are surprisingly good. The difference between "U1T" and "U1TS" indicates that the locking in rr.c affects performance significantly, and affects it even more with multiple native threads, when locks are contended across processors. I'll try to produce a more efficient (perhaps lockless) version of a scheduler as the next step.

3-lisp: an infinite tower of meta-circular interpreters.

nikita Aug 22, 2022 Updated Apr 11, 2025

Précis

3-lisp is a dialect of Lisp designed and implemented by Brian C. Smith as part of his PhD thesis Procedural Reflection in Programming Languages (what this thesis refers to as "reflection" is nowadays more usually called "reification"). A 3-lisp program is conceptually executed by an interpreter written in 3-lisp that is itself executed by an interpreter written in 3-lisp and so on ad infinitum. This forms a (countably) infinite tower of meta-circular (v.i.) interpreters. A reflective lambda is a function that is executed one tower level above its caller. Reflective lambdas provide a very general language extension mechanism.

The code is here.

Meta-circular interpreters

An interpreter is a program that executes programs written in some programming language.

A meta-circular interpreter is an interpreter for a programming language written in that language. Meta-circular interpreters can be used to clarify or define the semantics of the language by reducing the full language to a sub-language in which the interpreter is expressed. Historically, such definitional interpreters became popular within the functional programming community; see the classical Definitional interpreters for higher-order programming languages. Certain important techniques were classified and studied in the framework of meta-circular interpretation. For example, continuation passing style can be understood as a mechanism that makes meta-circular interpretation independent of the evaluation strategy: it allows an eager meta-language to interpret a lazy object language and vice versa. As a by-product, a continuation passing style interpreter is essentially a state machine and so can be implemented in hardware; see The Scheme-79 chip. Similarly, de-functionalisation yields first-order interpreters for languages with higher-order functions. But meta-circular interpreters occur in imperative contexts too: for example, the usual proof of the Böhm–Jacopini theorem (interestingly, it was Corrado Böhm who first introduced meta-circular interpreters in his 1954 PhD thesis) constructs for an Algol-like language a meta-circular interpreter expressed in some goto-less subset of the language and then specialises this interpreter for a particular program in the source language.

Given a language with a meta-circular interpreter, suppose that the language is extended with a mechanism to trap to the meta-level. For example, in a lisp-like language, that trap can be a new special form (reflect FORM) that directly executes (rather than interprets) FORM within the interpreter. Smith is mostly interested in reflective (i.e., reification) powers obtained this way, and it is clear that the meta-level trap provides a very general language extension method: one can add new primitives, data types, flow and sequencing control operators, etc. But if you try to add reflect to an existing LISP meta-circular interpreter (for example, see p. 13 of LISP 1.5 Programmer's Manual), you would hit a problem: FORM cannot be executed at the meta-level, because at this level it is not a form, but an S-expression.

Meta-interpreting machine code

To understand the nature of the problem, consider a very simple case: the object language is the machine language (or, equivalently, the assembly language) of some processor. Suppose that the interpreter for the machine code is written in (or, more realistically, compiled to) the same machine language. The interpreter maintains the state of the simulated processor, that is, among other things, registers and memory. Say the object (interpreted) code can access a register R0; then the interpreter has to keep the contents of this register somewhere, but typically not in its own (the interpreter's) R0. Similarly, a memory word visible to the interpreted code at an address ADDR is stored by the interpreter at some, generally different, address ADDR' (although, by applying the contractive mapping theorem and a lot of hand-waving, one might argue that there will be at least one word stored at the same address at the object- and meta-levels). Suppose that the interpreted machine language has the usual sub-routine call-return instructions call ADDR and return, and is extended with a new instruction reflect ADDR that forces the interpreter to call the sub-routine ADDR. At the very least the interpreter needs to convert ADDR to the matching ADDR'. This might not be enough because, for example, the object-level sub-routine ADDR might not be contiguous at the meta-level, i.e., it is not guaranteed that if ADDR maps to ADDR' then (ADDR + 1) maps to (ADDR' + 1). This example demonstrates that a reflective interpreter needs a systematic and efficient way of converting or translating between object- and meta-level representations. If such a method is somehow provided, reflect is a very powerful mechanism: by modifying interpreter state and code it can add new instructions, addressing modes, condition bits, branch predictors, etc.

N-LISP for a suitable value of N

In his thesis Prof. Smith analyses what it would take to construct a dialect of LISP for which a faithful reflective meta-circular interpreter is possible. He starts by defining a formal model of computation with an (extremely) rigorous distinction between meta- and object- levels (and, hence, between use and mention). It is then determined that this model cannot be satisfactorily applied to the traditional LISP (which is called 1-LISP in the thesis and is mostly based on Maclisp). The reason is that LISP's notion of evaluation conflates two operations: normalisation, which operates within a level, and reference, which moves one level down. A dialect of LISP that consistently separates normalisation and reference is called 2-LISP (the then new Scheme is called LISP-1.75). The definition of 2-LISP occupies the bulk of the thesis, which the curious reader should consult for (exciting, believe me) details.

Once 2-LISP is constructed, adding the reflective capability to it is relatively straightforward. Meta-level trap takes the form of a special lambda expression:

(lambda reflect [ARGS ENV CONT] BODY)

When this lambda function is applied (at the object level), the body is directly executed (not interpreted) at the meta-level with ARGS bound to the meta-level representation of the actual parameters, ENV bound to the environment (basically, the list of identifiers and the values they are bound to) and CONT bound to the continuation. Environment and continuation together represent the 3-LISP interpreter state (much like registers and memory represent the machine language interpreter state); this representation goes all the way back to the SECD machine, see The Mechanical Evaluation of Expressions.
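To illustrate the calling convention, here is a minimal sketch of a continuation-passing interpreter for a tiny Lisp-like language in Python, where a reflective procedure receives the unevaluated arguments, the environment and the continuation, mirroring (lambda reflect [ARGS ENV CONT] BODY). It is a single-level approximation, not 3-LISP: the "meta-level" is simply Python, there is no tower, and all names (`Simple`, `Reflective`, `evaluate`, the `if` and `abort` procedures) are hypothetical. Expressions are nested Python lists, strings are identifiers and numbers evaluate to themselves.

```python
class Simple:
    """An ordinary closure (playing the role of lambda simple)."""
    def __init__(self, params, body, env):
        self.params, self.body, self.env = params, body, env

class Reflective:
    """A reflective procedure: body(args, env, cont) runs at the meta-level (Python)."""
    def __init__(self, body):
        self.body = body

def evaluate(expr, env, cont):
    if isinstance(expr, str):                   # identifier
        return cont(env[expr])
    if not isinstance(expr, list):              # numbers are self-evaluating
        return cont(expr)
    head, *args = expr
    if head == "lambda":                        # ["lambda", [params...], body]
        params, body = args
        return cont(Simple(params, body, env))

    def with_procedure(proc):
        if isinstance(proc, Reflective):
            # Hand over unevaluated args, env and cont instead of evaluating args.
            return proc.body(args, env, cont)
        return evaluate_all(args, env, lambda vals: apply_procedure(proc, vals, cont))

    return evaluate(head, env, with_procedure)

def evaluate_all(exprs, env, cont):
    if not exprs:
        return cont([])
    return evaluate(exprs[0], env, lambda first:
                    evaluate_all(exprs[1:], env, lambda rest: cont([first] + rest)))

def apply_procedure(proc, vals, cont):
    if isinstance(proc, Simple):
        return evaluate(proc.body, {**proc.env, **dict(zip(proc.params, vals))}, cont)
    return cont(proc(*vals))                    # primitive Python function

def reflect_if(args, env, cont):
    test, then, otherwise = args
    return evaluate(test, env, lambda t: evaluate(then if t else otherwise, env, cont))

def reflect_abort(args, env, cont):
    (expr,) = args
    return evaluate(expr, env, lambda v: v)     # cont is dropped: v becomes the final answer

GLOBAL = {"+": lambda a, b: a + b, "<": lambda a, b: a < b,
          "if": Reflective(reflect_if), "abort": Reflective(reflect_abort)}

def run(expr):
    return evaluate(expr, GLOBAL, lambda v: v)

print(run([["lambda", ["x"], ["+", "x", 1]], 41]))   # 42
print(run(["if", ["<", 1, 2], 10, 20]))             # 10
print(run(["+", 1, ["abort", 100]]))                # 100
```

Here `if` becomes an ordinary library procedure rather than a special form, and `abort` discards the continuation it was given, which no simple procedure can do; both are small instances of the language extension power the text attributes to reflective lambdas.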

Here is the fragment of 3-LISP meta-circular interpreter code that handles lambda reflect (together with "ordinary" lambda-s, denoted by lambda simple):

Implementation

It is of course not possible to run an infinite tower of interpreters directly.

The 3-LISP implementation creates a meta-level on demand, when a reflective lambda is invoked. At that moment the state of the meta-level interpreter is synthesised (e.g., see make-c1 in the listing above). The implementation takes pains to detect when it can drop down to a lower level, which is not entirely simple because a reflective lambda can, instead of returning (that is, invoking the supplied continuation), run a potentially modified version of the read-eval-print loop (called READ-NORMALISE-PRINT (see) in 3-LISP) which does not return. There is a lot of non-trivial machinery operating behind the scenes, and though the implementation modestly proclaims itself EXTREMELY INEFFICIENT, it is, in fact, remarkably fast.

Porting

I was unable to find a digital copy of the 3-LISP sources and so manually retyped the sources from the appendix of the thesis. The transcription in 3-lisp.lisp (2003 lines, 200K characters) preserves the original pagination and character set; see the comments at the top of the file. Transcription was mostly straightforward except for a few places where the PDF is illegible (for example, here), all of which fortunately are within comment blocks.

The sources are in the CADR machine dialect of LISP, which, save for some minimal and no longer relevant details, is equivalent to Maclisp.

The 3-LISP implementation does not have its own parser or interpreter. Instead, it uses the flexibility built into the Lisp reader (see readtables) to parse, interpret and even compile 3-LISP with a very small amount of additional code. Amazingly, this more than 40-year-old code, which uses arcane features like readtable customisation, runs on a modern Common Lisp platform after a very small set of changes: some functions got renamed (CASEQ to CASE, *CATCH to CATCH, etc.), some functions are missing (MEMQ, FIXP), some signatures changed (TYPEP, BREAK, IF). See 3-lisp.cl for details.

Unfortunately, the port does not run on all modern Common Lisp implementations, because it relies on the proper support for backquotes across recursive reader invocations:

;;     Maclisp maintains backquote context across recursive parser
;;     invocations. For example in the expression (which happens within defun
;;     3-EXPAND-PAIR)
;;
;;         `\(PCONS ~,a ~,d)
;;
;;     the backquote is consumed by the top-level activation of READ. Backslash
;;     forces the switch to 3-lisp readtable and call to 3-READ to handle the
;;     rest of the expression. Within this 3-READ activation, the tilde forces
;;     switch back to L=READTABLE and a call to READ to handle ",a". In Maclisp,
;;     this second READ activation re-uses the backquote context established by
;;     the top-level READ activation. Of all Common Lisp implementations that I
;;     tried, only sbcl correctly handles this situation. Lisp Works and clisp
;;     complain about "comma outside of backquote". In clisp,
;;     clisp-2.49/src/io.d:read_top() explicitly binds BACKQUOTE-LEVEL to nil.

Among Common Lisp implementations I tried, only sbcl supports it properly. After reading Common Lisp Hyperspec, I believe that it is Maclisp and sbcl that implement the specification correctly and other implementations are faulty.

Conclusion

Procedural Reflection in Programming Languages is, in spite of its age, a very interesting read. Not only does it contain an implementation of a refreshingly new and bold idea (it is not even immediately obvious that infinite reflective towers can be implemented at all, let alone with any reasonable degree of efficiency), but it is also based on an interplay between mathematics and programming: the model of computation is proposed and afterward implemented in 3-LISP. Because the model is implemented in an actual running program, it has to be specified with extreme precision (which would make Tarski and Łukasiewicz tremble), and any execution of the 3-LISP interpreter validates the model.

Treadmill

nikita Jul 26, 2022 Updated Apr 11, 2025

Treadmill is a "real-time" in-place garbage collection algorithm designed by H. Baker [0]. It is simple, elegant, efficient and surprisingly little known. Speaking of which, Mr. Baker's Wikipedia page rivals that of an obscure Roman decadent poet in scarcity of information.
The general situation of garbage collection is that there is a program (called a mutator in this case) that allocates objects (that will also be called nodes) in a heap, which is a pool of memory managed by the garbage collector. The mutator can update objects to point to other earlier allocated objects so that objects form a graph, possibly with cycles. The mutator can store pointers to objects in some locations outside of the heap, for example in the stack or in the registers. These locations are called roots.
The mutator allocates objects, but does not free them explicitly. It is the job of the garbage collector to return unreachable objects, that is, the objects that can not be reached by following pointers from the roots, back to the allocation pool.
It is assumed that the collector, by looking at an object, can identify all pointers to the heap stored in the object and that the collector knows all the roots. If either of these assumptions does not hold, one needs a conservative collector that can be implemented as a library for an uncooperative compiler and run-time (e.g., Boehm garbage collector for C and C++).
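The notion of garbage above can be stated directly as a reachability computation. The sketch below (Python, with a hypothetical representation of the heap as a dictionary from object names to the names they point to) is only the definition, not a collection algorithm: it traverses the whole graph at once, which is the kind of pause a real-time collector tries to avoid.

```python
def reachable(heap, roots):
    """Objects reachable from the roots by following pointers (iterative DFS)."""
    seen, todo = set(), list(roots)
    while todo:
        obj = todo.pop()
        if obj not in seen:
            seen.add(obj)
            todo.extend(heap[obj])   # every pointer stored in obj (assumed identifiable)
    return seen

def garbage(heap, roots):
    """Allocated objects that the collector may return to the allocation pool."""
    return set(heap) - reachable(heap, roots)

# Hypothetical heap: a and b form a reachable cycle, c and d an unreachable one.
heap = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"], "e": []}
print(sorted(garbage(heap, roots=["a"])))   # ['c', 'd', 'e']
```

The unreachable cycle c, d is garbage even though each of its objects is still pointed to, which is why reachability from the roots, rather than the mere existence of pointers, is the criterion.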
The earliest garbage collectors were part of Lisp run-time. Lisp programs tend to allocate a large number of cons cells and organise them in complex structures with cycles and sub-structure sharing. In fact, some of the Lisp Machines had garbage collection implemented in hardware and allocated everything including stack frames and binding environments in the heap. Even processor instructions were stored as cons cells in the heap.
To allocate a new object, the mutator calls alloc(). Treadmill is "real-time" because the cost of alloc() in terms of processor cycles is independent of the number of allocated objects and the total size of the heap, in other words, alloc() is O(1) and this constant cost is not high. This means garbage collection without "stop-the-world" pauses, at least as long as the mutator does not genuinely exhaust the heap with reachable objects.
Treadmill is "in-place" because the address of an allocated object does not change. This is in contrast with copying garbage collectors that can move an object to a new place as part of the collection process (that implies some mechanism of updating the pointers to the moved object).
All existing garbage collection algorithms involve some form of scanning of allocated objects, and this scanning is usually described in terms of colours assigned to objects. In the standard 3-colour scheme (introduced in [3] together with the term "mutator"), black objects have been completely scanned together with the objects they point to; gray objects have been scanned, but the objects they point to are not guaranteed to be scanned; and white objects have not been scanned.
For historical reasons, Baker's papers colour free (un-allocated) objects white and use black-gray-ecru instead of black-gray-white. We stick with ecru, at least to get a chance to learn a fancy word.
Consider the simplest case first:

the heap has a fixed size;
the mutator is single-threaded;
allocated objects all have the same size (like cons cells).

(All these restrictions will be lifted eventually.)
The main idea of Treadmill is that all objects in the heap are organised in a cyclic doubly-linked list, divided by 4 pointers into 4 segments: FREE (white), SCANNED (black), FRONT (gray) and TOSCAN (ecru):

Figure 0: treadmill
Allocation of new objects happens at free, which advances clockwise; scanning happens at scan, which advances counterclockwise; objects that have not been reached yet lie between bottom and top (the latter two terms, somewhat confusing for a cyclic list of objects, are re-used from an earlier paper [1], where a copying real-time garbage collector was introduced).
Remarkably, the entire description and the proof of correctness of the Treadmill algorithm (and many other similar algorithms) depend on a single invariant:
Invariant: there are no pointers from black to ecru nodes.
That is, a black node can contain a pointer to another black node or to a gray node. A non-black (that is, gray or ecru) node can point to any allocated node: black, gray or ecru. An ecru node can be reached from a black node only through at least one intermediate gray node.
Let's for the time being postpone the explanation of why this invariant is important and instead discuss the two usual issues that any invariant introduces: how to establish it and how to maintain it.
Establishing is easy:

Figure 1: initial heap state
In the initial state, all objects are un-allocated (white), except for the roots that are gray. The invariant is satisfied trivially because there are no black objects.
After some allocations by the mutator and some scanning, the heap looks like the one in Figure 0. A call to alloc() advances the free pointer clockwise, thus moving one object from the FREE to the SCANNED part of the heap. There is no need to update the doubly-linked list pointers within the allocated object and, as we will see, there is no need to change the object's colour. This makes the allocation fast path very quick: just a single pointer update, free := free.next.

Figure 2: alloc()
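The allocation fast path can be sketched as follows. Only the FREE part of the ring is modelled, and the Node class, make_ring() and the n_free counter (used here to detect an empty FREE list) are assumptions of the sketch, not part of Baker's description. Note that alloc() touches only the free pointer, never the links inside the object.

```python
class Node:
    """A heap object threaded onto the cyclic, doubly-linked treadmill list."""

    def __init__(self, ident):
        self.ident = ident
        self.next = self.prev = self


def make_ring(n):
    """Link n fresh nodes into a cycle; `next` is the clockwise direction."""
    nodes = [Node(i) for i in range(n)]
    for i, node in enumerate(nodes):
        node.next = nodes[(i + 1) % n]
        node.prev = nodes[(i - 1) % n]
    return nodes


class FreeSegment:
    """Just the FREE part of the treadmill: objects from `free` clockwise."""

    def __init__(self, n):
        self.nodes = make_ring(n)
        self.free = self.nodes[0]
        self.n_free = n             # assumption: a counter detects an empty FREE list

    def alloc(self):
        if self.n_free == 0:
            raise MemoryError("FREE list is empty")
        obj = self.free
        self.free = self.free.next  # the whole fast path: free := free.next
        self.n_free -= 1
        return obj                  # obj's own links are left untouched


heap = FreeSegment(4)
a, b = heap.alloc(), heap.alloc()
print(a.ident, b.ident, heap.free.ident)  # 0 1 2
```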

Allocation cannot violate the invariant, because the newly allocated object does not point to anything. In addition to calling alloc(), the mutator can read pointer fields from nodes it has already reached and update fields of reachable nodes to point to other reachable nodes. There is no pointer arithmetic (otherwise a conservative collector is needed). A reachable node is either black, gray or ecru, so it seems, at first sight, that the only way the mutator can violate the invariant is by setting a field in a black object to point to an ecru object. This is indeed the case with some collection algorithms (called "gray mutator algorithms" in [2]). Such algorithms use a write barrier, which is a piece of code inserted by the compiler before (or instead of) each update of a pointer field. The simplest write barrier prevents a violation of the 3-colour invariant by graying the ecru target object if necessary:

writebarrier(obj, field, target) {
        obj.field := target;
        if black(obj) && ecru(target) {
                darken(target);
        }
}
darken(obj) { /* Make an ecru object gray. */
        assert ecru(obj);
        unlink(obj); /* Remove the object from the treadmill list. */
        link(top, obj); /* Put it back at the tail of the gray list. */
}

More sophisticated write barriers, which make use of the old value of obj.field or are integrated with the virtual memory subsystem, have also been studied; see [2] for details. In our case, however, when the mutator reads a pointer field of an object, it effectively stores the read value in a register (or in a stack frame slot), and in Treadmill registers can be black (Treadmill is a "black mutator algorithm"). That is, the mutator can violate the invariant simply by reading a pointer to an ecru object into a black register. To prevent this, a read barrier is needed, executed on every read of a pointer field:

readbarrier(obj, field) {
        target := obj.field;
        if ecru(target) {
                darken(target); /* Gray the object before the mutator sees it. */
        }
        return target;
}

Figure 3: read barrier

If the loaded pointer refers to a black or gray object, the read barrier leaves that object in place. If it refers to an ecru object, the barrier unlinks the object from the treadmill list (effectively removing it from the TOSCAN section) and re-links it either at top or at scan, thus making it gray. This barrier guarantees that the mutator cannot violate the invariant, simply because the mutator never sees ecru objects (they are grayed by the barrier) and hence cannot store pointers to them anywhere. If the read barrier is present, the write barrier is not necessary.
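A minimal sketch of the read barrier's effect, assuming a toy heap in which colours live in a dict and FRONT is a plain list (the treadmill links are abstracted away; ToyHeap and its fields are hypothetical names). As the text explains, the mutator must never hold a pointer to an ecru object, so the object whose pointer is being loaded is grayed before the mutator gets it.

```python
ECRU, GRAY, BLACK = "ecru", "gray", "black"


class ToyHeap:
    """Colours and pointer fields only; the treadmill links are abstracted away."""

    def __init__(self, colour, fields):
        self.colour = dict(colour)      # object -> colour
        self.fields = dict(fields)      # (object, field name) -> object
        self.front = [o for o, c in self.colour.items() if c == GRAY]

    def darken(self, obj):
        assert self.colour[obj] == ECRU
        self.colour[obj] = GRAY
        self.front.append(obj)          # like link(top, obj): tail of the gray list

    def read_barrier(self, obj, field):
        target = self.fields[(obj, field)]
        if self.colour[target] == ECRU:  # gray the loaded value, not the container
            self.darken(target)
        return target


h = ToyHeap({"root": GRAY, "x": ECRU, "y": ECRU},
            {("root", "f"): "x", ("x", "g"): "y"})
p = h.read_barrier("root", "f")         # the mutator loads root.f into a register
print(p, h.colour[p])                   # x gray
```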
That's how the invariant is established and maintained by the mutator. We still haven't discussed how the collector works and where these mysterious ecru objects appear from. The collector is very simple: it has a single entry point:

advance() { /* Scan the object pointed to by "scan". */
        for field in pointers(scan) {
                if ecru(scan.field) {
                        darken(scan.field);
                }
        }
        scan := scan.prev; /* Make it black. */
}

advance() takes the gray object pointed to by scan, which is the head of the FRONT list, and grays all ecru objects that this object points to. After that, scan is advanced (counterclockwise), effectively moving the scanned object into the SCANNED section and making it black.

Figure 4: advance()

It's not important for now how and when exactly advance() is called. What matters is that it blackens an object while preserving the invariant.
Now comes the crucial part. An allocated object only darkens: the mutator (readbarrier()) and the collector (advance()) can gray an ecru object and advance() blackens a gray object. There is no way for a black object to turn gray or for a gray object to turn ecru. Hence, the total number of allocated non-black objects never increases. But advance() always blackens one object, which means that after some number of calls (interspersed with arbitrary mutator activity), advance() will run out of objects to process: the FRONT section will be empty and there will be no gray objects anymore:

Figure 5: no gray objects
All roots were originally gray and could only darken, so they are now black. And an ecru object is reachable from a black object only through a gray object, but there are no gray objects, so ecru objects are not reachable from roots—they are garbage. This completes the collection cycle and, in principle, it is possible to move all ecru objects to the FREE list at that point and start the next collection cycle. But we can do better. Instead of replenishing the FREE list, wait until all objects are allocated and the FREE list is empty:

Figure 6: neither gray nor white

Only black and ecru objects remain. Flip them: swap top and bottom pointers and redefine colours: the old black objects are now ecru and the old ecru objects (remember they are garbage) are now white:

Figure 7: flip
The next collection cycle starts: put the roots between top and scan so that they are the new FRONT:

Figure 8: new cycle
From this point alloc() and advance() can continue as before.
Note that alloc(), advance() and readbarrier() do not actually have to know an object's colour. They only need to tell an ecru (allocated) object from a non-ecru one, so 1 bit of information per object is sufficient. By waiting until the FREE list is empty and re-defining the colours, Treadmill avoids the need to scan the objects and change their colours at the end of a collection cycle: it is completely O(1).
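One possible way to get by with a single bit per object while still flipping in O(1) is to keep the meaning of that bit in one global variable that is inverted at every flip. The sketch below assumes this encoding (OneBitColours and ecru_value are hypothetical names): only darken() ever writes an object's bit, a white object keeps its non-ecru bit when it is allocated black, and flip() touches no objects at all.

```python
class Obj:
    def __init__(self, bit):
        self.bit = bit


class OneBitColours:
    """Hypothetical encoding: an object is ecru iff its bit equals ecru_value."""

    def __init__(self):
        self.ecru_value = 0

    def is_ecru(self, obj):
        return obj.bit == self.ecru_value

    def darken(self, obj):
        # ecru -> gray: the only place where an object's bit is written
        assert self.is_ecru(obj)
        obj.bit = 1 - self.ecru_value

    def flip(self):
        # O(1): old black objects become ecru, old ecru ones become non-ecru
        # (white), and no object is touched
        self.ecru_value = 1 - self.ecru_value


colours = OneBitColours()
live, garbage = Obj(0), Obj(0)      # both ecru at the start of a cycle
colours.darken(live)                # the scan reaches `live`; `garbage` is never reached
colours.flip()
print(colours.is_ecru(live), colours.is_ecru(garbage))  # True False
```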
One piece of the puzzle is still missing: how is it guaranteed that the collection completes before the FREE list is empty? If the mutator runs out of free objects before the collection cycle is completed, then the only option is to force the cycle to completion by calling advance() repeatedly until there are no more gray objects and then flip, but that's a stop-the-world situation. The solution is to call advance() from within alloc(), guaranteeing scan progress. Baker proved that if advance() is called k times for each alloc() call, then the algorithm never runs out of free objects, provided that the total heap size is at least R*(1 + 1/k) objects, where R is the number of reachable objects.
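The space/time trade-off in this bound is easy to tabulate. The helper below is only arithmetic on the formula quoted above (the function name is hypothetical); exact fractions avoid rounding surprises. With k = 1 the heap must hold twice the reachable data, while a larger k brings the requirement closer to R at the cost of more scanning per allocation.

```python
from fractions import Fraction
from math import ceil


def min_heap_objects(reachable, k):
    """Smallest integer heap size N (in objects) with N >= R * (1 + 1/k)."""
    if k < 1:
        raise ValueError("k must be a positive number of advance() calls")
    return ceil(Fraction(reachable) * (1 + Fraction(1, k)))


for k in (1, 2, 4, 10):
    print(k, min_heap_objects(1000, k))
# 1 2000
# 2 1500
# 4 1250
# 10 1100
```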
This completes the Treadmill description.
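To see the pieces working together, here is a small simulation in the spirit of the description. It is a sketch under simplifying assumptions: the four segments are kept in separate Python containers instead of one cyclic list with four pointers, colours are stored explicitly, and flip() walks the objects to rewrite their colours, so it is O(n) rather than O(1). alloc() performs k calls to advance(); if FREE is still exhausted, it finishes the scan itself (counted in stop_the_world) and flips, at most twice, because a freshly started heap may have no garbage at its first flip. The mutator in run() keeps a linked list of fixed length, and the assertions check the invariant and that no reachable object is ever handed back to FREE. It makes no attempt to reproduce Baker's space bound.

```python
from collections import deque

WHITE, ECRU, GRAY, BLACK = "white", "ecru", "gray", "black"

class Obj:
    def __init__(self, nfields):
        self.colour = WHITE
        self.fields = [None] * nfields

class Treadmill:
    """Toy model: the four segments are separate containers, not one ring."""
    def __init__(self, nobjects, nfields=1, k=2):
        self.free = deque(Obj(nfields) for _ in range(nobjects))  # FREE: white
        self.toscan = {}         # TOSCAN: ecru (a dict gives O(1) removal)
        self.front = deque()     # FRONT: gray, scanned from the left end
        self.scanned = []        # SCANNED: black
        self.roots = []          # pointers currently held by the mutator
        self.k = k               # advance() calls per alloc()
        self.stop_the_world = 0  # times alloc() had to finish the scan itself

    def darken(self, obj):
        del self.toscan[obj]
        obj.colour = GRAY
        self.front.append(obj)   # linked next to top: breadth-first scan

    def advance(self):
        obj = self.front.popleft()
        for target in obj.fields:
            if target is not None and target.colour == ECRU:
                self.darken(target)
        obj.colour = BLACK
        self.scanned.append(obj)

    def flip(self):
        assert not self.front and not self.free
        for obj in self.toscan:   # unreached ecru objects are garbage: white
            obj.colour = WHITE
        self.free = deque(self.toscan)
        for obj in self.scanned:  # black objects become ecru
            obj.colour = ECRU
        self.toscan, self.scanned = dict.fromkeys(self.scanned), []
        for root in self.roots:   # the roots form the new FRONT
            if root.colour == ECRU:
                self.darken(root)

    def alloc(self):
        for _ in range(self.k):
            if self.front:
                self.advance()
        for _ in range(2):   # a young heap may have no garbage at its first flip
            if self.free:
                break
            if self.front:   # the scan fell behind: finish it now
                self.stop_the_world += 1
                while self.front:
                    self.advance()
            self.flip()
        if not self.free:
            raise MemoryError("all objects are reachable")
        obj = self.free.popleft()
        obj.fields = [None] * len(obj.fields)  # a new object points nowhere
        obj.colour = BLACK                     # and is allocated black
        self.scanned.append(obj)
        return obj

    def read(self, obj, i):  # read barrier: gray the loaded value
        target = obj.fields[i]
        if target is not None and target.colour == ECRU:
            self.darken(target)
        return target

def run(steps=500, length=10, heap=40, k=2):
    """Mutator: push a node onto a linked list each step, keep `length` nodes."""
    tm = Treadmill(heap, nfields=1, k=k)
    head = tm.alloc()
    tm.roots = [head]
    for _ in range(steps):
        new = tm.alloc()
        new.fields[0] = head     # head is never ecru: no write barrier needed
        head = new
        tm.roots = [head]
        node = head              # cut the list after `length` nodes
        for _ in range(length - 1):
            nxt = tm.read(node, 0)
            if nxt is None:
                break
            node = nxt
        node.fields[0] = None
        # The invariant: no pointers from black to ecru objects.
        assert all(t is None or t.colour != ECRU
                   for obj in tm.scanned for t in obj.fields)
        node = head              # test-only walk that bypasses the barrier
        while node is not None:
            assert node.colour != WHITE, "a reachable object was freed"
            node = node.fields[0]
    return tm

print("waits for the scan:", run().stop_the_world)
```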
The algorithm is very flexible. First, the restriction of a single-threaded mutator is not really important: as long as alloc(), advance(), readbarrier() and flip are mutually exclusive, no further constraints on concurrency are necessary. The mutator can be multi-threaded. The collector can be multi-threaded. advance() can be called "synchronously" (from alloc()), explicitly from the mutator code or "asynchronously" from the dedicated collector threads. A feedback-based method can regulate the frequency of calls to advance() depending on the amount of free and garbage objects. alloc() can penalise heavy-allocating threads forcing them to do most of the scanning, etc.
Next, when an object is grayed by darken(), all that matters is that the object is placed in the FRONT section. If darken() places the object next to top, then FRONT acts as a FIFO queue and the scan proceeds in breadth-first order. If the object is placed next to scan, then the scan proceeds in depth-first order, which might result in better locality of reference and better performance of a heap in virtual memory. A multi-threaded collector can use multiple FRONT lists, e.g., one per core, and scan them concurrently.
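The effect of where darken() links a new gray object can be shown with a deque standing in for FRONT: scan takes objects from the left end, and darken() inserts either at the far (top) end or right next to scan. This is a sketch on a small hypothetical tree; the `reached` set plays the role of the ecru test, and the children are reversed in the second mode only so that siblings come out left to right.

```python
from collections import deque


def scan_order(tree, root, next_to_top=True):
    """Order in which objects are scanned when FRONT is a deque whose left end
    is `scan`; darken() inserts at the far end (top) or right next to scan."""
    reached = {root}
    front = deque([root])
    order = []
    while front:
        obj = front.popleft()               # advance() scans the object at scan
        order.append(obj)
        children = tree[obj] if next_to_top else list(reversed(tree[obj]))
        for child in children:
            if child not in reached:        # the child was ecru: darken it
                reached.add(child)
                if next_to_top:
                    front.append(child)     # FIFO: breadth-first
                else:
                    front.appendleft(child) # LIFO: depth-first
    return order


tree = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}
print(scan_order(tree, 1))                     # [1, 2, 3, 4, 5, 6]
print(scan_order(tree, 1, next_to_top=False))  # [1, 2, 4, 5, 3, 6]
```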
New objects can be added to the heap at any time by atomically linking them somewhere into the FREE list. Similarly, a batch of objects can be atomically released from the FREE list at any moment, subject to the usual fragmentation-avoidance considerations of the lower-layer allocator.
Support for variable-sized objects requires a separate cyclic list for each size (plus, perhaps an additional overflow list for very large objects). The top, bottom, scan and free pointers become arrays of pointers with an element for each size. If arbitrarily large objects (e.g., arrays) are supported then atomicity of advance() will require additional work: large objects need to be multi-coloured and will blacken gradually.
Forward and backward links of the cyclic list can be embedded in the object header or stored separately; the latter might improve cache utilisation by the scanner.
References
[0] The Treadmill: Real-Time Garbage Collection Without Motion Sickness. PDF (subscription), Postscript.
[1] List Processing in Real Time on a Serial Computer. PDF.
[2] The Garbage Collection Handbook: The Art of Automatic Memory Management. gchandbook.org.
[3] On-the-Fly Garbage Collection: An Exercise in Cooperation. PDF.

360 years later or „Скрещенья ног“ ("Crossings of legs")

nikita Feb 13, 2021 Updated Feb 13, 2021

In 1896 Paul Gauguin completed Te Arii Vahine (The King’s Wife):

Among the many similar paintings of his Tahitian period, this one, together with a couple of preparatory watercolours, is distinguished by the artificial placement of the legs, which can be characterised in Russian by the equally forced line (quoted in this article's title) from a certain universally acclaimed poem. This strange posture is neither a deficiency nor an artistic whim. It is part of a silent, subtle game played over centuries, where moves are echoes and the reward is some flickering form of immortality:

This is Diana Resting, by Cranach the Elder, 1537. Let me just note the birds and leave the pleasure of finding other clues to the reader. Lucas Cranach (and this is more widely known) himself played a very similar game with Dürer.

By sheer luck, the first painting is just a few kilometres away from me, in the Pushkin Museum.

A curious case of stacks and queues.

nikita Nov 11, 2020 Updated Nov 11, 2020

When studying computing science we all learn how to convert an expression in the "normal" ("infix", "algebraic") notation to "reverse Polish" notation. For example, an expression "a*b + c*d" is converted to "a b * c d * +". An expression in reverse Polish notation can be seen as a program for a stack automaton:

PUSH A
PUSH B
MUL
PUSH C
PUSH D
MUL
ADD

Here PUSH pushes its argument onto the top of the (implicit) stack, while ADD and MUL pop the 2 top elements from the stack, perform the respective operation and push the result back.
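The stack automaton is small enough to write out directly. A minimal sketch that executes the PUSH/ADD/MUL program above on an explicit stack; the encoding of instructions as strings and the environment dict supplying values for A, B, C and D are assumptions of the sketch.

```python
import operator

OPS = {"ADD": operator.add, "MUL": operator.mul}


def run_stack_program(program, env):
    """Execute PUSH/ADD/MUL instructions on an explicit stack."""
    stack = []
    for instr in program:
        op, *arg = instr.split()
        if op == "PUSH":
            stack.append(env[arg[0]])
        else:
            right, left = stack.pop(), stack.pop()   # pop the 2 top elements
            stack.append(OPS[op](left, right))       # and push the result back
    (result,) = stack                                # exactly one value remains
    return result


program = ["PUSH A", "PUSH B", "MUL", "PUSH C", "PUSH D", "MUL", "ADD"]
print(run_stack_program(program, {"A": 2, "B": 3, "C": 4, "D": 5}))  # 26
```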

For reasons that will be clearer anon, let's re-write this program as

Container c;
c.put(A);
c.put(B);
c.put(c.get() * c.get());
c.put(C);
c.put(D);
c.put(c.get() * c.get());
c.put(c.get() + c.get());

Here Container is the type of stacks, c.put() pushes an element onto the top of the stack and c.get() pops and returns the top of the stack. The LIFO discipline of stacks is so widely used (implemented natively on all modern processors, built into programming languages in the form of the call stack) that one never asks whether a different method of evaluating expressions is possible.

Here is a problem: find a way to translate infix notation into a program for a queue automaton, that is, into a program like the one above, but where Container is the type of FIFO queues, with c.put() enqueuing an element at the rear of the queue and c.get() dequeuing at the front. This problem was reportedly solved by Jan L.A. van de Snepscheut sometime during spring 1984.

While you are thinking about it, consider the following tree-traversal code (in some abstract imaginary language):

walk(Treenode root) {
        Container todo;
        todo.put(root);
        while (!todo.is_empty()) {
                next = todo.get();
                visit(next);
                for (child in next.children) {
                        todo.put(child);
                }
        }
}

Here node.children is the list of a node's children, suitable for iteration by a for loop.

Convince yourself that if Container is the type of stacks, tree-walk is depth-first. And if Container is the type of queues, tree-walk is breadth-first. Then, convince yourself that a depth-first walk of the parse tree of an infix expression produces the expression in Polish notation (unreversed) and its breadth-first walk produces the expression in "queue notation" (that is, the desired program for a queue automaton). Isn't it marvelous that traversing a parse tree with a stack container gives you the program for stack-based execution and traversing the same tree with a queue container gives you the program for queue-based execution?
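The claim is easy to check experimentally. The sketch below runs the walk() above with a stack and with a queue on the parse tree of a*b + c*d, reverses the visiting order, and executes the result on an automaton with the same kind of container. It assumes a hypothetical tuple encoding of parse trees and only the operators + and *, which are commutative, so the order in which the two operands come out of the container does not matter. On this example the reversed stack walk is exactly the program a b * c d * + from above.

```python
from collections import deque
import operator

OPS = {"+": operator.add, "*": operator.mul}


class Stack:
    def __init__(self):
        self.items = []

    def put(self, x):
        self.items.append(x)

    def get(self):
        return self.items.pop()          # LIFO: the most recent element

    def is_empty(self):
        return not self.items


class Queue:
    def __init__(self):
        self.items = deque()

    def put(self, x):
        self.items.append(x)             # enqueue at the rear

    def get(self):
        return self.items.popleft()      # dequeue at the front

    def is_empty(self):
        return not self.items


def label(node):
    return node if isinstance(node, str) else node[0]


def walk(root, container):
    """walk() from the text; a node is a leaf name or a tuple (op, left, right)."""
    todo, visited = container(), []
    todo.put(root)
    while not todo.is_empty():
        node = todo.get()
        visited.append(label(node))      # visit(next)
        if not isinstance(node, str):
            for child in node[1:]:
                todo.put(child)
    return visited


def execute(program, container, env):
    """Run the program on an automaton whose storage is `container`."""
    c = container()
    for token in program:
        if token in OPS:
            c.put(OPS[token](c.get(), c.get()))
        else:
            c.put(env[token])
    return c.get()


tree = ("+", ("*", "a", "b"), ("*", "c", "d"))   # a*b + c*d
env = {"a": 2, "b": 3, "c": 4, "d": 5}
for container in (Stack, Queue):
    program = walk(tree, container)[::-1]        # reverse the visiting order
    print(container.__name__, " ".join(program), execute(program, container, env))
# Stack a b * c d * + 26
# Queue d c b a * * + 26
```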

I feel that there is something deep behind this. A. Stepanov had an intuition (which cost him dearly) that algorithms are defined on algebraic structures. The elegant interconnection between queues and stacks on one hand and tree-walks and automaton programs on the other tells us that the correspondence between algorithms and structures goes in both directions.

Unexpected isomorphism

nikita Oct 14, 2020 Updated Oct 14, 2020

Since Cantor's "I see it, but I cannot believe it" (1877), we know that the sets $\mathbb{R}^n$ are isomorphic (equinumerous) for all $n > 0$. As shocking as this is, over time we learn to live with it, because the bijections between continua of different dimensions are extremely discontinuous, and we assume that if we limit ourselves to any reasonably well-behaved class of maps, the isomorphisms will disappear. Will they?

Theorem. Additive groups $\mathbb{R}^n$ are isomorphic for all $n > 0$ (and, therefore, isomorphic to the additive group of the complex numbers).

Proof. Each $\mathbb{R}^n$ is a vector space over the rationals. Assuming the axiom of choice, any vector space has a basis. By simple cardinality considerations, the cardinality of a basis of $\mathbb{R}^n$ over $\mathbb{Q}$ is the same as the cardinality of $\mathbb{R}^n$. Therefore all $\mathbb{R}^n$ have the same dimension over $\mathbb{Q}$, and, therefore, are isomorphic as vector spaces and as additive groups. End of proof.
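The cardinality step can be spelled out as follows (a standard argument, with $B$ a basis of $\mathbb{R}^n$ over $\mathbb{Q}$). Every vector is a finite rational combination of basis vectors, so

$$|B| \le |\mathbb{R}^n| \le \Big|\bigcup_{k \in \mathbb{N}} (\mathbb{Q} \times B)^k\Big| = \max(\aleph_0, |B|).$$

Since $\mathbb{R}^n$ is uncountable, $B$ is infinite, hence $\max(\aleph_0, |B|) = |B|$ and $|B| = |\mathbb{R}^n| = 2^{\aleph_0}$ for every $n > 0$.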

This means that for any $n, m > 0$ there are bijections $f : \mathbb{R}^n \to \mathbb{R}^m$ such that $f(a + b) = f(a) + f(b)$ and, necessarily, $f(p\cdot a + q\cdot b) = p\cdot f(a) + q\cdot f(b)$ for all rational $p$ and $q$.

I feel that this should be highly counter-intuitive for anybody who internalised the Cantor result, or, maybe, especially for such people. The reason is that intuitively there are many more continuous maps than algebraic homomorphisms between the "same" pair of objects. Indeed, the formula defining continuity has the form $\forall x\forall\epsilon\exists\delta\forall y P(x, \epsilon, \delta, y)$ (a local property), while homomorphisms are defined by $\forall x\forall y Q(x, y)$ (a stronger global property). Because of this, topological categories have much denser lattices of sub- and quotient-objects than algebraic ones. From this one would expect that, as there are no isomorphisms (continuous bijections) between continua of different dimensions, there definitely should be no bijective homomorphisms between them. Yet there they are.

Why Go is Not My Favorite Programming Language

nikita Jan 27, 2019 Updated Mar 27, 2023

Disclaimer: this article shares very little except the title with the classical Why Pascal is Not My Favorite Programming Language. No attempt is made to analyse Go in any systematic fashion. To the contrary, the focus is on one particular, if grave, issue. Moreover, the author happily admits that his experience with Go programming is very limited.
Go is a system programming language, and a large fraction of system software processes incoming requests of some sort, for example:

[KERNEL] an OS kernel processes system calls;
[SERVER] a server processes requests received over network or IPC;
[LIB] a library processes invocations of its entry points.

A distinguishing feature of system software is that it should be resilient against abnormal conditions in the environment, such as network communication failures, storage failures, etc. Of course, there are practical limits to such resilience, and it is very difficult to construct software that would operate correctly in the face of undetected processor or memory failures (although such systems have been built in the past), but it is generally agreed that system software should handle a certain class of failures to be usable as a foundation of a software stack. We argue that Go is not suitable for system programming because it cannot deal with one of the most important failures in this class: memory allocation errors.
Out of the many existing designs of failure handling (exceptions, recovery blocks, etc.), Go exclusively selects explicit error checking with a simple panic-recovery mechanism. This makes sense, because this is the only design that works in all the use-cases mentioned above. However, memory allocation errors do not produce checkable errors in Go. The language specification does not even mention the possibility of allocation failure, and in discussions of these issues Google engineers adamantly refuse to consider adding an interface to intercept memory allocation errors. Instead, various methods to warn the application that memory is "about to be exhausted" are proposed. These methods, of course, only reduce the probability of running out of memory, but never eliminate it (thus making bugs in the error handling code more difficult to test). As one can easily check by running a simple program that allocates all available memory, a memory allocation error results in unconditional program termination, rather than a recoverable panic.
But even if a way to check for allocation errors or recover from them were added, it would not help, because Go often allocates memory behind the scenes, so that there is no point in the program source where a check could be made. For example, memory can be allocated when a struct value is used as an interface:

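package main

type foo interface {
        f() int
}

type bar struct {
        v int
}

// Without this directive the compiler may inline out() and devirtualise
// s.f(), removing the allocation this example is meant to show.
//go:noinline
func out(s foo) int {
        return s.f() - 1
}

func (self bar) f() int {
        return self.v + 1
}

func main() {
        // bar{} has v == 0, which convT64() below serves without allocating,
        // so use large non-zero values instead.
        for i := 1 << 20; ; i++ {
                out(bar{v: i})
        }
}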

The program above contains no explicit memory allocations, yet it allocates a lot of memory. The assembly output (use godbolt.org, for example) for the call to out() contains a call to runtime.convT64() (its source is quoted below), which calls mallocgc().

func convT64(val uint64) (x unsafe.Pointer) {
	if val == 0 {
		x = unsafe.Pointer(&zeroVal[0])
	} else {
		x = mallocgc(8, uint64Type, false)
		*(*uint64)(x) = val
	}
	return
}

To summarise, the combination of the following reasons makes Go unsuitable for the construction of reliable system software:

it is not, in general, possible to guarantee that memory allocation will always succeed. For example, in the [LIB] case, other parts of the process or other processes can exhaust all the available memory. Pre-allocating memory for the worst case is impractical except in the simplest cases;
due to the design of Go runtime and the implementation of the fundamental language features like interfaces, it is not possible to reliably check for memory allocation errors;
software that can neither prevent nor resolve memory allocation errors is unreliable. For example, a library that, when called, crashes the entire process because some other process has allocated all available memory cannot be used to build reliable software on top of it.

Cographs, cocounts and coconuts.

nikita Mar 9, 2018 Updated Jun 30, 2024

Abstract: The dual of the familiar construction of the graph of a function is considered. The symmetry between graphs and cographs can be extended to a surprising degree.

Given a function $f : A \rightarrow B$, the graph of f is defined as $$f^* = \{(x, f(x)) \mid x \in A\}.$$ In fact, within the ZFC framework, functions are defined as graphs. A graph is a subset of the Cartesian product $A \times B$. One might want to associate to $f$ a dual cograph object: a certain quotient set of the disjoint sum $A \sqcup B$, which would uniquely identify the function. To understand the structure of the cograph, define the graph of a morphism $f : A \rightarrow B$ in a category with suitable products as a fibred product:$\require{AMScd}$\begin{CD}
f^* @>\pi_2>> B\\
@V \pi_1 V V @VV 1_B V\\
A @>>f> B
\end{CD}In the category of sets this gives the standard definition. The cograph can be defined by a dual construction as a push-out:
\begin{CD}
A @>1_A>> A\\
@V f V V @VV j_1 V\\
B @>>j_2> f_*
\end{CD}Expanding this in the category of sets gives the following definition:
$$f_* = (A \sqcup B) / \pi_f,$$
where $\pi_f$ is the equivalence relation generated by (that is, the reflexive, symmetric and transitive closure of) a relation $\theta_f$ given by (assuming in the following, without loss of generality, that $A$ and $B$ are disjoint)
$$x\overset{\theta_f}{\sim} y \equiv y = f(x)$$
That is, $A$ is partitioned by $\pi_f$ into subsets which are inverse images of elements of $B$, and to each such subset the element of $B$ which is its image is glued (elements of $B$ outside the image of $f$ form singleton classes). This is somewhat similar to the mapping cylinder construction in topology. Some similarities between graphs and cographs are immediately obvious. For graphs: $$\forall x\in A\; \exists! y\in B\; (x, y)\in f^*$$ $$f(x) = \pi_2((\{x\}\times B) \cap f^*)$$ $$f(U) = \{y \mid y = f(x) \wedge x\in U \} = \pi_2((U\times B)\cap f^*)$$ where $x\in A$ and $U\subseteq A$. Similarly, for cographs: $$\forall x\in A\; \exists! y\in B\; [x] = [y]$$ $$f(x) = [x] \cap B$$ $$f(U) = (\bigcup [U])\cap B$$ where $[x]$ is the equivalence class of $x$ w.r.t. $\pi_f$ and $[U] = \{[x] \mid x \in U\}$. For inverse images: $$f^{-1}(y) = \pi_1((A \times \{y\}) \cap f^*) = [y] \cap A$$ $$f^{-1}(S) = \pi_1((A \times S) \cap f^*) = (\bigcup [S])\cap A$$ where $y\in B$ and $S\subseteq B$.
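For finite sets the cograph can be computed directly from this description. A minimal sketch, assuming $f$ is given as a Python dict and $A$, $B$ are disjoint sets of strings (cograph(), cls() and union_of_classes() are hypothetical helper names); the assertions check the cograph versions of $f(x)$, $f^{-1}(y)$, $f(U)$ and $f^{-1}(S)$ listed above.

```python
def cograph(A, B, f):
    """Classes of (A ⊔ B)/pi_f for a finite f: A -> B given as a dict;
    A and B are assumed to be disjoint."""
    classes = {y: {y} for y in B}     # each y in B starts its own class
    for x in A:
        classes[f[x]].add(x)          # glue x to its image f(x)
    return list(classes.values())


def cls(classes, z):
    """[z]: the class containing z."""
    return next(c for c in classes if z in c)


def union_of_classes(classes, zs):
    """The union of [z] over z in zs."""
    return set().union(*(cls(classes, z) for z in zs))


A = {"a1", "a2", "a3", "a4"}
B = {"b1", "b2", "b3"}
f = {"a1": "b1", "a2": "b1", "a3": "b2", "a4": "b2"}
C = cograph(A, B, f)

for x in A:
    assert cls(C, x) & B == {f[x]}                            # f(x) = [x] ∩ B
for y in B:
    assert cls(C, y) & A == {x for x in A if f[x] == y}       # f^-1(y) = [y] ∩ A
U, S = {"a1", "a3"}, {"b1", "b3"}
assert union_of_classes(C, U) & B == {f[x] for x in U}            # f(U)
assert union_of_classes(C, S) & A == {x for x in A if f[x] in S}  # f^-1(S)
print(sorted(sorted(c) for c in C))
# [['a1', 'a2', 'b1'], ['a3', 'a4', 'b2'], ['b3']]
```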

A graph can be expressed as $$f^* = \bigcup_{x \in A}\{(x, f(x))\}$$ To write out a similar representation of a cograph, we have to recall some elementary facts about equivalence relations.

Given a set $A$, let $Eq(A) \subseteq Rel(A) = P(A \times A)$ be the set of equivalence relations on $A$. For a relation $\pi \subseteq A \times A$, we have $$\pi \in Eq(A) \;\; \equiv \;\; \pi^0 = \Delta \subseteq \pi \; \wedge \; \pi^n = \pi, n \in \mathbb{Z}, n \neq 0.$$ To each $\pi$ corresponds a surjection $A \twoheadrightarrow A/\pi$. Assuming the axiom of choice (in the form "all epics split"), an endomorphism $A \twoheadrightarrow A/\pi \rightarrowtail A$ can be assigned (non-uniquely) to $\pi$. It is easy to check that this gives $Eq(A) = End(A) / Aut(A)$, where $End(A)$ and $Aut(A)$ are the monoid and the group of set endomorphisms and automorphisms respectively, with composition as the operation ($Aut(A)$ is not, in general, normal in $End(A)$, so $Eq(A)$ does not inherit any useful operation from this quotient set representation). $Eq(A)$ is not closed under the composition of $Rel(A)$ either (the composite of two equivalence relations is an equivalence relation exactly when they commute), but it is a lattice with infimum and supremum given by $$\pi \wedge \rho = \pi \cap \rho$$ $$\pi \vee \rho = \mathtt{tr}(\pi \cup \rho) = \bigcup_{n \in \mathbb{N}}(\pi \cup \rho)^n$$ For a subset $X\subseteq A$ define an equivalence relation $e(X) = \Delta_A \cup (X\times X)$, so that $$x\overset{e(X)}{\sim} y \equiv x = y \vee \{x, y\} \subseteq X$$ (Intuitively, $e(X)$ collapses $X$ to one point.) It is easy to check that $$f_* = \bigvee_{x \in A}e(\{x, f(x)\})$$ (in $Eq(A \sqcup B)$), which is the desired dual of the graph representation above.
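The lattice description can be checked mechanically as well. A minimal sketch for finite sets, with relations represented as Python sets of pairs (the helper names are hypothetical): is_equivalence() uses the characterisation above, with $\pi^{-1} = \pi$ and $\pi^2 = \pi$ standing for all $n \neq 0$; join() computes the transitive closure of the union; and the cograph is assembled as the join of the relations $e(\{x, f(x)\})$ on $A \sqcup B$.

```python
from itertools import product


def compose(p, q):
    """Relational composition: x (p;q) z iff x p y and y q z for some y."""
    return {(x, z) for (x, y) in p for (y2, z) in q if y == y2}


def is_equivalence(p, S):
    """Delta ⊆ p, p^-1 = p and p^2 = p (hence p^n = p for all n != 0)."""
    delta = {(x, x) for x in S}
    inverse = {(y, x) for (x, y) in p}
    return delta.issubset(p) and inverse == p and compose(p, p) == p


def join(p, q):
    """p ∨ q in Eq(S): the transitive closure of p ∪ q."""
    r = p | q
    while True:
        bigger = r | compose(r, r)
        if bigger == r:
            return r
        r = bigger


def e(X, S):
    """e(X) = Delta_S ∪ (X × X): collapse X to a single point."""
    return {(x, x) for x in S} | set(product(X, X))


A = ["a1", "a2", "a3"]
B = ["b1", "b2"]
f = {"a1": "b1", "a2": "b1", "a3": "b2"}
S = A + B                                  # A ⊔ B, with A and B disjoint
cograph = {(z, z) for z in S}              # the bottom element, Delta
for x in A:
    cograph = join(cograph, e({x, f[x]}, S))

assert is_equivalence(cograph, S)
print(sorted(z for (x, z) in cograph if x == "a1"))  # ['a1', 'a2', 'b1']
```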

Conatus trimetro iambico.

nikita Dec 18, 2017 Updated Dec 18, 2017

Aloof as stardust rains Are memory dim prints Eliding tensed face By shadows within of All conquering space that Inly trusts each friend Of madness—whose embrace Accept with no delays To take one final look Before you turn Rose ways.
A rather rare meter in English, but much easier once you let enjambments in.

Translations.

nikita Sep 22, 2017 Updated Sep 22, 2017

Licence my roving hands, and let them go,
Before, behind, between, above, below.
O my America! my new-found-land
-- J. Donne, 1633.

Блуждающим рукам моим дай разрешенье,
Пусти вперед, назад, промеж, и вверх, и вниз,
О дивный новый мир, Америка моя!

Variant reading reeing instead of roving is even better.

I hope that "О дивный новый мир" (O brave new world) is not entirely anachronistic.

Zero everywhere.

nikita Dec 13, 2014 Updated Dec 17, 2014

(MIPT student mathematical olympiad, 2013, problem 3)
Suppose that $\forall x\in\mathbb{R}: f(x) \neq 0$. Take an arbitrary interval $[a, b] \subset \mathbb{R}$, $a \lt b$, and let us prove that this interval contains a point $x_0$ such that $\lim \limits_{x\to x_0} f(x) \neq 0$.

Let $T_n = |f|^{-1}([\frac{1}{n}, +\infty)) \cap [a, b]$ for $n > 0$, that is, $x \in T_n \equiv |f(x)| \ge \frac{1}{n} \wedge x\in[a, b]$.

If every $T_n$ were nowhere dense, their union, a countable union of nowhere dense sets, would have empty interior (because $[a, b]$ is a Baire space); but, as $f$ never vanishes, their union is the whole interval $[a, b]$, a contradiction. Hence some $T_n$ is not nowhere dense, that is, its closure contains an open interval $I$, and $T_n$ is dense in $I$. Then for every $x_0 \in I$, every neighbourhood of $x_0$ contains points $x \neq x_0$ with $|f(x)| \ge \frac{1}{n}$, and therefore $|\lim \limits_{x\to x_0} f(x)| \ge \frac{1}{n} \gt 0$, if this limit exists.

Note that we have proved more than was required, namely that the set of zeros is dense. Or, put differently: a function whose limit at every point $x$ equals $g(x)$ for a continuous $g$ is continuous on a dense set (replace $0$ with an arbitrary continuous $g:\mathbb{R}\to\mathbb{R}$).

(2014.12.15) Generalisation.

Let $X$ be a Baire space, $Y$ a second-countable Urysohn space, and $f,g : X \to Y$ arbitrary maps. If $(\forall x\in X)(\lim\limits_{t\to x}f(t) = \lim\limits_{t\to x}g(t))$, then $f = g$ on a dense subset.

Proof (by contraposition). Suppose that there is a non-empty open $A \subseteq X$ such that $(\forall x\in A)(f(x)\neq g(x))$. Let $B$ be a countable base of $Y$.

Define a countable family of subsets of $A$: $T_{U,V} = f^{-1}(U) \cap g^{-1}(V) \cap A$, where $U, V \in B$ (that is, $U$ and $V$ are open subsets of $Y$). For any $x\in A$ we have $f(x)\neq g(x)$, and because $Y$ is Urysohn, there are $U, V\in B$ with $cl(U)\cap cl(V) = \varnothing$, $f(x)\in U$ and $g(x)\in V$, that is, $x\in T_{U,V}$. Hence

$\bigcup\limits_{cl(U)\cap cl(V) = \varnothing} T_{U,V} = A$

Because $X$, and hence its open subspace $A$, is a Baire space, one of the sets $T_{U,V}$ in the union above, say $T_{U_0, V_0}$, is not nowhere dense. That is, there is a non-empty open set $G\subseteq A$ such that for any $x_0\in G$ and any open neighbourhood $S \ni x_0$, $S \cap T_{U_0,V_0}\neq\varnothing$, that is, there exists an $x'\in S$ such that $f(x') \in U_0$ and $g(x')\in V_0$.

Suppose that $\lim\limits_{t\to x_0}f(t) = \lim\limits_{t\to x_0}g(t) = y\in Y$. This means that every open neighbourhood of $y$ intersects $U_0$, hence $y\in cl(U_0)$. Similarly, $y\in cl(V_0)$, which contradicts $cl(U_0)\cap cl(V_0) = \varnothing$.

PS: for problem 2, the answer is $k = 2\cdot n - 2$.

Blindsight spot.

nikita Oct 16, 2014 Updated Oct 16, 2014

I just realised that in any unital ring commutativity of addition follows from distributivity:

$$\begin{aligned}
a + b &= -a + a + a + b + b - b \\
&= -a + a\cdot(1 + 1) + b\cdot(1 + 1) - b \\
&= -a + (a + b)\cdot(1 + 1) - b \\
&= -a + (a + b)\cdot 1 + (a + b)\cdot 1 - b \\
&= -a + a + b + a + b - b \\
&= b + a
\end{aligned}$$
The same holds for unital modules, algebras, vector spaces, &c. Note that multiplication doesn't even need to be associative. It's amazing how such things can pass unnoticed.

Whither science (А болт мы сделаем деревянным).

nikita May 28, 2014 Updated May 28, 2014

I was recently on a committee reviewing papers for a computer science conference. The submissions, from all over the world, were in PDF. One submission was actually a Trojan: an executable masquerading as a PDF file. This by itself is startling evidence of how widespread spying has become (the Trojan came from a respectable university email address), but the really alarming thing is that this was *not* the paper that got the lowest grades. That is, some legitimate submissions have less scientific content than a virus.
A few days later, while perusing proceedings of another conference, which shall remain nameless only for so long, I found something that triggered this post.

A conference on distributed computing must be interesting. And indeed, for an attentive reader, persistent enough to reach the last articles, there is a treasure in store. An impressive proof of how versatile and far-reaching distributed computing is nowadays.

And who do you think publishes this?

Oh, pity.

Parmigianino's faces

nikita Feb 9, 2014 Updated Feb 10, 2014

An angel from Madonna with the long neck (1535):

An unidentified girl from Antea (1524, survived everything, including Austrian salt mines):

Although one can argue that the resemblance is due to the strictures of the mannerist canon (see the earlobes, for example), this is undoubtedly the same face.

A discovery no less thrilling even though I am definitely not the first to make it.

threads, contexts and doors.

nikita Dec 16, 2013 Updated Dec 16, 2013

Paul Turner gave a talk about a new threading interface, designed by Google, at this year's Linux Plumbers Conference:

The idea is, very roughly, to implement the ucontext interface with kernel support. This gives the benefits of kernel threads (e.g., SMP support), while retaining the fast context switches of user threads. The switchto_switch(tid) call hands off control to the specified thread without going through the kernel scheduler. This is like swapcontext(3), except that the kernel stack pointer is switched too. Of course, there is still the overhead of a system call and return, but it is not as important as it used to be: the cost of a normal context switch is dominated by the scheduler invocation (with all the associated locking), and costs such as TLB flushes shrink the relative difference between user and kernel context switching even further.
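A toy model of the control flow only, as a sketch: each "thread" below is a Python generator that yields the id of the thread it wants to hand off to, and the driver simply resumes whichever thread was named. There is no run queue and no scheduling decision, which is the essence of a directed switch. The names `run_directed` and `ping` are hypothetical; this does not model the kernel interface, kernel stacks, or any of the costs discussed above.

```python
from typing import Dict, Generator, List, Optional

# Hypothetical model: a "thread" yields the id of the thread to switch to,
# or None to stop. A user-space toy, not the switchto kernel API.
Thread = Generator[Optional[str], None, None]


def run_directed(threads: Dict[str, Thread], start: str) -> List[str]:
    """Resume threads strictly in the order they name each other.

    Nothing chooses among runnable threads: the next thread is always
    the one the current thread handed off to.
    """
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None:
        trace.append(current)
        try:
            current = next(threads[current])
        except StopIteration:
            current = None
    return trace


def ping(peer: str, rounds: int) -> Thread:
    for _ in range(rounds):
        yield peer   # hand off directly to peer, like a directed switch
    yield None       # done: stop the whole run


threads = {"a": ping("b", 2), "b": ping("a", 2)}
print(run_directed(threads, "a"))   # ['a', 'b', 'a', 'b', 'a']
```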

I did something similar (but much more primitive) in 2001. The difference was that in that old implementation, one could switch context with a thread running in a different address space, so it was possible to make a "continuation call" to another process. This was done to implement the Solaris doors RPC mechanism on Linux. Because this is an RPC mechanism, arguments have to be passed, so each context switch also performed a little dance to copy arguments between address spaces.

Weekend affinity.

nikita Jan 19, 2013 Updated Jul 24, 2013

Groups usually come with homomorphisms, defined as mappings preserving multiplication:
$$f(a\cdot b) = f(a)\cdot f(b)$$
From this definition, the notions of subgroup (monomorphism), quotient group (epimorphism, normal subgroup) and the famous isomorphism theorem follow naturally. The category of groups with homomorphisms as arrows has products and sums, equalizers and coequalizers, all well known and with nice properties.

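Consider, instead, affine morphisms, which can be defined by any of the following equivalent conditions (here $f^{-1}(x)$ stands for $f(x)^{-1}$, the inverse of $f(x)$ in the codomain, not an inverse function):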

(1) $f(a \cdot b^{-1} \cdot c) = f(a) \cdot f^{-1}(b) \cdot f(c)$
(2) $f(a \cdot b) = f(a) \cdot f^{-1}(e) \cdot f(b)$
(3) $\exists t. f(a \cdot b) = f(a) \cdot t \cdot f(b)$
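A brute-force sanity check (not a proof), using small groups chosen only for illustration: the cyclic group $\mathbb{Z}_3$ and the permutation group $S_3$. The sketch enumerates all $6^3$ maps $\mathbb{Z}_3 \to S_3$ and confirms that conditions (1), (2) and (3) accept exactly the same maps. Maps are plain dicts; `Group`, `cyclic` and `s3` are helpers made up for this example.

```python
from itertools import permutations, product
from typing import Callable, NamedTuple, Sequence


class Group(NamedTuple):
    elems: Sequence
    mul: Callable
    inv: Callable
    e: object


def cyclic(n):
    """Z_n under addition mod n."""
    return Group(list(range(n)), lambda a, b: (a + b) % n, lambda a: (-a) % n, 0)


def s3():
    """S_3 as permutation tuples, with (p*q)(i) = p(q(i))."""
    def mul(p, q):
        return tuple(p[q[i]] for i in range(3))

    def inv(p):
        r = [0] * 3
        for i, pi in enumerate(p):
            r[pi] = i
        return tuple(r)

    return Group(list(permutations(range(3))), mul, inv, (0, 1, 2))


def cond1(f, G, H):
    """(1): f(a b^-1 c) = f(a) f(b)^-1 f(c) for all a, b, c."""
    return all(f[G.mul(G.mul(a, G.inv(b)), c)] == H.mul(H.mul(f[a], H.inv(f[b])), f[c])
               for a, b, c in product(G.elems, repeat=3))


def cond2(f, G, H):
    """(2): f(a b) = f(a) f(e)^-1 f(b) for all a, b."""
    t = H.inv(f[G.e])
    return all(f[G.mul(a, b)] == H.mul(H.mul(f[a], t), f[b])
               for a, b in product(G.elems, repeat=2))


def cond3(f, G, H):
    """(3): one t works for all a, b."""
    return any(all(f[G.mul(a, b)] == H.mul(H.mul(f[a], t), f[b])
                   for a, b in product(G.elems, repeat=2))
               for t in H.elems)


G, H = cyclic(3), s3()
affine = []
for values in product(H.elems, repeat=len(G.elems)):   # all 6**3 maps Z3 -> S3
    f = dict(zip(G.elems, values))
    verdicts = {cond1(f, G, H), cond2(f, G, H), cond3(f, G, H)}
    assert len(verdicts) == 1, f"conditions disagree on {f}"
    if cond2(f, G, H):
        affine.append(f)

print(len(affine))  # 18
```

The count 18 is the 3 homomorphisms $\mathbb{Z}_3 \to S_3$, each combined with one of the 6 possible values of $f(e)$.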

The motivation for this definition is slightly roundabout.

The difference between homomorphism and affine morphism is similar to the difference between a vector subspace and an affine subspace of a vector space. A vector subspace always goes through the origin (for a homomorphism $f$, $f(e) = e$), whereas an affine subspace is translated from the origin ($f(e) \neq e$ is possible for an affine morphism).

Take points $f(a)$ and $f(b)$ in the image of an affine morphism and translate them back to the corresponding "vector subspace" to obtain $f(a) \cdot f^{-1}(e)$ and $f(b) \cdot f^{-1}(e)$. If the translated points are multiplied and the result is translated back to the affine image, the resulting point should be the same as $f(a \cdot b)$:
$$
f(a \cdot b) = (f(a) \cdot f^{-1}(e)) \cdot (f(b) \cdot f^{-1}(e)) \cdot f(e) = f(a) \cdot f^{-1}(e) \cdot f(b)
$$
which gives the definition (2).
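The translation argument can be replayed numerically. In this sketch the affine morphism $f: S_3 \to S_3$ is built, purely for illustration, as conjugation by a transposition (a homomorphism) followed by multiplication on the right by a 3-cycle, so that $f(e) \neq e$. Translating images back by $f^{-1}(e)$, multiplying, and translating forward by $f(e)$ lands on $f(a \cdot b)$ for every pair, which is definition (2). The choices of `s` and `c` are arbitrary.

```python
from itertools import permutations, product

S3 = list(permutations(range(3)))   # permutations of {0, 1, 2} as tuples
E = (0, 1, 2)


def mul(p, q):
    """Compose permutations: (p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)


s = (1, 0, 2)   # illustrative choice: the transposition of 0 and 1
c = (1, 2, 0)   # illustrative choice: a 3-cycle used as the translation


def g(x):
    """Conjugation by s: a homomorphism S3 -> S3 (it fixes e)."""
    return mul(mul(s, x), inv(s))


def f(x):
    """Affine morphism: the homomorphism g followed by translation by c."""
    return mul(g(x), c)


def back(y):
    """Translate an image point back through the origin: y -> y * f(e)^-1."""
    return mul(y, inv(f(E)))


assert f(E) == c != E   # f(e) != e, so f is not a homomorphism
for a, b in product(S3, repeat=2):
    # translate back, multiply, translate forward by f(e): lands on f(a*b)
    assert f(mul(a, b)) == mul(mul(back(f(a)), back(f(b))), f(E))
    # ...which is definition (2)
    assert f(mul(a, b)) == mul(mul(f(a), inv(f(E))), f(b))
print("the translation picture holds for all", len(S3) ** 2, "pairs")
```

Here `back` turns $f$ into the homomorphism `g`: an affine morphism is a homomorphism followed by a translation by $f(e)$.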

(1) => (2) follows immediately by substituting $e$ for $b$ (and renaming $c$ to $b$).
(2) => (3) by substituting $f^{-1}(e)$ for $t$.
(3) => (2) by substituting $e$ for $a$ and $b$: this gives $f(e) = f(e) \cdot t \cdot f(e)$, hence $t = f^{-1}(e)$.
(2) => (1):
$$
\begin{aligned}
& f(a \cdot b^{-1} \cdot c) \\
= \; & \{ \text{(2) for } a \cdot (b^{-1} \cdot c) \} \\
& f(a) \cdot f^{-1}(e) \cdot f(b^{-1} \cdot c) \\
= \; & \{ \text{(2) for } b^{-1} \cdot c \} \\
& f(a) \cdot f^{-1}(e) \cdot f(b^{-1}) \cdot f^{-1}(e) \cdot f(c) \\
= \; & \{ e = f(b) \cdot f^{-1}(b) \text{, working toward a sub-expression that can be collapsed by (2)} \} \\
& f(a) \cdot f^{-1}(e) \cdot f(b^{-1}) \cdot f^{-1}(e) \cdot f(b) \cdot f^{-1}(b) \cdot f(c) \\
= \; & \{ \text{collapsing } f(b^{-1}) \cdot f^{-1}(e) \cdot f(b) \text{ by (2)} \} \\
& f(a) \cdot f^{-1}(e) \cdot f(b^{-1} \cdot b) \cdot f^{-1}(b) \cdot f(c) \\
= \; & \{ b^{-1} \cdot b = e \} \\
& f(a) \cdot f^{-1}(e) \cdot f(e) \cdot f^{-1}(b) \cdot f(c) \\
= \; & \{ f^{-1}(e) \cdot f(e) = e \} \\
& f(a) \cdot f^{-1}(b) \cdot f(c)
\end{aligned}
$$

It is easy to check that each homomorphism is an affine morphism (specifically, homomorphisms are exactly affine morphisms with $f(e) = e$).

Composition of affine morphisms is affine and hence groups with affine morphisms form a category $\mathbb{Aff}$.
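The closure claim can be checked exhaustively for one small case, as a sketch: enumerate all $6^6$ maps $S_3 \to S_3$, keep those satisfying (2), and verify that every composite of two of them again satisfies (2). This covers a single finite group and is no substitute for a proof; the helper names are made up for this example.

```python
from itertools import permutations, product

ELEMS = list(permutations(range(3)))   # S3 as permutation tuples
E = (0, 1, 2)


def mul(p, q):
    """(p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)


def is_affine(f):
    """Condition (2): f(a*b) == f(a) * f(e)^-1 * f(b) for all a, b."""
    t = inv(f[E])
    return all(f[mul(a, b)] == mul(mul(f[a], t), f[b])
               for a, b in product(ELEMS, repeat=2))


# All 6**6 maps S3 -> S3 as dicts; all() stops at the first failing pair.
candidates = (dict(zip(ELEMS, values)) for values in product(ELEMS, repeat=len(ELEMS)))
maps = [f for f in candidates if is_affine(f)]

for f, g in product(maps, repeat=2):
    g_after_f = {x: g[f[x]] for x in ELEMS}
    assert is_affine(g_after_f), "composite is not affine"
print(len(maps), "affine maps S3 -> S3; all", len(maps) ** 2, "composites are affine")
```

The search finds 60 affine maps: the 10 endomorphisms of $S_3$, each combined with one of 6 values of $f(e)$.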

A subset $H$ of a group $G$ is called an affine subgroup of $G$ if either of the following equivalent conditions holds:

(1) $\exists h \in G : \forall p, q \in H : (p \cdot h^{-1} \cdot q \in H \wedge h \cdot p^{-1} \cdot h \in H)$
(2) $\forall p, q, h \in H : (p \cdot h^{-1} \cdot q \in H \wedge h \cdot p^{-1} \cdot h \in H)$

The equivalence (with the proof left as an exercise) means that if any $h$ "translating" the affine subgroup to a subgroup exists, then any member of the affine subgroup can be used for the translation. In fact, for nonempty $H$, any $h$ that satisfies (1) belongs to $H$ (prove it). This matches the situation with affine and vector subspaces: any vector from an affine subspace translates this subspace to the subspace passing through the origin.
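For a small group, the equivalence and the claim about witnesses can be checked by brute force. The sketch below runs over all nonempty subsets of $S_3$, tests (1) and (2), checks that every $h$ satisfying (1) lies in $H$, and compares the result with the set of cosets of subgroups, the group analogue of affine subspaces. The helper names are made up for this example.

```python
from itertools import combinations, permutations, product

ELEMS = list(permutations(range(3)))   # S3 as permutation tuples
E = (0, 1, 2)


def mul(p, q):
    """(p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)


def translates(H, h):
    """The clause of condition (1) for one fixed h."""
    return all(mul(mul(p, inv(h)), q) in H and mul(mul(h, inv(p)), h) in H
               for p, q in product(H, repeat=2))


subsets = [frozenset(c) for r in range(1, len(ELEMS) + 1)
           for c in combinations(ELEMS, r)]
subgroups = [K for K in subsets
             if E in K and all(mul(a, b) in K and inv(a) in K
                               for a, b in product(K, repeat=2))]
cosets = {frozenset(mul(g, k) for k in K) for K in subgroups for g in ELEMS}

affine = set()
for H in subsets:
    witnesses = {h for h in ELEMS if translates(H, h)}   # every h in G satisfying (1)
    cond2 = all(translates(H, h) for h in H)             # (2): every member of H works
    assert bool(witnesses) == cond2    # (1) <=> (2)
    assert witnesses <= H              # a witness of (1) always lies in H
    if cond2:
        affine.add(H)

assert affine == cosets
print(len(affine), "nonempty affine subgroups of S3; all are cosets")
```

It reports 18: 6 singletons, 9 two-element cosets of the order-2 subgroups, the 2 cosets of the alternating group $A_3$, and $S_3$ itself.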

Finally for to-day, consider an affine morphism $f:G_0\rightarrow G_1$. For $t\in G_0$ define the kernel:
$$ker_t f = \{g\in G_0 | f(g) = f(t)\}$$
It's easy to check that each kernel is an affine subgroup (take $t$ as $h$). Note that in $\mathbb{Aff}$ a whole family of subobjects corresponds to a morphism, whereas in $\mathbb{Grp}$ there is a single kernel.
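A small example of such a family, with an affine morphism chosen purely for illustration: $f: S_3 \to \mathbb{Z}_2$, $f(x) = \mathrm{parity}(x) + 1 \pmod 2$, where $\mathrm{parity}$ is $0$ on even and $1$ on odd permutations. Since $f(e) = 1$, $f$ is not a homomorphism, and its kernels $ker_t f$ come out as the even permutations (a subgroup) and the odd permutations (a coset that is not a subgroup). The function names are hypothetical.

```python
from itertools import permutations, product

ELEMS = list(permutations(range(3)))   # S3 as permutation tuples
E = (0, 1, 2)


def mul(p, q):
    """(p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)


def parity(p):
    """0 for even permutations, 1 for odd (number of inversions mod 2)."""
    return sum(p[i] > p[j] for i in range(3) for j in range(i + 1, 3)) % 2


def f(x):
    """Illustrative affine morphism S3 -> Z2 with f(e) = 1."""
    return (parity(x) + 1) % 2


def is_affine_subgroup(H):
    """Condition (2) for a subset H of S3."""
    return all(mul(mul(p, inv(h)), q) in H and mul(mul(h, inv(p)), h) in H
               for p, q, h in product(H, repeat=3))


# f is affine: condition (2) in Z2 reads f(a*b) = f(a) - f(e) + f(b) (mod 2).
assert all(f(mul(a, b)) == (f(a) - f(E) + f(b)) % 2
           for a, b in product(ELEMS, repeat=2))

kernels = {t: frozenset(g for g in ELEMS if f(g) == f(t)) for t in ELEMS}
assert all(t in K and is_affine_subgroup(K) for t, K in kernels.items())

for K in sorted(set(kernels.values()), key=sorted):
    print(sorted(K), "subgroup" if E in K else "coset, not a subgroup")
# [(0, 1, 2), (1, 2, 0), (2, 0, 1)] subgroup
# [(0, 2, 1), (1, 0, 2), (2, 1, 0)] coset, not a subgroup
```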

To be continued: affine quotients, products, sums, free affine groups.
