
Capgemini Software Engineering

Functional Pipelines
Programming · Rust · Functional programming · Linux · Learning

One of the major shifts in developer ergonomics has been the introduction of functional pipelines to most popular languages. A pipeline is a chain of operations on a set of data that transforms the data into another shape or value. It is made functional by using lambda functions as the primary way of specifying behavior.

git ls-files | grep -o "\.\w*$" | sort | uniq -c

Linux has always been built around this idea; programs accept text input and produce text output. This allows you to chain programs together with pipes (|). The above example finds all tracked files in a git repo, then matches just the file extension (for example .xml), then sorts them, and finally removes duplicates and counts each unique string, resulting in a list of file extensions and how many files have that file extension.

❯ git ls-files | grep -o "\.\w*$" | sort | uniq -c
      9 .css
      1 .gitignore
      1 .ico
      1 .jpg
      8 .json
     16 .md
     24 .mp4
     32 .png
      2 .scss
      2 .svg
      3 .ts
     14 .vue
      1 .webmanifest
     42 .woff2

Today, most programming languages allow you to operate on collections of data in a similar way.

const array = ['Peter', 'John', 'Doyle', 'Aron'];
const result = array
	.filter(name => !name.startsWith('A'))
	.map(name => name.toLowerCase())
	.sort();
console.log(result);

This is fast and easy to write, but it isn’t particularly efficient. In JavaScript, each pipeline operation runs to completion before the next one starts. First, .filter() allocates a new empty array, loops through the old array, and executes the predicate (the callback function) for each element, collecting every matching element into the new array before returning it. Then .map() does much the same: it allocates a new array, loops over the filtered array, executes its callback for each element, and inserts the transformed value into the new array. Finally, .sort() is the only one of the three that doesn’t allocate a new array; it sorts the array it is called on in place, which can confuse developers. const sortedArray = unsortedArray.sort() leaves both sortedArray and unsortedArray sorted, because both names refer to the same array, whereas map and filter leave their input untouched. Surely there is a better way to handle this?

Java solves the reallocation problem in their Stream<T> interface.

int sum = widgets.stream()
	.filter(w -> w.getColor() == RED)
	.mapToInt(w -> w.getWeight())
	.sum();

This looks very similar, but the crucial difference here is that Java stream operations are not executed immediately. They wait until a “terminal operation” such as .sum() or .collect() before executing. This allows Java to perform multiple operations in the same iteration of the loop. Both filter and mapToInt are performed at the same time, equivalent to the following example:

int sum = 0;
for (Widget w : widgets) {
	// Filter
	if (w.getColor() == RED) {
		// Map to int
		int weight = w.getWeight();
		
		// Sum
		sum += weight;
	}
}

Java and C# compile to bytecode or Intermediate Representation; essentially portable formats which are executed in virtual environments such as the JVM (Java) or the CLR (.NET). These languages rely heavily on JIT (Just-In-Time) compilation to optimize pipelines. Rust solves this with their iterators, which are optimized at compile-time, often into machine code which is as fast as (or faster than) the equivalent in for loops.

let result: u32 = widgets.iter()
	.filter(|w| w.color == Color::Red)
	.map(|w| w.weight)
	.sum();
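For readers who want to run the snippet above, here is a minimal self-contained sketch. The Widget struct, the Color enum, and the sample data are assumptions made for illustration (the article does not define them), and the function names are hypothetical. It places the pipeline next to the hand-written loop it is meant to match.

```rust
// Hypothetical types: the article only implies `color` and `weight` fields.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Blue,
}

#[derive(Debug, Clone)]
struct Widget {
    color: Color,
    weight: u32,
}

/// Sums the weight of all red widgets with an iterator pipeline.
fn red_weight_pipeline(widgets: &[Widget]) -> u32 {
    widgets
        .iter()
        .filter(|w| w.color == Color::Red)
        .map(|w| w.weight)
        .sum()
}

/// The same computation written as an explicit loop.
fn red_weight_loop(widgets: &[Widget]) -> u32 {
    let mut sum = 0;
    for w in widgets {
        if w.color == Color::Red {
            sum += w.weight;
        }
    }
    sum
}

/// Illustrative sample data.
fn sample_widgets() -> Vec<Widget> {
    vec![
        Widget { color: Color::Red, weight: 3 },
        Widget { color: Color::Blue, weight: 5 },
        Widget { color: Color::Red, weight: 4 },
    ]
}

fn main() {
    let widgets = sample_widgets();
    println!("pipeline: {}", red_weight_pipeline(&widgets));
    println!("loop:     {}", red_weight_loop(&widgets));
}
```

Both functions return the same total for the sample data; whether the compiler emits identical machine code for them depends on the optimisation level and is not something this sketch demonstrates.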

Rust does something even smarter here too. .iter() borrows widgets and yields references to its elements, so the original collection is left untouched and nothing is copied. If you use .into_iter() instead, the widgets collection is “consumed”: the iterator takes ownership of the elements, so they can be moved or transformed without copying. This gives developers more fine-grained control over performance. Of course, you won’t be allowed to use widgets after consuming it with .into_iter(), which rules out the kind of confusion that JavaScript’s in-place .sort() causes.
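The difference between borrowing and consuming can be sketched as follows. The function names doubled and tripled and the sample values are hypothetical, chosen only to make the ownership behaviour visible.

```rust
/// Borrows the slice: `.iter()` yields `&u32`, so the caller keeps ownership.
fn doubled(weights: &[u32]) -> Vec<u32> {
    weights.iter().map(|w| w * 2).collect()
}

/// Takes ownership: `.into_iter()` yields `u32` and consumes the vector.
fn tripled(weights: Vec<u32>) -> Vec<u32> {
    weights.into_iter().map(|w| w * 3).collect()
}

fn main() {
    let weights: Vec<u32> = vec![3, 5, 4];

    let twice = doubled(&weights);
    // `weights` was only borrowed, so it is still usable here.
    println!("{:?} -> {:?}", weights, twice);

    let thrice = tripled(weights);
    println!("{:?}", thrice);

    // Uncommenting the next line fails to compile, because `weights`
    // was moved into `tripled`:
    // println!("{:?}", weights);
}
```

The compile error on the commented-out line is the guarantee the paragraph above describes: once a collection has been consumed, the old name can no longer be used.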

Another difference is that Rust iterators can be operated manually and paused at will, since an iterator is really just a small struct that tracks its position in the collection and is advanced by calling .next().

let mut iter = widgets.iter()
	.filter(|w| w.color == Color::Red)
	.map(|w| w.weight);

// Capture the first element (only possible if iterator is mutable)
let first = iter.next();

// Prints from 2nd element to the end
for weight in iter {
	println!("{}", weight);
}
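Manual stepping also makes the laziness of the pipeline observable. In the sketch below, the step_through function and the call counter are hypothetical additions for illustration: the counter records how many times the filter closure has run at each point.

```rust
use std::cell::Cell;

/// Returns (filter calls before stepping, first even value,
/// filter calls after one `.next()`, remaining even values).
fn step_through(weights: &[u32]) -> (u32, Option<u32>, u32, Vec<u32>) {
    // Counts how often the filter closure is invoked.
    let calls = Cell::new(0_u32);

    let mut evens = weights.iter().copied().filter(|w| {
        calls.set(calls.get() + 1);
        w % 2 == 0
    });

    // Building the pipeline runs nothing yet.
    let before = calls.get();

    // `.next()` pulls elements through the filter only until one matches.
    let first = evens.next();
    let after_first = calls.get();

    // The paused iterator resumes where it left off.
    let rest: Vec<u32> = evens.collect();

    (before, first, after_first, rest)
}

fn main() {
    let (before, first, after_first, rest) = step_through(&[3, 5, 4, 8]);
    println!("filter calls before stepping: {}", before);
    println!("first even: {:?} after {} calls", first, after_first);
    println!("remaining evens: {:?}", rest);
}
```

For the input 3, 5, 4, 8 the closure has not run at all before the first .next(), and has run three times once the first even value (4) is found; the remaining element is only examined when the iterator is drained.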

Use these pipelines with care. Remember the performance implications, and beware that long method chains can become hard to read and even harder to edit. Sometimes procedural is best!

Functional Pipelines was originally published by Capgemini at Capgemini Software Engineering on April 09, 2026.

https://capgemini.github.io/development/functional-pipelines
Theory and Memory: Two Forces Shaping Software Team Knowledge
Knowledge management · Team dynamics · AI · Learning · Cognitive science

How insights from cognitive science and social psychology explain why software knowledge is so hard to preserve

Summary

Software teams face a knowledge problem that no amount of documentation seems to solve. When experienced developers leave, something valuable departs with them — something that comprehensive handover documents and well-commented code fail to capture. Two researchers from the 1980s provided frameworks that, when combined, explain this phenomenon with unusual clarity.

Peter Naur argued that programming is fundamentally theory-building [1]: the real product of software development is the understanding in developers’ minds, not the code they produce. Daniel Wegner demonstrated that effective teams develop transactive memory systems [4]: distributed knowledge structures in which members know “who knows what” rather than everyone knowing everything.

These frameworks operate at different levels — individual and team — and their integration reveals that software teams face a dual challenge. They must cultivate deep individual understanding whilst building effective coordination mechanisms. Failure can occur through either path, with distinct symptoms and remedies.

Key Insights:

  • Code is an artefact of understanding, not understanding itself
  • Teams coordinate access to knowledge but cannot substitute for it
  • “Program death” and “coordination breakdown” are distinct failure modes
  • Conversation is the mechanism that builds both individual theory and team memory
  • AI shifts cognitive load from generation to evaluation, making deliberate theory-building essential
  • AI can augment team retrieval but cannot participate in the social dynamics of trust and coordination

1. The Theory in Developers’ Minds

Explanation

When we look at code, we might assume we’re seeing the software itself. But code is more like sheet music than music. Sheet music isn’t the music — it’s instructions for creating music, reflecting the composer’s understanding of harmony, structure, and emotion. Similarly, code reflects a developer’s understanding of how to solve a problem. The understanding itself lives in the development team’s collective mental model: what the system is supposed to do, what it actually does, why it does it that way, and how all the pieces fit together.

This explains why a team can inherit “perfectly good code” and still struggle to work with it. They’ve received the sheet music but not the understanding of how to perform it. It explains why developers spend considerable time in meetings drawing diagrams and discussing approaches rather than typing. The challenge isn’t writing code; it’s building and sharing the mental theory of what the software should do and how it should work.

When developers say they need to “understand the codebase,” they’re not memorising code — they’re reconstructing the original thinking behind it.

Theoretical Foundation

Peter Naur’s 1985 paper “Programming as Theory Building” [1] established that programming consists of building and maintaining a theory about how computational processes solve real-world problems. Drawing on Gilbert Ryle’s philosophical work on knowledge [2], Naur distinguished between different forms of understanding:

| Form | Description | Example |
| --- | --- | --- |
| Knowing that | Factual knowledge | “This function returns a boolean” |
| Knowing how | Procedural skill | Being able to write working code |
| Theory | Integrated understanding enabling explanation, justification, and appropriate modification | Understanding why the system is structured this way and how it should evolve |

A developer possessing a program’s theory can:

  1. Explain how the solution relates to the real-world problem it addresses
  2. Justify why each part is designed the way it is
  3. Respond constructively to modification demands

Crucially, Naur argued this theory “necessarily, and in an essential manner, transcends that which is recorded in the documented products.” [1] The theory cannot be fully captured in code or documentation because it includes tacit understanding, which Michael Polanyi described as knowledge we possess but cannot fully articulate [3].

The Concept of Program Death

Naur’s most striking claim concerns what happens when theory-holders leave: the program effectively “dies” — even while the code continues executing.

This doesn’t mean the software stops working. It means the software can no longer be intelligently modified. New developers can make the code do different things, but without the theory, they work against the system’s inherent logic rather than with it. Each modification that fights the original design makes subsequent modifications harder. This is why systems become “legacy” — the theory dies while code survives.

We’ve all encountered this: a module everyone’s afraid to touch, a service that works but nobody truly understands, code that deteriorates with every “fix.” These are programs whose theories have died.

2. The Directory in Teams’ Minds

Explanation

Consider how couples often divide cognitive labour. One partner remembers birthdays; the other remembers directions. Crucially, both know who knows what. They maintain a mental directory of expertise locations rather than each storing everything.

This pattern scales to teams. Effective groups develop collective memory capabilities exceeding any individual’s capacity — not by everyone knowing everything, but by everyone knowing who knows what. “Alice understands the authentication module” doesn’t mean we understand authentication. It means we know where to find understanding.

Theoretical Foundation

Daniel Wegner’s research on transactive memory systems (TMS) [4] identified three components that enable distributed team knowledge:

| Component | Function | Software Team Example |
| --- | --- | --- |
| Specialisation | Differentiated expertise distribution | “Alice knows auth, Bob knows payments” |
| Credibility | Beliefs about members’ knowledge reliability | “Alice is the real expert on auth” |
| Coordination | Orchestrated knowledge retrieval | “When auth breaks, we ask Alice” |

The critical mechanism is the directory — members store labels and pointers to expertise rather than all content. This enables teams to function effectively despite no single person holding complete understanding.

Subsequent research validated this framework in software contexts. Faraj and Sproull’s study of 69 software development teams [5] found that expertise coordination strongly predicted performance: “mere presence of expertise is insufficient.” The expertise must be locatable and accessible.

TMS Vulnerability

Transactive memory creates capability but also vulnerability. The directory can become inaccurate. Key experts can depart. Communication breakdowns can prevent retrieval. Unlike individual memory, which degrades gradually, TMS can fail suddenly when critical nodes disappear or connections break.

3. Integration: Two Levels, Two Failure Modes

Explanation

At first glance, these frameworks appear to be in tension. Naur seems pessimistic — theory can’t be transferred, programs die. Wegner seems optimistic — teams achieve collective capability through coordination.

The resolution: they operate at different levels of analysis.

| Level | Framework | Core Question |
| --- | --- | --- |
| Individual | Naur’s Theory-Building | What constitutes genuine understanding? |
| Team | Wegner’s TMS | How do groups coordinate distributed expertise? |

Naur asks whether adequate understanding exists anywhere. Wegner asks how existing understanding is coordinated. Both observations can be true simultaneously: software teams demonstrably develop transactive memory, and software maintenance demonstrably suffers when experts depart.

The Relationship Between Theory and TMS

TMS coordinates access to theory but cannot substitute for it. The directory contains metadata — labels and locations — not content. Knowing that “Alice understands auth” doesn’t give us Alice’s understanding. We gain retrieval capability, not possession.

This means TMS presupposes individual expertise to function. There must be something to coordinate. Naur’s “theory” represents the individual expertise that TMS coordinates.

Two Distinct Failure Modes

This integration reveals that software teams can fail through either mechanism:

Theory Death (Naur’s concern): The experts leave. Even if the TMS structure remains intact — people remember “who knew what” — the expertise itself is gone. New members cannot simply consult someone else because no one holds the theory anymore.

TMS Degradation (Wegner’s concern): The experts remain but coordination breaks down. Reorganisations, rapid turnover, remote work transitions, or communication breakdown disrupt the directory. Individual expertise exists somewhere, but the collective loses awareness of where.

| Failure Mode | Symptom | Remedy |
| --- | --- | --- |
| Theory Death | “No one really understands this anymore” | Rebuild understanding through deep engagement |
| TMS Degradation | “The expertise is here somewhere” | Rebuild visibility and coordination structures |

A project might suffer either failure independently, or both simultaneously. Distinguishing which failure mode is occurring determines the appropriate response.

4. Conversation as the Mechanism

Explanation

A critical insight often remains implicit: conversation is not merely a byproduct of theory-building and TMS — it is the primary mechanism through which both operate.

When developers explain code, they’re articulating theory (making tacit understanding explicit) whilst simultaneously updating the team directory (“I know this”). When they ask questions, they’re building theory whilst executing retrieval operations. Design discussions build rationale whilst negotiating who will own what knowledge.

| Conversation Type | Theory-Building Function | TMS Function |
| --- | --- | --- |
| Explaining code | Articulating theory, making tacit explicit | Directory update (“I know this”) |
| Asking questions | Seeking understanding, building theory | Retrieval operation |
| Design discussions | Developing rationale, deepening theory | Encoding negotiation |
| Code review | Validating understanding, refining theory | Credibility assessment |
| Discussing errors | Learning, correcting mental models | Directory maintenance |

Why Certain Practices Work

This dual function explains why certain practices show benefits across multiple dimensions simultaneously.

Pair programming operates on both levels: knowledge transfers between partners (building theory) whilst pairs develop mutual awareness of each other’s understanding (building TMS). When pairs rotate, both effects spread through the team.

Code review creates opportunities for theory articulation (the author explains their reasoning) and directory updates (reviewers learn who understands what). Design discussions build shared rationale whilst establishing who owns which decisions.

Teams that “just talk” more than others aren’t wasting time — they’re building the invisible infrastructure that makes everything else work.

5. How AI Changes the Picture

The Theory-Building Question

When AI generates code, we face a question Naur couldn’t have anticipated: whose theory is being built?

The traditional workflow of “think deeply, then implement” meant theory-building was almost unavoidable. Writing code forced engagement with the problem. The act of implementation — choosing variable names, structuring functions, handling edge cases — built understanding whether we intended it or not.

AI inverts this relationship. Generation becomes fast; code appears without the developer having worked through the reasoning. The code might be correct, but we don’t necessarily understand why it’s correct, what alternatives were considered, or how it should change when requirements shift.

This doesn’t mean AI-generated code is theory-free. The AI produces code that reflects patterns learned from millions of examples — a kind of statistical crystallisation of collective practice. But that’s different from the developer holding a theory about this specific program in this specific context. The code exists; the theory doesn’t automatically transfer.

This suggests that theory-building in AI-augmented development requires conscious attention in ways it perhaps didn’t before. We can’t rely on writing code to build theory, so we must deliberately construct understanding of what was generated. Reviewing AI outputs, questioning decisions, iterating on designs — these activities build theory even when AI generates the initial artefacts.

The Shift from Generation to Evaluation

Previously, the cognitive work concentrated in generation: thinking through the problem, designing an approach, implementing it carefully. Evaluation mattered but was often less demanding than the work of creation.

AI redistributes this load. Generation becomes fast whilst evaluation becomes the bottleneck. We’re now assessing whether solutions reflect good thinking about the problem, forming judgements about work we didn’t create, often whilst our mental model is still developing.

This shift has implications for both Naur and Wegner’s frameworks:

| Dimension | Traditional Development | AI-Augmented Development |
| --- | --- | --- |
| Theory-building | Emerges through implementation | Requires deliberate construction |
| Cognitive load | Front-loaded in generation | Shifted to evaluation |
| Understanding pace | Builds incrementally as code is written | Must be constructed after generation |
| Quality signal | “Does this work?” | “Does this work, and do I understand why?” |

The critical thinking skills — interpretation, analysis, evaluation — operate more continuously when we don’t control the pace of generation. We’re no longer building understanding step-by-step as we write; we’re constructing understanding of something that appeared whole.

The TMS Asymmetry

Human-AI collaboration introduces a structural asymmetry not present in human teams: AI context is ephemeral.

In human teams, conversation builds TMS and TMS state persists beyond conversations. Team members remember what was discussed, what was decided, who knows what. This persistence creates a continuously updated directory that enables efficient coordination.

In human-AI collaboration, conversation still performs TMS operations in real-time. But the AI’s contributions are ephemeral. Session end means the AI “forgets” — TMS degradation from the AI’s perspective.

| TMS Dimension | Human-Human Teams | Human-AI Collaboration |
| --- | --- | --- |
| Theory-building | Both partners build theory over time | Human builds; AI generates without theory |
| Directory knowledge | Both persist “who knows what” | Human persists; AI forgets |
| Encoding agreements | Both remember allocation | Human remembers; AI must be re-informed |
| Credibility assessment | Builds through track record | Must be re-established or externally encoded |

This creates a structural problem: the human partner must either re-establish context at every session (costly, error-prone) or find ways to externalise state so the AI can rejoin the team’s transactive memory.

AI as TMS Participant — Or Tool?

There’s a deeper question about AI’s role in the team knowledge structure. Is AI a TMS participant — a member of the distributed memory system — or a sophisticated retrieval tool that humans query?

Wegner’s TMS involves social dynamics: credibility built through relationship, encoding responsibilities negotiated through conversation, retrieval initiated by recognising one’s own knowledge gaps. AI participates asymmetrically in these processes. It can be queried as a knowledge source, but it doesn’t build credibility through track record in the way humans do. It doesn’t negotiate encoding responsibilities or make social judgement calls about when to defer.

Perhaps the most useful framing: AI can serve as an external extension of team TMS — a queryable knowledge system that augments what humans can locate and retrieve — without being a full participant in the social memory system. This suggests that human TMS remains essential for coordination involving trust, judgement, and contextual fit, even as AI enhances raw retrieval capability.

Externalisation as Mitigation

Decision logs, session summaries, and context documents take on new importance in this framing. They’re not bureaucracy — they’re externalised TMS transactions that enable AI to rejoin the team’s memory system.

| Externalisation Practice | TMS Function |
| --- | --- |
| Decision logs | Externalise encoding/allocation and rationale |
| Observation records | Externalise directory updates |
| Session summaries | Compress TMS state for retrieval |
| Context documents | Encode standing TMS knowledge |

Without externalisation, each AI session starts with zero TMS context. The AI cannot know who knows what, what was decided, or what was observed. Externalisation enables continuity despite ephemeral context.

This explains why “just use AI” often disappoints. The AI can generate code, but without the surrounding TMS infrastructure — the context, decisions, and rationale — it generates in a vacuum. The output may be technically correct but contextually inappropriate, because the AI lacks access to the team’s accumulated understanding.

6. Practical Implications

For Team Organisation
  • Redundancy plus visibility: Ensure multiple people understand each critical component (redundant theory) AND that everyone knows who those people are (complete TMS)
  • Stability matters: Both theory-building and TMS development require time and shared experience
  • Communication structure: Face-to-face communication uniquely enables TMS emergence; distributed teams face inherent disadvantages requiring deliberate mitigation
  • Team size: TMS effectiveness may degrade with size; consider whether coordination costs outweigh capability gains
For Knowledge Management
  • Documentation purpose: Documentation cannot substitute for theory but can scaffold theory-building — focus on capturing why decisions were made, not just what exists
  • Expertise visibility: Maintain visible expertise directories; make it easy to know who knows what
  • Onboarding targets: New members must build individual theory AND integrate into team TMS — this dual target explains extended onboarding timelines (6-9 months for autonomy)
For AI Integration
  • Theory still required: AI generates artefacts, but humans must still build theory about those artefacts to maintain and modify systems intelligently. Code you didn’t write is code you don’t yet understand.
  • Deliberate understanding: When AI generates substantial portions of a system, theory-building becomes a conscious practice rather than an automatic byproduct of implementation. Budget time for understanding, not just generation.
  • Externalise deliberately: The ephemeral context problem is real — decisions, rationale, and context must be captured externally to enable AI participation across sessions.
  • Shifted balance: AI improves retrieval whilst making theory-building more important, not less. The bottleneck moves from “can we build this?” to “do we understand what we’ve built?”
  • TMS infrastructure matters: AI without context generates in a vacuum. The value of AI assistance scales with the quality of the surrounding knowledge infrastructure.
For Legacy Systems
  • Double deficit: New teams inheriting systems face a dual challenge — they must build individual theory AND establish TMS, but without access to former experts, both processes are impaired
  • “Just read the code” fails: This approach addresses neither theory-building nor TMS integration
Conclusion

Naur’s pessimism about knowledge transfer and Wegner’s optimism about collective memory reflect different aspects of the same phenomenon.

Individual understanding is indeed hard to transfer — Naur was right about this. The theory in developers’ minds transcends what documentation can capture. Yet groups demonstrably achieve collective capability through coordination — Wegner was right about this too. Teams develop distributed knowledge structures that exceed any individual’s capacity.

Software development requires both: individuals who understand deeply, and teams who coordinate effectively. Optimising for one whilst neglecting the other explains why many interventions fail. Hiring brilliant individuals doesn’t help if they can’t coordinate. Building elaborate knowledge management systems doesn’t help if no one holds the underlying expertise.

AI amplifies both the opportunity and the risk. It accelerates generation whilst making deliberate understanding more necessary. It can augment team retrieval whilst remaining unable to participate in the social dynamics of trust and judgement. It produces artefacts at scale whilst leaving theory-building as fundamentally human work.

Perhaps the core insight is this: AI is powerful not because it replaces human understanding, but because it creates conditions where high-quality human understanding becomes more valuable [6]. The question isn’t whether to use AI; it’s whether we’re deliberate about building and maintaining the theory and coordination structures that make AI assistance meaningful.

The theory that lives in developers’ minds matters. The directory that lives in teams’ minds matters. And the conversations that build both aren’t overhead — they are the work.

References
  1. Naur, P. (1985). Programming as Theory Building. Microprocessing and Microprogramming, 15(5), 253-261.

  2. Ryle, G. (1949). The Concept of Mind. Hutchinson.

  3. Polanyi, M. (1966). The Tacit Dimension. University of Chicago Press. 

  4. Wegner, D. M. (1986). Transactive Memory: A Contemporary Analysis of the Group Mind. In B. Mullen & G. R. Goethals (Eds.), Theories of Group Behavior (pp. 185-208). Springer-Verlag. 

  5. Faraj, S., & Sproull, L. (2000). Coordinating Expertise in Software Development Teams. Management Science, 46(12), 1554-1568. 

  6. Williams, S. (2025). Critical Thinking in the Age of AI. Capgemini Engineering Blog. https://capgemini.github.io/engineering/critical-thinking-age-of-ai/ 

Theory and Memory: Two Forces Shaping Software Team Knowledge was originally published by Capgemini at Capgemini Software Engineering on March 20, 2026.

https://capgemini.github.io/development/theory-and-memory
Critical Thinking in the Age of AI
AI · Critical Thinking · Development · Software Engineering · Best Practices
What Is Critical Thinking?

Critical thinking emerged as a formal area of study from multiple traditions. The most influential modern definition comes from the 1990 Facione Delphi study [1], in which expert philosophers and educators reached consensus on what constitutes critical thinking, no small feat in philosophy. Their definition, since widely adopted in education and professional development, reads:

We understand critical thinking to be purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based.

That might sound abstract, but it maps surprisingly well to what we already do in software development. Consider how the skills involved in critical thinking align with common software engineering tasks:

Core cognitive skills
  • Interpretation - understanding what code, requirements, or error messages actually mean
  • Analysis - examining the structure of arguments in RFCs, identifying assumptions in designs
  • Evaluation - assessing whether evidence supports claims, whether benchmarks are meaningful
  • Inference - reasoning about consequences, exploring what could go wrong
  • Explanation - articulating why we made particular decisions
  • Self-regulation - monitoring our own reasoning, catching our own biases

The Delphi study also identified habits of mind, which we refer to as our disposition, that distinguish people who use these skills consistently: inquisitiveness, systematicity, truth-seeking, open-mindedness, and judiciousness. We’ve all experienced why this matters during code reviews: technical skill isn’t enough if someone becomes defensive or stops genuinely engaging with alternative perspectives.

The work also established standards for good critical thinking: clarity, accuracy, precision, relevance, depth, breadth, logic, and significance. These are not arbitrary criteria - they’re the qualities that distinguish robust reasoning from reasoning that merely sounds convincing - and we’ll see how important this is as we explore how AI changes what critical thinking means in practice.

The Changing Nature of Thinking With AI

For decades, software development involved humans reasoning about problems and then expressing those solutions through code; critical thinking should be happening before and during the act of coding. AI hasn’t eliminated that, isn’t replacing human judgment, and isn’t removing the need for critical thinking. Instead, it’s changing where and how critical thinking is applied.

Consider what happens when we work with an AI assistant on a coding task. The AI can generate plausible code remarkably quickly. This creates a new cognitive demand: we need to evaluate code we didn’t write, based on a mental model we may not have fully formed yet. The traditional workflow of “think deeply, then implement” becomes “sketch intent, evaluate output, refine understanding.”

Peter Naur’s 1985 essay “Programming as Theory Building” [2] argued that programming is fundamentally about developers building and maintaining a theory of how the program relates to the problem domain. The program itself is just an expression of that theory. When AI generates code, we face a question: whose theory is being built? Or rather, is a theory being built?

The answer matters because theory - in Naur’s sense - is what enables us to modify programs appropriately, explain why things are the way they are, and respond to new requirements sensibly. Without theory, we’re left with code we don’t fully understand, making changes through trial and error.

This suggests that critical thinking in the AI era requires conscious attention to theory building in ways it perhaps didn’t before. When we wrote every line ourselves, theory building was almost unavoidable. When AI writes substantial portions, we need to deliberately construct understanding of what was generated and why it works - we have an increased focus on learning and evaluation.

There’s also a shift in what constitutes “good” critical thinking. Previously, questioning assumptions and biases was primarily about catching our own or our colleagues’ errors. Now it includes questioning whether the AI has understood our intent, whether it has applied patterns appropriately for our context, and whether plausible-sounding explanations actually correspond to what the code does.

The redistribution of cognitive work makes it worth examining how critical thinking applies in each domain where AI is now active.

Critical Thinking in Software and Product Development

Software development has always demanded critical thinking, even if we didn’t always call it that. The field emerged from mathematics and engineering, disciplines where rigorous reasoning is foundational. But software occupies an unusual position: it’s both a formal system (governed by logic and mathematics) and a social artefact (built by humans, for humans, within organisations).

This dual nature means critical thinking in software development operates on multiple levels simultaneously.

At the code level

When we’re debugging, we’re interpreting symptoms, analysing causes, evaluating hypotheses, and inferring what change will fix the issue. The self-regulation component shows up when we catch ourselves assuming “it must be the database” because the database caused problems last time - and we force ourselves to actually gather evidence.

When reviewing code, we’re evaluating whether the implementation matches stated intent, whether edge cases are handled, whether the approach is appropriate for the context. The dispositional elements matter enormously here: is our review genuinely truth-seeking, or are we trying to demonstrate cleverness?

At the architectural level

Every architectural decision involves inference about future consequences - how will this choice constrain us later? Should this be synchronous or asynchronous? Monolith or microservices? What failure modes does it introduce? The quality of these inferences depends directly on the depth and breadth of our analysis.

Fred Brooks observed in “No Silver Bullet” 3 that there’s an essential complexity in software - complexity that derives from the problem domain itself, not from our tools. Critical thinking helps distinguish essential from accidental complexity. When we’re evaluating a proposed solution, are we assessing whether it addresses the actual problem, or whether it’s solving a simpler problem the team finds more tractable?

At the product level

Product development introduces radical uncertainty. We’re making decisions with incomplete information, under time pressure, with imperfect understanding of user needs. Critical thinking here includes meta-reasoning: “how confident should we be in this decision?”, “what would change our mind?”, “what’s the cheapest way to test our assumptions?”.

Because we’re now reasoning about user needs, market dynamics, and strategic positioning, the evaluation standards include not just “does this work?” but “does this matter?” and “is this the right thing to build?”

At the organisational level

Amy Edmondson’s research on psychological safety 4 reveals something important: critical thinking isn’t just an individual cognitive skill. Teams need environments where people can question assumptions, admit uncertainty, and challenge ideas without fear. A technically brilliant team whose members can’t question each other’s reasoning will make worse decisions than a less skilled team with a strong critical thinking culture.

This matters because software is built by teams, not individuals. Our ability to think critically is bounded by our ability to think together critically. Code review, pair programming, design discussions are all collaborative critical thinking exercises.

AI Enters The Fray

Traditional software development assumed humans did the reasoning and computers did the executing. AI blurs this boundary. When an AI suggests an implementation, we’re evaluating not just correctness but appropriateness. When it generates tests, we’re assessing whether they actually validate what matters.

The critical thinking demand hasn’t decreased - if anything, it’s intensified. But the focus has shifted from “am I solving this correctly?” to “has this been solved correctly, and does it solve the right thing?”

Having examined how critical thinking operates in software development, we can now examine the practice of prompting an AI.

Critical Thinking About Prompting

Prompting has emerged as a new skill, and like any new skill it’s accumulating a body of practices, advice, and received wisdom. Some of this is evidence-based, much of it isn’t, and some of the wilder claims are being shared on social media to boost engagement statistics. Critical thinking about prompting means evaluating these practices with the same rigour we’d apply to any technical claim.

The Problem With Evidence

Prompt engineering is remarkably young. The techniques that work today weren’t possible three years ago because the models didn’t exist. This creates a methodological challenge: by the time we can rigorously study a technique, the models have often changed. New models are released every few months, each with different capabilities and behaviours.

Consider “chain-of-thought” prompting - asking the model to think step-by-step. Wei et al.’s 2022 research 5 showed significant improvements on reasoning tasks, providing solid evidence for the technique. Yet we’ll also encounter advice like “always use polite language” or “threaten the model with consequences” - claims that spread through social media without controlled evaluation.

Critical thinking here means asking: what’s the actual evidence? A viral tweet isn’t evidence. An anecdote isn’t evidence. Even a compelling demonstration with one example isn’t evidence - it’s an existence proof that something can work, not that it does work reliably.

Cargo cult prompting

You’ve likely seen prompts that contain elaborate rituals: specific phrasings, particular structures, magical incantations. Some of these have theoretical grounding. Others persist because someone successful uses them, so they must work - classic cargo cult thinking.

The Anthropic documentation on prompting offers guidance based on their understanding of how their models work. That’s different from a random Medium article about “10 ChatGPT tricks experts use to make $100k per week”. The former comes from people who built the system. The latter comes from someone who found something that may have only worked once.

Folk wisdom isn’t necessarily wrong - practitioners sometimes discover effective techniques before researchers can validate them - though we still need to evaluate the source, understand the claimed mechanism, and test whether it actually improves outcomes for our use case.

Understanding the mechanism

When someone claims a technique works, ask why it would work. If the explanation is “the AI likes it when we’re polite” - that’s anthropomorphisation, not mechanism. If the explanation is “providing examples helps the model identify the pattern we want” - that’s a testable claim about how pattern matching and few-shot learning operate.

The best prompt engineering advice explains why something works in terms of how language models actually function. This lets us reason about when to apply it and when it’s irrelevant.

Few-shot learning, example of examples

Research 6 clearly shows that providing examples (few-shot prompting) improves performance on many tasks. But critical thinking pushes further: how many examples? What makes a good example? Does order matter? Does diversity of examples matter more than quantity?

These are answerable questions, but they require moving beyond “use examples because someone said so” to “use examples because they help the model identify the relevant pattern, and I understand what pattern I’m trying to indicate.”
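To make those questions concrete, here is a minimal sketch of how a few-shot prompt might be assembled. The `Example` class, the `build_few_shot_prompt` function, and the classification task are all hypothetical names chosen for illustration; nothing here calls a real model. Keeping the examples as data makes it easy to vary their number, order, and diversity and then measure the effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One input/output pair shown to the model (hypothetical structure)."""
    text: str
    label: str


def build_few_shot_prompt(instruction: str, examples: list[Example], query: str) -> str:
    """Assemble a prompt: instruction, worked examples, then the new input."""
    lines = [instruction, ""]
    for example in examples:
        lines.append(f"Input: {example.text}")
        lines.append(f"Output: {example.label}")
        lines.append("")
    # The final block is left open so the model completes the pattern.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Illustrative examples only: one per category we want the model to recognise.
examples = [
    Example("The build fails on every commit", "bug"),
    Example("Please add dark mode", "feature request"),
    Example("How do I configure the linter?", "question"),
]
prompt = build_few_shot_prompt(
    "Classify each message as bug, feature request, or question.",
    examples,
    "The login page crashes on submit",
)
print(prompt)
```

Because the examples are an ordinary list, reordering them or swapping in a more diverse set is a one-line change, which is what we need if we want to test the pattern rather than assume it.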

Evaluating claims of effectiveness

When we read that a technique “dramatically improves” results, critical thinking asks: improves according to what metric? On what tasks? Compared to what baseline? “This prompt is better” is meaningless without specification of better for what and better how.

You can’t apply critical thinking standards like precision and accuracy to vague claims. If someone says “this prompt structure gets better code,” we need to know: better by what measure? Fewer bugs? More maintainable? Faster? Closer to requirements?
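As a sketch of what “better by what measure?” could look like in practice, the snippet below fixes one metric (the fraction of tasks whose generated code passes its tests) and compares a prompt variant against a named baseline. The variant names and the recorded outcomes are invented for illustration; in a real evaluation they would come from running the model and the tests, and eight tasks would be far too few to draw conclusions from.

```python
from statistics import mean

# Hypothetical recorded outcomes: for each prompt variant, one entry per task,
# True if the generated code passed that task's tests.
results = {
    "baseline": [True, False, True, False, True, False, True, True],
    "structured": [True, True, True, False, True, True, True, True],
}


def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of tasks whose generated code passed its tests."""
    return mean(1.0 if passed else 0.0 for passed in outcomes)


def compare_to_baseline(
    results: dict[str, list[bool]], baseline: str = "baseline"
) -> dict[str, float]:
    """Difference in pass rate between each variant and the named baseline."""
    base = pass_rate(results[baseline])
    return {
        name: pass_rate(outcomes) - base
        for name, outcomes in results.items()
        if name != baseline
    }


for name, outcomes in results.items():
    print(f"{name}: {pass_rate(outcomes):.3f}")
print(compare_to_baseline(results))
```

The point of the sketch is the shape of the claim rather than the numbers: a stated metric, a stated task set, and a stated baseline, so that someone else could rerun the comparison.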

Challenges with replication

Software engineering has a replication crisis of its own - Shepperd et al.’s 2014 analysis 7 found that many claimed improvements in software practices don’t replicate. Prompt engineering faces the same risk, amplified by rapid model changes.

When we find advice about prompting, consider: could we replicate this? Is the claim specific enough to test? Are the conditions clear enough that we’d know if it applies to our situation?

Summary

Critical thinking about prompting means treating it as an engineering discipline, not a mystical art. It means evaluating techniques based on evidence, understanding mechanisms not just recipes, and being willing to test rather than assume.

This meta-level critical thinking - about the practice itself - sets the foundation for the next question: how do we embed critical thinking within our prompts?

Critical Thinking In Prompting

If critical thinking about prompting is the meta-level question of which techniques work, critical thinking in prompting is the practice-level question: how do we structure prompts to elicit reasoning that meets critical thinking standards?

The distinction matters because AI can produce two very different kinds of output: answers that sound authoritative, and answers that result from careful reasoning. The difference often lies in how we prompt.

Requesting explicit reasoning

The simplest technique is to ask for it. “Explain your reasoning” or “think step-by-step” prompts the model to make its inference process visible. Explicit reasoning serves multiple purposes: it lets us evaluate the reasoning, it often improves the quality of the final answer, and it makes the output auditable, because we can see where the reasoning goes wrong. Many models also have a ‘thinking mode’ that can be activated to perform this kind of reasoning.

Prompting for analysis

Rather than asking “what should we do?”, structure prompts to request analysis first: “What are the key factors in this decision? What are the trade-offs between approaches? What assumptions underlie each option?”

Such structured prompts mirror how we’d approach a problem ourselves - decompose it, examine components, identify relationships. When we prompt for analysis explicitly, we’re more likely to get output that’s genuinely analytical rather than pattern-matched from training data.

Requesting evaluation

You can prompt AI to evaluate claims, evidence, or even its own outputs. “What are the weaknesses in this argument?” or “What evidence would contradict this conclusion?” or “What are the failure modes of this approach?”

Evaluation prompts are particularly valuable in code review scenarios. Rather than asking AI to “review this code,” we might ask: “What edge cases might this code not handle? What assumptions does this implementation make? What would break if requirements change in these specific ways?”

Prompting for alternative perspectives

Critical thinking requires considering multiple viewpoints. You can explicitly request this: “What are three different ways to approach this problem?” or “What would a security engineer’s concerns be? What would a performance engineer prioritize?”

Requesting multiple perspectives is especially useful when we’re working in a domain where we have gaps in expertise. By prompting for different perspectives, we’re creating a synthetic version of diverse team input.

Structured reasoning frameworks

Rather than free-form responses, we can request specific reasoning structures. For instance:

Analyse this architectural decision using these criteria:
- Performance implications
- Maintainability concerns  
- Failure modes
- Scaling characteristics
- Team expertise requirements

Such frameworks ensure the analysis is comprehensive rather than focusing only on salient features.
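One way to apply the same framework consistently is to keep the criteria as data and generate the prompt from them. The sketch below does that; the function name `build_analysis_prompt`, the example decision, and the closing request about assumptions are our own illustrative additions rather than a prescribed format.

```python
# The criteria from the framework above, kept as data so they can be reused.
CRITERIA = [
    "Performance implications",
    "Maintainability concerns",
    "Failure modes",
    "Scaling characteristics",
    "Team expertise requirements",
]


def build_analysis_prompt(decision: str, criteria: list[str]) -> str:
    """Ask for an analysis of `decision` under each criterion in turn."""
    bullet_list = "\n".join(f"- {criterion}" for criterion in criteria)
    return (
        "Analyse this architectural decision using these criteria:\n"
        f"{bullet_list}\n\n"
        f"Decision: {decision}\n"
        # Illustrative extra request, not part of the original framework.
        "For each criterion, state the assumptions you are making."
    )


# Hypothetical decision used only to show the output shape.
prompt = build_analysis_prompt(
    "Move order processing from synchronous calls to a message queue",
    CRITERIA,
)
print(prompt)
```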

Prompting for assumption identification

One of the most valuable applications: “What assumptions does this solution make?” or “What would need to be true for this approach to work?”

When designing systems, unexamined assumptions are often where things go wrong. By explicitly prompting for assumption identification, we’re creating a forcing function for critical examination.

Self-evaluation prompts

You can ask AI to evaluate its own outputs: “Review your previous response. What questions should I ask to verify this is correct? What could be wrong with this analysis?”

Self-evaluation is particularly useful in iterative problem-solving. After getting an initial response, asking for self-evaluation often surfaces issues or limitations that weren’t apparent in the first pass.

Requesting evidence and sources

When AI makes claims, prompt for the basis: “What evidence supports this recommendation? What research or established practices inform this approach?”

While this won’t always yield perfect citations, it shifts the response from assertion to argumentation - from “do this” to “here’s why this makes sense given these principles.”

Comparative analysis

Rather than asking for a single solution, request comparison: “Compare these three approaches across relevant dimensions. What are the strengths and weaknesses of each?”

Comparison forces evaluation rather than mere generation. It’s the difference between “give me a solution” and “help me reason about which solution fits my context.”

The limitations

It’s important to recognize what this doesn’t do. Prompting for critical thinking doesn’t guarantee the AI has actually engaged in reasoning - it may be pattern-matching what critical thinking looks like. The output may appear rigorous without being rigorous.

Critical thinking about prompting and critical thinking in prompting need to work together for precisely this reason. We prompt for structured reasoning, then apply our own critical thinking to evaluate whether that reasoning is sound.

Practical application

In practice, embedding critical thinking in prompts means moving from:

| Baseline Prompt | Enhanced Critical Thinking Prompt |
| --- | --- |
| Write a function to parse JSON | Write a function to parse JSON. What edge cases should it handle? What errors might occur? How should it fail gracefully? |
| Is this a good architecture? | Evaluate this architecture against these criteria: [list]. What are the trade-offs? What assumptions does it make? |
| Fix this bug | Analyse this bug. What’s the root cause? What are three possible fixes? What are the implications of each? |

The prompts become longer, more structured, and more explicit about the reasoning process we’re requesting. But the output becomes more valuable because it’s not just an answer - it’s reasoning we can examine, question, and build upon.

What Remains Human

We started by defining critical thinking: purposeful, self-regulatory judgment involving interpretation, analysis, evaluation, and inference. We’ve examined how AI redistributes this cognitive work across software development and prompting practices. Now we can ask: what does this mean for human cognition when AI is present?

Theory-building

Naur argued that a programmer’s value lies not in the code they produce but in the theory they hold - their understanding of how the program relates to the problem domain. Theory enables appropriate modification, explains design choices, and guides responses to new requirements.

When AI generates code, this theory doesn’t automatically transfer. The code might work, but we don’t necessarily understand why it works, what alternatives were considered, or how it should change when requirements shift. Theory building becomes a more conscious, deliberate practice - we can’t rely on writing code to build theory, so we actively construct understanding of what was generated through critical evaluation.

Curation

Previously, the hard work was generating solutions - thinking through the problem, designing an approach, implementing it carefully. Evaluation was important but often easier than generation, so the cognitive load was front-loaded.

AI inverts this relationship: generation becomes fast while evaluation becomes the bottleneck, requiring us to judge not just correctness but appropriateness, maintainability, and contextual fit. We’re assessing whether the generated solution reflects good thinking about the problem, forming judgments about work we didn’t create, often rapidly and while our mental model is still forming. The critical thinking skills - interpretation, analysis, evaluation - operate more continuously than when we controlled the pace of generation.

Human Judgement

AI analyses patterns across millions of codebases and suggests implementations based on statistical regularities. But it doesn’t hold theory or understand our specific context - the organisational constraints, planned system evolution, or tacit team knowledge about what matters.

Human critical thinking remains essential precisely here: evaluating contextual fit, judging trade-offs against specific constraints, and determining whether solutions address actual problems rather than surface manifestations. The dispositional elements from the Delphi study - inquisitiveness, systematicity, judiciousness, truth-seeking - become more important as plausible-sounding AI suggestions demand the disposition to question, probe for weaknesses, and consider alternatives.

Collaboration

Psychological safety research found that critical thinking isn’t purely individual cognition - it’s collective. Teams need environments where members can question each other’s reasoning, including decisions about accepting AI-generated solutions. The quality of team reasoning may depend on whether people feel comfortable admitting “I don’t fully understand this AI-generated code” rather than feigning comprehension they lack.

What’s automated, what isn’t

AI automates pattern matching and statistical synthesis at scale - recognising “this looks like that” and generating outputs following familiar patterns, known from its training data. What it doesn’t automate is judgment about pattern appropriateness. It lacks goals, constraints, or understanding of consequences beyond immediate output. It doesn’t hold theory or care whether solutions fit our context.

These require human critical thinking: interpreting complex, ambiguous situations; evaluating against implicit or contextual criteria; and self-regulation - catching biases, questioning assumptions, recognising when we’re accepting something because it sounds good rather than because it is good.

The Development of Expertise

If expertise develops through practice (the prevailing theory), and AI now handles much of what used to be practice, how will people develop expertise? This question echoes concerns that emerged with calculators and IDEs - tools that automated cognitive work. The answer has generally been that people learn different things at different levels of abstraction. Even so, the question reinforces the point that critical thinking becomes more important as AI handles routine implementation, leaving humans to excel at the non-routine: judgment calls, context-dependent decisions, and recognising when standard patterns don’t apply.

Conclusion

Working with AI doesn’t reduce the need for critical thinking; it redistributes where that thinking is needed and demands it with greater intensity and discipline. Less in generation, more in evaluation; less in implementation, more in judgment about appropriateness; less in writing code, more in building and maintaining theory about systems.

The standards remain constant: clarity, accuracy, precision, relevance, depth, breadth, logic, significance. The dispositional elements remain essential: inquisitiveness, systematicity, judiciousness, truth-seeking, open-mindedness. What changes is the context - we’re thinking critically not just about problems and solutions, but about AI outputs, prompting practices, and whether understanding is being built or merely code generated.

Perhaps the core insight is this: AI is powerful not because it replaces human thinking, but because it creates conditions where high-quality human thinking becomes more valuable. The question isn’t whether to think critically - it’s whether we’re deliberate about doing so in this new context.

  1. Facione, P. A. (1990). Critical Thinking: A Statement of Expert Consensus for Purposes of Educational Assessment and Instruction. The Delphi Report. 

  2. Naur, P. (1985). Programming as Theory Building. Microprocessing and Microcomputers, 14(11), 365-371. 

  3. Brooks, F. P. (1987). No Silver Bullet: Essence and Accidents of Software Engineering. Computer, 20(4), 10-19. 

  4. Edmondson, A. C. (1999). Psychological Safety and Learning Behavior in Work Teams. Administrative Science Quarterly, 44(2), 350-383. 

  5. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., … & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models.  

  6. Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical Networks for Few-shot Learning. 

  7. Shepperd, M., Kitchenham, B., Budgen, D., Brereton, P., & Mair, C. (2014). A systematic review of evidence for software engineering practice. Information and Software Technology, 56(11), 1230-1246. 

Critical Thinking in the Age of AI was originally published by Capgemini at Capgemini Software Engineering on November 17, 2025.

https://capgemini.github.io/engineering/critical-thinking-age-of-ai
How To Ask For Help
Development, Culture

So you have a difficult problem to solve, and you’ve decided to ask for help with it. Congratulations! Recognising you’ve got a problem is the first step to solving it. Nobody knows how to solve every problem, so you shouldn’t feel bad about needing to ask for help. The next step, which a lot of people find difficult, is to do the asking in a way that helps people to help you. This article aims to help you with that.

A lot of what I want to say here is already covered in Eric Raymond’s classic essay How To Ask Questions The Smart Way, and I’d recommend reading it. Having said that, it was written in 2001, with open source hackers in mind, and isn’t the easiest read (not least because of the aggressive tone, which hints at some of the troubling aspects of the author’s worldview), so I want to write a newer, kinder version, with some of my own ideas added.

What’s your problem?

Before anyone can solve a problem, they need to know what the problem is. If you want people to help you, help them by providing a clear and concise explanation of what it is that you need help with.

In my view, being able to explain things clearly is one of the most important skills a developer needs, both when dealing with clients and when helping teammates to learn. When I interview developers, one of the things I want to know is whether they can explain concepts clearly and without assuming prior knowledge.

What have you done for me lately?

Most people want to be helpful, but context switching is a big problem for anyone who needs to solve complex problems. When somebody is trying to help you, they’re already giving you a big piece of their attention and energy, so don’t make it harder than it needs to be for them to help you.

Hopefully you haven’t just given up at the first hurdle and asked for help before at least trying a few different things, searching, or asking your AI tool of choice. Before other people can help you, you should do some work for yourself. If all you’re saying is “please help me with this”, you may as well be saying “please do my job for me”.

Before you ask the question

I can’t say this any better than Randall Munroe of XKCD, so I’ll share his advice here:

[Image: flowchart of how to solve computer problems]

This article is basically a more advanced version of that flowchart for software engineers.

Why is it important to ask in the right way?

When I was at university I worked on the student union stage crew, doing sound and light for gigs and discos. We were volunteers, but as I’ve written elsewhere, being amateurs didn’t diminish our levels of professionalism. We were trained and supported by a full time technician.

At the end of the night, if any of the equipment wasn’t working, we had to report it to the technician, so that he could work on it the next day. One of his pet peeves was if someone reported a problem, but didn’t give him any more details beyond saying “it’s not working”. He was a busy man, and the more information we could give him, the less time he would have to spend figuring out what the problem was, and the more time he could spend on fixing it. The people reporting the problem already knew what the problem was, so for us to spend a few moments explaining it to him would save him from having to diagnose it.

How is it not working? For example with a disco light, does it completely fail to turn on, or does it flicker intermittently? Is it dimmer than it should be?

He was happy to help, but like Eric Raymond and his fellow hackers, if we were wasting his time, he would call us out for it publicly, and his criticism would sting.

The lessons I learned there have stayed with me for a long time, and some of them are very relevant to software development. In particular, there is a lot of value in clear and effective bug reports, which can save developers a lot of time. When a developer needs to ask for help, you could think of it as a bug report on work in progress, and a thoughtful question will save time for the person helping.

Where should you ask?

Your first instinct might be to ask your team lead. After all, they’re more experienced, they’re familiar with the code, and it’s part of their job to help you. A quick message to them and they’ll surely solve your problem in no time. The trouble is that they’re busy, and they’re not the only person on the team who can help.

You might feel shy about asking for help in your team’s chat channels, but please don’t ask for help in private messages. It’s almost always better to ask in the channel - that way you increase the chances of somebody answering, and the rest of the team will benefit from the answer.

If it’s a problem with open source code, ask in the relevant support forums for that project. For more general queries, Stack Overflow and its siblings might be more appropriate.

Wherever you ask, it’s always best to search before asking - don’t waste people’s time by asking a question that has already been answered.

R(&W)TFM

You have read the documentation, haven’t you? If not, then please do. The people who seem to know more than you aren’t magical gurus with every answer - they’ve just spent time learning things. If you’ve read the documentation, and you don’t understand how to do something basic on your project, it’s probably a sign that the documentation needs to be improved - once you’ve found an answer, apply the boy scout rule to your documentation and share what you’ve learned.

A picture isn’t always worth a thousand words

If you have an error message, don’t screenshot it - copy and paste the text. That makes it a lot easier to search for, either online or in the source code.

Similarly, a picture of some code isn’t an easy way to read and understand it, and pasting code into chat software like Teams or Slack isn’t much better. If you’ve got some partly working code, or even just an idea, and you need a few pointers, please don’t dump the code into a chat. Create a pull request with your work in progress, and make it clear that it isn’t ready to be merged - most code repositories have a way to mark a pull request as draft or create tasks that will prevent it from being merged.

What are you trying to achieve?

It’s useful for people to know the context of your question. Sometimes it might not be the right question to ask - there may be a better way to approach the problem. The context can help other people to understand the actual problem you’re trying to solve, not just the problem that you’ve got stuck on. If you’re halfway down the road, you might need to think about whether it’s the right path - there may be a completely different approach that you haven’t thought of. Perhaps you’re finding things difficult because you’re not working with the grain - maybe the problem can be avoided altogether.

Isolate the problem

When trying to solve a problem, it’s very easy to get distracted by other things. Is it possible to create a minimal test case that demonstrates the problem? For example, if it’s a front end problem, can you create a CodePen that has only the elements you need to demonstrate it?

This is also a good idea for debugging generally: Be scientific. Experiment with what might fix the problem by changing one thing at a time and seeing what happens. Take notes about the experiment so that you know what made a difference, and which variables were important.
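As a rough sketch of “changing one thing at a time”, the snippet below applies each candidate change to a known-bad baseline on its own and keeps notes on the outcome. The settings, the `one_change_at_a_time` helper, and the stand-in `works` check are all hypothetical; in practice `works` would run the minimal test case that reproduces your problem.

```python
from typing import Callable

Config = dict[str, str]


def one_change_at_a_time(
    baseline: Config,
    candidates: Config,
    works: Callable[[Config], bool],
) -> dict[str, bool]:
    """Apply each candidate change on its own to the baseline and record the result."""
    notes: dict[str, bool] = {}
    for key, value in candidates.items():
        trial = {**baseline, key: value}  # only this one setting differs
        notes[f"{key}={value}"] = works(trial)
    return notes


# Hypothetical failing setup and the changes we want to try.
baseline = {"node": "18", "cache": "on", "locale": "en_GB"}
candidates = {"node": "20", "cache": "off", "locale": "en_US"}


def works(config: Config) -> bool:
    """Stand-in check: pretend the cache is the culprit."""
    return config["cache"] == "off"


notes = one_change_at_a_time(baseline, candidates, works)
for change, fixed in notes.items():
    print(f"{change}: {'fixed' if fixed else 'still broken'}")
```

The notes are also worth pasting into your question: they tell the person helping you what you have already ruled out.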

Help yourself

If you take the time to formulate a good question, often you find that you’re able to answer the question yourself. This is the basis of rubber duck debugging - the act of preparing the question helps you to understand the problem more clearly, and that can often help you to solve it without needing someone else to help. I’ve lost count of the number of times I’ve got part way through asking for help and answered the question myself, leaving the other person with nothing to do beyond accepting my thanks for their time.

If you’ve taken the time to do all that, and you haven’t solved the problem, then by all means ask for help, but remember that other people don’t owe you their time. Whether they’re strangers on the internet or colleagues on your team, be respectful of the time that they are giving you, and help them to help you by asking good questions. If you ask a good question, you’re much more likely to get a good answer.

I hope that this is useful, but if it isn’t clear, please ask for help (in the right way).

How To Ask For Help was originally published by Capgemini at Capgemini Software Engineering on July 04, 2025.

https://capgemini.github.io/development/how-to-ask-for-help
The Standard Model: Fundamental Forces of Software Engineering
Development, Engineering, Learning, Methodology, Theory

Just as the universe is governed by a handful of fundamental forces (gravity, electromagnetism, and the strong and weak nuclear forces), software engineering too is shaped by irreducible forces. These forces, drawn from the deep structure of our discipline, define what is possible, what is difficult, and why certain patterns of success and failure recur.

Viewing software engineering through this lens allows us to see not just a collection of best practices, but a coherent framework grounded in the very nature of software itself.

Synthesis: A Standard Model of Software Engineering

These three forces can be viewed as an interdependent foundation for software engineering:

  • Organised Complexity creates the challenge.
  • Cognitive Theory makes human understanding the central resource and bottleneck.
  • Mathematical Limits define the boundaries within which all software must operate.

All established principles and recurring failure patterns in software engineering can be traced to the interplay of these forces. Just as the Standard Model in physics explains the behaviour of matter and energy, this “Standard Model” of software engineering explains why certain practices succeed and why the same mistakes recur across generations.

1: The Force of Organised Complexity

Summary

Like gravity, which acts everywhere and shapes the structure of the universe, the force of organised complexity is ever-present in software. It binds components together and makes their interactions non-trivial. This force explains why software systems are more than the sum of their parts, why small changes can have unpredictable effects, and why complexity can never be fully eliminated, only managed. Just as gravity cannot be “turned off,” complexity is an inescapable property of software systems.

Key Insights:

  • Complexity is fundamental, not a sign of poor design.
  • Small changes can ripple unpredictably.
  • Abstraction and modularity help but cannot eliminate complexity.

Imagine trying to explain how a city works by describing the properties of every building, road, and person individually - you’d miss the crucial patterns: how its inhabitants wear paths, operate machines and vehicles, and connect with and influence each other.

Software is similar: it’s not just a collection of instructions that computers follow, but an intricate web of interconnected parts in which a change to one small thing can ripple through the entire system in unexpected ways. Unlike a physical machine where you can isolate and fix a broken gear, software problems often emerge from the complex interactions between parts that each work perfectly on their own.

When software seems to break mysteriously, or when adding a “simple” feature takes months, or when fixing one bug creates three new ones, it’s not incompetence. It’s because software engineers are managing thousands or millions of precise interactions that all must work together perfectly, and human minds simply aren’t equipped to hold all these connections at once.

Theoretical Foundation

Software represents a unique form of organised complexity that cannot be decomposed into simpler elements without losing essential properties. This principle derives from several theoretical foundations:

Systems Theory: Herbert Simon’s architecture of complexity demonstrates that hierarchical systems exhibit emergent properties that are not present in their components. Software systems exemplify this principle - the behaviour of a complete application cannot be predicted solely from understanding individual functions or modules.

Complexity Science: Warren Weaver’s distinction between disorganised complexity (amenable to statistical methods) and organised complexity (requiring new analytical approaches) places software firmly in the latter category. Unlike physical systems governed by statistical mechanics, software exhibits precise, deterministic complexity where every detail matters.

Essential vs Accidental Complexity: Fred Brooks identified that software possesses irreducible essential complexity inherent to the problem domain. No methodology, tool, or abstraction can eliminate this fundamental intricacy - it can only be managed, not simplified away.

Implications

This Fundamental Force establishes that:

  • Abstraction layers inevitably leak, as Joel Spolsky observed, because complexity cannot be fully hidden
  • Modularity provides boundaries but cannot eliminate interdependencies
  • Testing cannot achieve complete coverage due to combinatorial explosion
  • Documentation can never fully capture system behaviour
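
The combinatorial explosion behind the testing point can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the feature-flag scenario are assumptions, not taken from any particular system. It counts the input combinations produced by independent options and the pairwise interactions between components.

```python
from itertools import product


def count_configurations(option_counts):
    """Number of distinct combinations when each option varies independently."""
    total = 1
    for choices in option_counts:
        total *= choices
    return total


def pairwise_interactions(n_components):
    """Number of distinct pairs among n components: n * (n - 1) / 2."""
    return n_components * (n_components - 1) // 2


# Small enough to enumerate: 3 hypothetical boolean flags give 8 combinations.
print(len(list(product([False, True], repeat=3))))  # 8

# Too many to test exhaustively: 30 boolean flags.
print(count_configurations([2] * 30))  # 1073741824

# Even looking only at pairs, 100 components have thousands of interactions.
print(pairwise_interactions(100))  # 4950
```

Each extra boolean flag doubles the number of combinations, which is why exhaustive testing stops being an option long before a system feels large.
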
2: The Force of Cognitive Theory

Summary

Comparable to electromagnetism, which governs interactions at both the atomic and macroscopic scale, the cognitive force in software is about the transfer and maintenance of understanding. It’s the “field” that connects human minds to code and to each other. This force explains why code is not the software itself, but a representation of shared mental models. It governs knowledge transfer, team dynamics, and the persistence (or loss) of understanding over time.

Key Insights:
  • Software exists as a theory in human minds and teams.
  • Code is an imperfect externalisation of this theory.
  • Knowledge transfer is about reconstructing mental models, not just reading documentation.
Explanation

When you look at code, you might think you’re seeing the software itself - but you’re looking at something more like sheet music. Just as sheet music isn’t the music but rather instructions for creating music, code is really instructions that reflect a programmer’s understanding of how to solve a problem, and those instructions are translated through various processes into more detailed sequences of instructions that execute on the machine. The most real representation of the software lives in the development team’s mental model of what the system is supposed to do, what it actually does, why it does it, and how all the pieces fit together. We try to represent our collective mental model using software architecture, references to design patterns, various types of diagrams and the typed code – which is just an attempt to capture this understanding in a form that computers can execute.

This is why a team can inherit “perfectly good code” from another team and still struggle to work with it - they’ve received the sheet music but not the understanding of how to perform it. It’s why programmers spend so much time in meetings drawing diagrams and discussing approaches rather than typing and why losing key team members can cripple a project even when all the code remains. The challenge isn’t writing code; it’s building and sharing the mental theory of what the software should do and how it should work. When developers say they need to “understand the codebase,” they’re not memorising the code - they’re reconstructing the original thinking behind it.

Theoretical Foundation

Software exists primarily as a theory in human cognitive systems - both individual minds and the collective understanding distributed across teams - with code serving as an imperfect externalisation of this knowledge.

This perspective draws from multiple theoretical traditions:

Cognitive Science: Peter Naur’s 1985 paper “Programming as Theory Building” establishes that programming consists of building and maintaining a theory about how computational processes solve real-world problems. The program text merely reflects this theory but cannot fully contain it. The theory (of a piece of software) often exists not in a single mind but distributed across team members who each hold partial understanding.

Philosophical Foundations: Michael Polanyi’s concept of tacit knowledge explains why software understanding transcends documentation. Much of what programmers know about their systems exists as personal, embodied knowledge that cannot be fully articulated. This tacit knowledge accumulates not just individually but culturally within teams.

Distributed Cognition: Edwin Hutchins’s framework shows how cognitive processes extend across multiple agents and artefacts. In software teams, understanding emerges from the interaction between individual mental models, shared representations, and external tools. Recent research on cumulative culture in problem-solving suggests that software solutions emerge through collective learning across time, with each generation of developers building upon previous understanding.

Social Learning Theory: Software development exhibits characteristics of cumulative culture (per Cat Hicks et al.), where knowledge accumulates through social transmission and collaborative refinement. Teams naturally develop shared languages, patterns, and practices that embody collective understanding beyond what any individual member possesses.

Implications

This Fundamental Force reveals that:

  • Code without its accompanying theory becomes unmaintainable “legacy” systems
  • Successful software development requires shared mental models among team members
  • Knowledge transfer involves theory reconstruction, not just information transmission
  • Teams naturally evolve collective understanding through documented patterns, shared practices, and cultural transmission
  • The most successful teams develop mechanisms for preserving and transmitting understanding across time
  • AI participation in development introduces new forms of cognitive partnership where understanding emerges from human-AI dialogue
3: The Force of Mathematical Limits

Summary

Like the strong and weak nuclear forces, which set hard boundaries on what is possible in the physical world, mathematical limits in software define what can and cannot be achieved, regardless of skill or effort. This force explains why some problems are undecidable, why perfect verification is impossible, and why trade-offs and heuristics are necessary. These are not limitations of current technology, but fundamental boundaries proven by mathematics.

Key Insights:
  • Some problems are mathematically undecidable.
  • Perfect verification is impossible for non-trivial systems.
  • Heuristics and approximations are not compromises, but necessities.

Many people assume that with enough time, effort, and skill, programmers can make software do anything and fix any problem. But software faces hard mathematical limits - walls that cannot be climbed no matter how clever we are. It’s like asking someone to draw a square circle or find the highest number; these aren’t difficult tasks, they’re impossible ones. Computer science has mathematically proven that there are things software simply cannot do, such as perfectly predicting whether any given program will crash or run forever. Other tasks, such as automatically finding the absolute best solution to many scheduling and optimisation problems, are possible in principle but are widely believed to need impractical amounts of time as the problem grows.

These aren’t limitations of current technology that future advances will overcome - they’re as fundamental as the laws of physics. This is why your computer sometimes freezes trying to solve seemingly simple problems, why software can’t automatically find and fix all its own bugs, and why programmers often talk about “good enough” solutions rather than perfect ones. When software companies say they can’t guarantee their products are completely bug-free or perfectly secure, they’re not making excuses - they’re acknowledging mathematical reality. Understanding these limits helps explain why software development remains difficult and expensive despite decades of technological advancement.

Theoretical Foundation

Software engineering confronts absolute boundaries defined by mathematical logic and computation theory. These limits shape what is possible and guide how principles must be formulated:

Computability Theory: Turing’s halting problem and related undecidability results prove that fundamental questions about program behaviour cannot be algorithmically determined. No amount of engineering sophistication can overcome these limits.
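
The shape of the halting-problem argument can be sketched in code. This is an informal illustration, not Turing’s proof: it assumes a hypothetical, deterministic `halts` function that claims to decide whether a zero-argument Python function terminates, and the names `make_paradox` and `refute` are invented for the example. Whatever the candidate answers, a program built to do the opposite shows it to be wrong.

```python
from typing import Callable

Program = Callable[[], object]
Decider = Callable[[Program], bool]


def make_paradox(halts: Decider) -> Program:
    """Build a program that does the opposite of what `halts` predicts for it."""
    def paradox() -> str:
        if halts(paradox):
            while True:  # predicted to halt, so loop forever
                pass
        return "halted"  # predicted to loop, so halt immediately
    return paradox


def refute(halts: Decider) -> str:
    """Show that a candidate decider is wrong about its own paradox program."""
    paradox = make_paradox(halts)
    if halts(paradox):
        # By construction paradox() would now loop forever, so it is not run.
        return "decider said 'halts', but the program loops forever"
    # The decider said 'loops'; running the program shows that it halts.
    assert paradox() == "halted"
    return "decider said 'loops', but the program halted"


# Two hypothetical candidates; any other deterministic candidate fails the same way.
print(refute(lambda program: True))
print(refute(lambda program: False))
```

The same construction defeats any cleverer candidate, which is why the limit is one of logic rather than of engineering effort.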

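Gödel’s Incompleteness Theorems: These establish that no consistent, effectively axiomatised formal system powerful enough to express basic arithmetic can also be complete. To the extent that software systems and the logics used to reason about them are such formal structures, they inherit these limitations - there will always be true properties that cannot be formally proven within the system.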

Information Theory: Shannon’s theorems set hard bounds on how compactly information can be represented and how reliably it can be sent over a noisy channel. Data compression, error correction, and communication all face theoretical boundaries that constrain software design.
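
As a small worked example of such a boundary, the sketch below computes the Shannon entropy of a short message. The function name and the sample message are assumptions chosen for illustration. Under the simplifying assumption that symbols are drawn independently with these frequencies, the entropy is a lower bound on the average number of bits per symbol that any lossless symbol-by-symbol code can achieve.

```python
import math
from collections import Counter


def shannon_entropy_bits(message: str) -> float:
    """Entropy in bits per symbol, using the message's own symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


message = "aaaabbcd"  # frequencies: a = 1/2, b = 1/4, c = 1/8, d = 1/8
entropy = shannon_entropy_bits(message)

print(entropy)                 # 1.75 bits per symbol
print(entropy * len(message))  # 14.0 bits for the whole message
print(2 * len(message))        # 16 bits with a fixed 2-bit code for 4 symbols
```

A fixed-width code spends 16 bits on this message; under the stated assumption, no symbol-by-symbol code can average fewer than 14.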

Computational Complexity: Complexity classes, and the still-open P vs NP problem, characterise which problems can be solved efficiently. Many critical software engineering tasks (optimisation, verification, testing) are NP-hard or worse in general, so every known exact method faces exponential complexity barriers.

Implications

This Fundamental Force necessitates that:

  • Perfect verification remains impossible for non-trivial systems
  • Optimisation must target “good enough” rather than optimal solutions
  • Heuristics and approximations become essential tools rather than compromises
  • Human judgement remains irreplaceable for navigating undecidable territories
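
The trade-off between optimal and “good enough” can be seen in a toy scheduling problem. The sketch below assumes a hypothetical set of job durations to be spread across two machines so that the busiest machine finishes as early as possible. It compares an exhaustive search with a simple greedy heuristic (longest job first, always onto the least-loaded machine).

```python
from itertools import product


def optimal_makespan(jobs, machines=2):
    """Try every assignment of jobs to machines: machines ** len(jobs) cases."""
    best = sum(jobs)
    for assignment in product(range(machines), repeat=len(jobs)):
        loads = [0] * machines
        for job, machine in zip(jobs, assignment):
            loads[machine] += job
        best = min(best, max(loads))
    return best


def greedy_makespan(jobs, machines=2):
    """Heuristic: take jobs longest first, each onto the least-loaded machine."""
    loads = [0] * machines
    for job in sorted(jobs, reverse=True):
        loads[loads.index(min(loads))] += job
    return max(loads)


jobs = [3, 3, 2, 2, 2]
print(optimal_makespan(jobs))  # 6, found by checking all 32 assignments
print(greedy_makespan(jobs))   # 7, found after a single sort and one pass
```

The exhaustive search doubles its work with every extra job, while the heuristic stays cheap and lands within one unit of the optimum here. That is the sense in which approximations are necessities rather than compromises.
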
Synthesis: The Foundational Trinity

These three Fundamental Forces form an interdependent foundation:

Organised Complexity creates the challenge that cannot be wished away through better tools or methods. It represents the irreducible difficulty of mapping computational processes to real-world problems. Teams cope with this complexity not through individual brilliance but through accumulated collective knowledge - patterns, libraries, and practices that evolve over time.

Cognitive Theory makes human understanding the central challenge and resource. Software quality depends on the clarity and shareability of mental models, which exist not just in individual minds but distributed across teams and time. Mature teams develop cultural mechanisms - mentoring, documentation practices, code reviews - that preserve and transmit this understanding.

Mathematical Limits define the boundary conditions within which all software engineering must operate. They transform the discipline from seeking perfect solutions to navigating trade-offs within fundamental constraints. Collective exploration of the possibility space often yields better approximations than individual attempts.

Together, these Fundamental Forces explain why software engineering differs qualitatively from other engineering disciplines. Physical engineering works with materials governed by continuous mathematics and statistical properties. Software engineering manipulates discrete, exact structures where small changes can have unbounded effects, where understanding exists primarily in minds rather than blueprints, and where mathematical impossibility theorems directly constrain practice.

Established Principles Explained Through the Fundamental Forces

Well-known software engineering principles emerge as logical consequences of these foundational realities:

DRY (Don’t Repeat Yourself)

Every piece of knowledge should have a single, unambiguous representation in a system. This emerges from Fundamental Forces 1 and 2: duplication multiplies the complexity of maintaining mental models - when the same concept exists in multiple places, developers must synchronise multiple theories, violating cognitive limits.

YAGNI (You Aren’t Gonna Need It)

Don’t add functionality until needed. This follows from Fundamental Forces 1 and 3: every addition increases complexity faster than linearly, because each new feature can interact with every feature already present. Given the inability to perfectly predict future needs, premature abstraction adds complexity without proven value.

Conway’s Law

Organisations design systems mirroring their communication structures. A direct consequence of Fundamental Force 2: since software is fundamentally theory in human cognitive systems, and theories are built through communication, communication channel structure inevitably shapes the shared theory - and thus the software.

Single Responsibility Principle

A class/module should have only one reason to change. This arises from Fundamental Forces 1 and 2: multiple responsibilities create cognitive overhead - humans must maintain multiple, potentially conflicting theories about what a component does. Limiting responsibility limits mental model complexity.

Fail Fast

Systems should detect and report failures immediately. This emerges from Fundamental Force 3: since correctness cannot be proven for complex systems, failures must be assumed. Early detection minimises error propagation, keeping debugging tractable despite analytical limitations.
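
A minimal sketch of failing fast, assuming a hypothetical `parse_port` helper that reads a port number from configuration: invalid input is rejected at the boundary with a specific message, instead of travelling deeper into the system and surfacing later as an obscure connection error.

```python
def parse_port(raw: str) -> int:
    """Validate at the boundary and raise immediately on bad input."""
    try:
        port = int(raw)
    except ValueError as exc:
        raise ValueError(f"port must be an integer, got {raw!r}") from exc
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")
    return port


print(parse_port("8080"))  # 8080

try:
    parse_port("80800")
except ValueError as error:
    print(error)  # port must be between 1 and 65535, got 80800
```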

Separation of Concerns

Different program aspects should be separated into distinct sections. This follows from all three Fundamental Forces: reduces interaction complexity (Fundamental Force 1), allows different theories to be held separately (Fundamental Force 2), and enables partial verification within mathematical limits (Fundamental Force 3).

Open/Closed Principle

Software entities should be open for extension but closed for modification. This arises from Fundamental Force 2: modifying existing code requires reconstructing the original theory, which may be incomplete or lost. Extension allows new theories to be built on stable foundations without disturbing existing understanding.

Liskov Substitution Principle

Superclass objects should be replaceable with subclass objects without breaking the system. This emerges from Fundamental Forces 2 and 3: it maintains theoretical consistency - developers can reason about base classes without knowing all implementations, a partial knowledge strategy working within the limits of complete system analysis.
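
The “partial knowledge strategy” can be illustrated with a short sketch. The class names here are hypothetical. `total_area` is written against the base class contract alone and keeps working for any subclass that honours it.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Contract: area() returns a non-negative number and has no side effects."""

    @abstractmethod
    def area(self) -> float: ...


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        self.width, self.height = width, height

    def area(self) -> float:
        return self.width * self.height


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


def total_area(shapes: list[Shape]) -> float:
    """Reasons about Shape only; it needs no knowledge of the subclasses."""
    return sum(shape.area() for shape in shapes)


print(total_area([Rectangle(2, 3), Circle(1)]))
```

A subclass whose `area()` raised an exception or changed the shape would break `total_area` without any change to `total_area` itself, which is the kind of surprise the principle rules out.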

Principle of Least Astonishment

Software should behave as users and developers expect. A direct result of Fundamental Force 2: surprising behaviour forces theory reconstruction. Predictable behaviour allows existing theories to remain valid, reducing cognitive load.

Loose Coupling and High Cohesion

Components should minimise interdependencies while grouping related functionality. Loose coupling (from Fundamental Force 1) limits the combinatorial explosion of interactions. High cohesion (from Fundamental Force 2) ensures modules correspond to coherent theories - a single mental model explains the module’s behaviour.

Continuous Integration/Deployment

Integrate code frequently and deploy regularly in small increments. This arises from all three Fundamental Forces: small changes limit complexity growth (Fundamental Force 1), frequent integration keeps team theories synchronised (Fundamental Force 2), and small increments make debugging tractable within analytical limits (Fundamental Force 3).

Test-Driven Development

Write tests before implementation code. This follows from Fundamental Forces 2 and 3: tests externalise the theory of what code should do. Given the inability to prove correctness, tests provide empirical evidence that theory matches implementation.
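
As an illustration of tests externalising a theory, the sketch below uses a hypothetical `slugify` function. The test is written first and records what the author believes the function should do. The implementation that follows is just one way of satisfying it.

```python
import re


def test_slugify():
    """Written first: the theory of what slugify should do, stated as examples."""
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  Fail   Fast ") == "fail-fast"
    assert slugify("") == ""


def slugify(title: str) -> str:
    """Written second: lower-case the title and join its words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


test_slugify()
print("all slugify examples pass")
```

The test cannot prove `slugify` correct for every input, but it gives empirical evidence for the cases the team cares about and keeps that expectation readable to the next developer.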

Information Hiding/Encapsulation

Modules should reveal minimal internal details. This emerges from Fundamental Forces 1 and 2: hiding implementation reduces the complexity others must understand and allows different theories to coexist - users need only understand interface theory, not implementation theory.

Premature Optimisation as Anti-Pattern

Don’t optimise before identifying actual performance problems. This follows from Fundamental Forces 3 and 1: performance characteristics cannot be predicted analytically, and optimisation adds complexity. Adding complexity without empirical evidence of need violates both constraints.

These principles aren’t arbitrary “best practices” but logical consequences of software’s fundamental nature. Understanding their theoretical basis explains why violating them causes problems and why they’re rediscovered independently across teams and cultures. Principles focusing on Fundamental Force 1 manage complexity through decomposition and isolation. Those emphasising Fundamental Force 2 ensure cognitive manageability and theory preservation. Fundamental Force 3 principles acknowledge analytical limitations and promote empirical approaches. The most sophisticated principles navigate all three constraints simultaneously.

Common Paths to Failure

Just as established principles emerge from respecting the three Fundamental Forces, common failure patterns arise from violating or ignoring these foundational constraints. These recurring disasters in software engineering are not random misfortunes but predictable consequences of working against the fundamental nature of software.

The Hero Programmer Dependency

Relying on a single brilliant developer who holds all system knowledge in their head. This violates Fundamental Force 2 catastrophically: theory exists in one mind instead of being distributed across the team. When the hero leaves, the theory dies with them, leaving behind incomprehensible code. The organisation discovers too late that it possessed code but not software - the cognitive artefact departed with the individual.

The Big Rewrite

The decision to discard existing code and start fresh, assuming the current system’s problems stem from poor implementation rather than essential complexity. This violates Fundamental Force 2 fundamentally: it mistakes code for software, throwing away years of accumulated theory and hard-won understanding. The rewrite team must rediscover all the edge cases, business rules, and subtle interactions the original code embodied - often taking longer than the original development and sometimes failing entirely, as Netscape 6 famously demonstrated.

Second System Syndrome

The tendency for a successful simple system to be replaced by an over-engineered, feature-laden version that collapses under its own weight. This violates Fundamental Force 1: success with a simple system breeds false confidence about managing organised complexity. Designers assume that if they could handle X complexity, surely 3X is manageable with better architecture. Brooks identified this pattern decades ago, yet it recurs because each generation must learn that complexity grows exponentially, not linearly.

Integration Hell

Delaying component integration until late in development, only to discover incompatible assumptions and irreconcilable architectures. This violates all three Fundamental Forces simultaneously: complexity compounds invisibly (Fundamental Force 1), team theories diverge without synchronisation (Fundamental Force 2), and problems become analytically intractable - debugging requires understanding all components simultaneously (Fundamental Force 3). The adoption of continuous integration practices directly addresses this failure pattern.

The Documentation Myth

Believing that comprehensive documentation can substitute for developer understanding, leading to extensively documented but unmaintainable systems. This represents a pure Fundamental Force 2 violation: it confuses the externalisation (documentation) with the theory itself. Documentation can support theory transmission but cannot replace it. Teams inherit binders of specifications yet cannot modify the system safely because documentation captures what the code does, not why it does it or how the pieces form a coherent whole.

Analysis Paralysis

Attempting to design the perfect system upfront through exhaustive analysis, never actually building anything. This violates Fundamental Force 3 fundamentally: it denies mathematical limits on prediction and analysis. The belief that sufficient thought can anticipate all requirements and interactions ignores undecidability results and the impossibility of perfect foresight. Months or years pass in design while requirements shift, opportunities vanish, and nothing gets built. The quest for perfection prevents the empirical learning that actual implementation provides.

Summary

These failure patterns persist because they stem from intuitive but incorrect assumptions about software’s nature. They represent attempts to treat software as simpler than it is (violating Fundamental Force 1), as existing in artefacts rather than minds (violating Fundamental Force 2), or as analytically tractable when it isn’t (violating Fundamental Force 3). Understanding these patterns through the lens of the three Fundamental Forces explains not just why projects fail, but why the same failures recur despite being well-documented - each generation must learn to respect the fundamental constraints that define software engineering.

Practical Consequences

From these theoretical foundations flow practical principles:

  1. Embrace complexity rather than deny it - develop tools and practices suited to organised complexity, recognising that solutions emerge from collective effort over time
  2. Invest in theory-building through documentation, dialogue, knowledge-sharing practices, and cultural transmission mechanisms
  3. Accept imperfection as mathematical necessity, focusing on resilience over correctness
  4. Design for comprehension since human understanding remains the bottleneck, both for individuals and teams
  5. Respect fundamental limits by choosing achievable goals over impossible ideals
  6. Cultivate collective intelligence through practices that accumulate and transmit knowledge across team members and generations

These Fundamental Forces suggest that future methodologies - particularly those involving AI collaboration - must address all three dimensions: managing organised complexity through human-AI cognitive partnership while respecting mathematical boundaries.

The emerging dialogue between human and artificial intelligence represents not just a new tool but a fundamental evolution in how software theories are constructed, maintained, and evolved.

The most successful development teams are already those that intentionally adopt, or organically discover, practices aligned with these foundational realities: they build pattern libraries, establish mentoring relationships, create living documentation, and develop rituals for knowledge sharing - all mechanisms that acknowledge software development as a fundamentally collective cognitive endeavour operating within mathematical constraints.

The most successful development teams in future will be those who recognise the new form of these realities and apply them to develop and maintain software using AI.

Conclusion

Just as the Standard Model in physics provides a coherent framework for understanding the fundamental forces shaping the universe, this conceptual Standard Model of Software Engineering reveals the deep structure underlying our discipline. By recognising and respecting the three Fundamental Forces of organised complexity, cognitive theory, and mathematical limits, we can better understand why software engineering is inherently challenging, why certain principles succeed, and why certain failure patterns recur.

This framework not only explains established practices but can also guide future innovations, particularly as we integrate AI into our development processes. Embracing this Standard Model allows us to navigate the complexities of software engineering with greater clarity and effectiveness, ultimately leading to more reliable, maintainable, and successful software systems.

The Standard Model: Fundamental Forces of Software Engineering was originally published by Capgemini at Capgemini Software Engineering on June 11, 2025.

https://capgemini.github.io/engineering/standard-model-forces-of-software-engineering
The Hard Truths of Modernising Legacy Applications: Costs, Complexities, and Cloud-Native Benefits
Legacy ApplicationsModernisationCloud Native

In today’s rapidly evolving technology landscape, enterprises are under growing pressure to innovate, stay competitive, and reduce operational overhead. At the heart of these demands lies the daunting challenge of modernising legacy applications. From high costs and vendor lock-ins to fragile architectures and limited test coverage, modernisation is a multi-faceted journey fraught with complexity. Before delving further, let’s take a moment to define what we mean by a legacy application.

What Is a Legacy Application?

According to Google, a legacy application is “a software system that is outdated but still in use, often built with older technologies and potentially lacking the support and maintenance that newer systems receive.”

These applications often remain critical to business operations, despite their ageing architecture and limitations. Understanding their nature is key to grasping the challenges involved in modernising them.

This article explores the real-world challenges engineers and businesses face and why, in some cases, a full rewrite may be the smartest path forward.

1. Regression Risk: The Silent Killer of Confidence

Modernising code is risky. One of the biggest unknowns is whether the migrated application behaves as expected. Legacy applications, especially those built over decades, often have low or non-existent test coverage or rely heavily on manual testing. When modernising such systems, ensuring no regression is introduced becomes a serious challenge.

The lack of comprehensive, robust automated tests and the absence of baseline performance benchmarks make it hard to:

  • Validate critical business logic post-migration
  • Ensure that changes in infrastructure or environment don’t break core functionality
  • Refactor or modernise code with confidence

Without a solid safety net of tests and regression coverage, teams find it difficult to ensure that the refactored or modernised version doesn’t introduce regressions or functional discrepancies. For instance, even minor changes — like moving to a new deployment model or database — can cause silent failures that may surface in production. This slows down delivery and erodes confidence in the new system.

What You Can Do:
  • Invest early in test automation before or during migration
  • Design and implement benchmarking processes before migration
  • Use shadow deployments or canary releases to mitigate risk
  • Prioritise end-to-end tests for critical workflows
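
One way to build such a safety net before migration is a characterisation test: record what the legacy code actually does for a set of inputs, then check the modernised code against that recording. The sketch below is a minimal illustration under stated assumptions: `legacy_price` and `modern_price` are hypothetical stand-ins for a legacy routine and its rewrite, and the sample cases are invented.

```python
def legacy_price(quantity: int, unit_pence: int) -> int:
    """Hypothetical legacy logic, including an undocumented bulk discount."""
    total = quantity * unit_pence
    if quantity >= 10:
        total = total * 90 // 100
    return total


def modern_price(quantity: int, unit_pence: int) -> int:
    """Hypothetical rewrite that must reproduce the legacy behaviour."""
    discount_percent = 10 if quantity >= 10 else 0
    return quantity * unit_pence * (100 - discount_percent) // 100


def record_baseline(func, cases):
    """Capture current behaviour as the baseline, whatever it happens to be."""
    return {case: func(*case) for case in cases}


def find_regressions(func, baseline):
    """Return {case: (expected, actual)} for every case that differs."""
    return {
        case: (expected, func(*case))
        for case, expected in baseline.items()
        if func(*case) != expected
    }


cases = [(1, 250), (9, 199), (10, 199), (25, 333)]
baseline = record_baseline(legacy_price, cases)
print(find_regressions(modern_price, baseline))  # {} means no differences found
```

A rewrite that forgot the bulk discount would be reported for the `(10, 199)` and `(25, 333)` cases, which is exactly the kind of silent discrepancy that otherwise surfaces in production.
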
2. The Knowledge Gap: Bridging Generations of Tech

A major and often overlooked challenge is legacy knowledge loss. Many legacy systems are written in outdated languages such as Perl, COBOL, or earlier versions of Java, and usually rely on obsolete tools such as SVN, ClearCase, and WebSphere. The original developers are often no longer around, and documentation is frequently minimal, outdated, or missing. This results in:

  • High onboarding time for new engineers
  • Increased risk when making changes
  • Dependence on a shrinking pool of specialists

Bridging this gap requires upskilling and, often, reverse engineering. Without clear, robust documentation or source control history, teams may spend months just understanding what the legacy system actually does before they can begin migration, which slows down projects and adds risk.

3. Legacy Databases Dilemma: Old, Expensive, and Rigid

One of the thorniest parts of legacy modernisation is dealing with enterprise databases like Oracle, DB2, MS SQL, or Sybase. These systems not only lock businesses into costly licenses but also limit agility. These databases were once the gold standard, but they now act as heavy anchors, because:

  • Their licensing fees are prohibitive
  • Integration options are limited
  • Migration paths are complex
  • Support and upgrade cycles are expensive
  • Running them in virtualised or cloud environments is restrictive

Moving to modern alternatives like Amazon Aurora or open-source RDBMSs like PostgreSQL, MySQL or MariaDB can yield cost savings of up to 50% and improve performance, automation compatibility, agility, and scalability. Tools such as AWS Database Migration Service and Google Database Migration Service simplify the transition, making legacy-to-cloud migrations more feasible than ever.

4. The Security Paradox: Legacy Closed Environments vs. Cloud Realities

Legacy systems often run in highly restricted internal networks with tightly controlled environments, often with bespoke security models. While this may seem secure, it creates friction. I recently worked on a modernisation project involving code refactoring, where development was restricted to a virtual machine environment and required legacy tools like ClearCase, which was only accessible within the VM. Furthermore, permission to install essential development tools within the VM was limited, making the workflow highly restrictive. Some of the other challenges include:

  • Manual certificate handling, often complicated by incompatible Transport Layer Security (TLS) versions
  • No centralised PKI
  • Limited TLS support and recurrent issues with hardcoded and expired certificates
  • Difficulty in implementing modern authentication / authorisation mechanisms (e.g., OAuth2, OpenID Connect)
  • Incompatibility with modern DevSecOps practices

Modernising entails adjusting to new security paradigms:

  • Zero trust architecture
  • Using ACME-based automation to automate certificate renewal
  • Centralising secret and certificate management using tools like HashiCorp Vault, Azure Key Vault, or AWS Secrets Manager
  • Offloading TLS to modern proxies like NGINX, Envoy, or AWS ALB
  • IAM roles instead of static users / passwords
  • Exposure to internet-based risks, requiring strong encryption and logging

Convincing security and compliance teams to shift from their tried and trusted on-prem policies can be an uphill battle. Cloud-native platforms provide built-in security features, compliance controls, and seamless integration with identity providers, reducing the security management overhead while improving visibility and threat detection.

5. The Cost Factor: Breaking Free from Licensing Lock-ins

Applications built on proprietary software (think Oracle databases, WebLogic servers, or legacy middleware) can carry cost overheads. One of the most immediate and painful revelations during legacy modernisation is the inflated cost of licensing. Oracle databases, for instance, are licensed based on cores or users and can easily run into six-figure bills annually. By contrast, modern alternatives like PostgreSQL, or managed services like AWS RDS, offer:

  • Pay-as-you-go pricing
  • No upfront licensing fees
  • Auto-scaling capabilities

Where possible, moving away from legacy enterprise databases like Oracle to such alternatives can offer cost savings of up to 50%, if not more, along with better scalability and native cloud integration 1, 2, 3.

6. Refactor vs Rewrite: When It’s Better to Start Fresh

In many cases, trying to incrementally modernise a monolith is like putting a new engine in a rusted car. While refactoring and replatforming can reduce migration risk, there are cases where it’s more efficient to rewrite the application as cloud-native from scratch. Benefits include:

  • Clean, modular architecture using microservices or serverless
  • Modern CI/CD and DevOps integration
  • Easily testable
  • Cost-effective and scalable cloud infrastructure
  • Modern dev stack (easier to hire talent for)
  • Easier onboarding and documentation

Rewriting the application allows organisations to align the system with current business processes, rather than bending processes to fit legacy constraints. Yes, rewrites come with their own risks — scope creep, delayed delivery, knowledge gaps — but for systems with outdated tech stacks, poor code quality, and minimal test coverage, it may be the faster path to long-term agility.

Final Thoughts

Modernising legacy applications is not just about code. It’s about cost, people, process, and risk. From overpriced licenses and fragile test coverage to security hurdles, knowledge gaps and outdated databases, the challenges are real and non-trivial. But with a strategic mindset, a solid roadmap, and the courage to consider a full rewrite when necessary, organisations can unlock agility, cost savings, and long-term sustainability.

If you’re planning a cloud migration or stuck in the middle of one, take a step back and evaluate: Are you just moving old problems to a new place, or solving them for good? The journey isn’t easy. But staying stuck in the past is far riskier than taking the bold step forward.

  1. DBX Labs $46 Million Cost Savings
  2. Open Source is the New Data Center Standard
  3. PostgreSQL vs. Oracle: Difference in Costs…

The Hard Truths of Modernising Legacy Applications: Costs, Complexities, and Cloud-Native Benefits was originally published by Capgemini at Capgemini Software Engineering on June 06, 2025.

https://capgemini.github.io/development/modernising-legacy-applications
Prompt Driven Development
Development, Engineering, AI

Vibe coding (also vibecoding) is an AI-dependent programming technique where a person describes a problem in a few sentences as a prompt to a large language model (LLM) tuned for coding. The LLM generates software, shifting the programmer’s role from manual coding to guiding, testing, and refining the AI-generated source code. Vibe coding is claimed by its advocates to allow even amateur programmers to produce software without the extensive training and skills required for software engineering. The term was introduced by Andrej Karpathy in February 2025 and listed in the Merriam-Webster Dictionary the following month as a “slang & trending” noun.

Wikipedia

To prepare myself for the marathon I had set myself up for of building a Next.js app in one week, I watched a few short-form videos and read some online blogs. I loaded up a whole bunch of YouTube videos – I am visual, so this works for me. I also follow this topic doomscrolling on TikTok and watching comments fly by on my phone’s Discord notifications, the occasional Substack binge and me-curated X feeds. Side note: it’s dangerous letting social media just feed you stuff. Your tin hat will shine like a lighthouse on a dark night far away from civilisation. It’s like advertising, but instead of making you feel inadequate and wanting to inexplicably buy something, you find yourself angry and questioning your values. Regardless, a lot of the information had already trickled down. Interest is king. Use it to your advantage.

The Capgemini DCX team published an invite to a presentation of Vercel’s shiny new AI tool, V0, which took place at the Capgemini London office. I was aware of Capgemini’s partnership with Vercel, and this project was ripe for giving it a play.

What I learnt from working with Data Scientists on AI projects: your overarching prompt can go a long way in setting you up for success. Vercel’s V0 has a setting for your whole project. This is where I told it what I wanted from it – how I wanted it to code, what stack I was using, and other bits. My prompt looked like this:

For all the code you produce, please adhere to the following best practices. If you need to deviate from them, please tell me before you generate anything as we are trying to save energy and usage. I will provide context as instructions as follows:

  • Clean Architecture: Separation of concerns between UI, business logic, and data access
  • Type Safety: Comprehensive TypeScript types for all components and functions
  • State Management: Zustand for efficient state management
  • Form Validation: Client-side validation with detailed error messages
  • Error Handling: Custom error classes and consistent error handling
  • Security: CSRF protection, secure cookies, and input validation
  • Accessibility: ARIA attributes and keyboard navigation support
  • Responsive Design: Mobile-friendly UI components
  • Use Material UI for building the app. Use Ubuntu for headings and Roboto for body copy.

I originally had a much larger prompt, but that got too confusing for V0 and I had to cut the chaff and thin it out. I used a web-based LLM, the famous one, the one and only, ChatGPT, to slim it down a bit. The V0 context window is not that large. It is good practice to keep your prompts short and actionable. One step at a time. Try not to give the LLM too many complicated things to do at once; this is where LLMs will fall over for now. You also kind of need a language degree to use this thing, in that you need to be able to communicate and explain what you need concisely, so make sure you are very specific. Say jump and it will jump… most of the time. Also a good idea to tell it what not to do or to only do this, only look at this.

Back to V0, I was very impressed with its rendering-from-design capability using Figma URLs. However, if the design includes random colours and artefacts, the LLM can include these or add all your styles as overrides of the theme! I had to restart three times as design and theme colours were updated! The login screen was still on the old design, so all my subsequent pages were overrides of the original theme. Just because it works doesn’t make it right. The perfectionist in me was having a meltdown! How to deal with design changes in an agile way? In short, you can’t – well, not with this tool yet unfortunately, as you cannot upload your project back to the online system and say “update my file to the new design” when your project gets too big. Not very agile, methinks. The ecosystem is new-ish, so watch this space. I wish there was a plugin that you could have with your IDE, kind of how Dreamweaver used to work in the mid-2000s, pushing and pulling updates from the server. As of 14 May 2025, this is V0’s number one thing that everyone requested – well, a feature that will sync with your GitHub codebase, so that’s a start to get things cohesive.

I did like that V0 intuitively knew to do things and interpret the design in ways I would not think about – things that I later thought “I’ll keep that, thank you very much”, and could you also include this or that? V0, under the hood, uses many models to build its output. My feeling here is: make sure your designs are machine-readable, and by that I mean be pedantic, specific, and make them sharp-sharp. The better and more precise your designs are, the better your results. Again, this is not very agile, and I am starting to see others in the dev community starting to poke a stick at agile, asking if it’s still relevant – especially in this vibe coding era of prompt-driven development. I would also like to see UX and UI designers actually explain functionality in their designs so that these can be part of the prompt for the feature.

Prompt-driven development is exactly what it sounds like. You prompt, you see what comes out, and then you prompt again until you are happy with it. The term was coined by Andrew Miller on his Andrew Ships Substack, where the life cycle is described as follows:

  1. Developer receives requirements
  2. Developer breaks requirements down into a series of prompts
  3. LLM generates code for each prompt
  4. Developer reviews LLM generated code
  5. Developer submits all changes for review
  6. Another developer reviews and merges changes

Prompt Driven Development life cycle

GenAI can help people learn and upgrade their skills too. This is very useful with reasoning models like Claude Sonnet and now Gemini, apparently – but I think that was a ruse to get me to switch. These models can be picked from the dropdown and come packaged with your GitHub Enterprise licence. The model will explain what it is doing and why.

Cline model selection

I have used a prompt as follows:

Please help me write the code and explain what you are doing and why to help me as a JavaScript developer understand what you are doing, and point out any differences that you feel I may not understand or that would be different in Python programming.

It’s magic when you are trying to port over your skills and learn something new.

I plugged my instance of GitHub Copilot into Cline. Cline is a plugin that lets you look at your whole project, not just the files that you tell it to look at. So in more cases than not, I used Cline. You have a dropdown list where you can choose what model you want to use. In the overall settings, you can add your pre-prompt – the thing you always want to do – your master prompt. There are two files in your project that control this: .github/copilot-instructions.md and .clinerules

The Cline tab in VS Code

I also found that giving Cline full read rights made it easy to concentrate on what changes it made; this was super useful later on when I had to debug or fix issues manually the old skool way.

Auto read feature

It is important to read the model’s commentary as it reasons. You will not learn anything new if you don’t, and you also want to know what it did and why. Read the code too. Make sure you understand what it does. This comes in useful later when the project is too complicated for an LLM to solve and you have to get down and dirty with the code. There is no point in just accepting everything, as it may just hurt you later. You need to understand what it did.

If you have accepted something and then decide that you don’t want to go down this route, you can always revert to a previous state; much like Git, there are checkpoints. I will warn you though that if you don’t accept the answer and wish you had, manual intervention may be needed.

checkpoints

On the whole, having an LLM was so useful in getting stuff out the door for this project and doing the boring stuff that I would have procrastinated away as much as I could have done otherwise. So useful for learning new programming languages. It’s the friend that you love for their knowledge and patience – that never gets tired of helping you figure stuff out!

Sometimes the LLM got in the way and made me doubt my sanity. For example, Material UI’s <Box> element could not take on a class name implemented via ThemeProvider; convinced it was an error on my part, after much debugging (and frustration), I found out through Next.js Material UI bug reports that <Box> does not accept class names! An LLM won’t tell you this; sometimes you have to do it the old-fashioned way. In short: treat an LLM like a hooooooman – check its work makes sense; ask clarifying questions; understand what it’s doing before accepting its suggestions wholesale. If all else fails, make it make sense. That’s what we are paid for!

Prompt Driven Development was originally published by Capgemini at Capgemini Software Engineering on May 23, 2025.

https://capgemini.github.io/ai/prompt-driven-development
Is your company ready for AI?
Development, AI

As a consultancy, we are expected to have strong and well-formed opinions on all new technologies arriving on the market. So we often get asked, “What should I be doing with AI? Should I be using it? What should I do and what should I avoid?” And the answer is, of course, “It depends”! There are so many factors behind whether or not AI solutions are right for a business. So, a good place to start before trying to answer this question is to be goals-oriented; to know where you are, as a business, and to know where you want to go. Then you can figure out whether any form of AI can be beneficial.

To this end, we are creating a Capability Matrix to help companies check they have the necessary prerequisites in place to benefit from AI. This post will consider Generative AI in the Software Development Lifecycle, as this is my personal speciality. So, to begin, a couple of definitions:

Generative AI

Wikipedia says Generative AI is “a subset of artificial intelligence that uses generative models to produce text, images, videos, or other forms of data. These models learn the underlying patterns and structures of their training data and use them to produce new data based on the input, which often comes in the form of natural language prompts.” Digging into this a bit deeper, “new data” is slightly misleading; in reality the model will return a sort of “word salad” (or picture salad) of data it has seen before, mixed up according to a probability algorithm that will judge whether this is a highly likely answer its data might provide to this prompt. The more you know about how Gen-AI works, the more impressive it is that it ever comes up with anything useful! But, come up with useful things it does. Sometimes.

The Software Development Lifecycle

This means the end-to-end process of creating software, from identifying where technology could assist or improve a process, to designing an application to fulfil this business need, to building and testing that application, to deploying, monitoring and supporting the application. Since the adoption of Agile and DevOps processes, this is an iterative and automated process that is becoming better understood the more software we build.

Where to Start

As with any new or disruptive tooling, evaluation and introduction of Gen AI into the software development lifecycle must be approached in a logical and structured way, and must have provable results. We have identified five vectors to consider within an organisation when considering Gen AI tooling:

AI Capability Matrix

1. The Metrics vector: Capture your starting point

There is a lot of sales pressure on companies to purchase tools such as GitHub Copilot, which integrates into a developer’s IDE and gives them “useful” autocomplete suggestions for the code they are writing - these suggestions can be contextual based on the files the developer has open and the history of what the developer has been up to recently. There is also a chat window, to save the developer context-switching out to a browser or other tool if they want to search for help. GitHub proudly claims 55% faster task completion for its users, and Copilot costs a mere $39 a month per enterprise user. Then, there is Microsoft Office 365 with Copilot, which promises to transform the way you work for just $30 per user per month.

The only way to tell whether or not the tools have improved productivity, and hence to prove the return on your AI tooling investment, is to capture metrics on your current productivity by examining your current working practices. You can then run trials and proofs-of-concept with Gen AI tooling to look for changes in these metrics. There are many frameworks available to help you calculate your current performance - from good old-fashioned agile velocity to the more sophisticated DORA, HEART and SPACE frameworks.
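
As a rough illustration of capturing a baseline, the sketch below computes two of the DORA measures, deployment frequency and lead time for changes, from a list of (commit time, deployment time) pairs. The record shape, the function name and the sample data are all made up for illustration; in practice the timestamps would come from your own version control and deployment tooling.

```python
from datetime import datetime
from statistics import median


def baseline_metrics(deployments, period_days):
    """Summarise a trial period.

    deployments: list of (committed_at, deployed_at) datetime pairs
                 (a hypothetical record shape, not a standard format).
    period_days: length of the observation window in days.
    """
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    # Lead time for changes: how long each change took from commit to production.
    lead_times_hours = [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in deployments
    ]
    return {
        "deployments_per_week": len(deployments) / (period_days / 7),
        "median_lead_time_hours": median(lead_times_hours),
    }


# Invented sample data: three deployments observed over a two-week window.
sample = [
    (datetime(2025, 5, 1, 9, 0), datetime(2025, 5, 2, 9, 0)),
    (datetime(2025, 5, 5, 9, 0), datetime(2025, 5, 7, 9, 0)),
    (datetime(2025, 5, 9, 9, 0), datetime(2025, 5, 12, 9, 0)),
]
print(baseline_metrics(sample, period_days=14))
```

Running the same calculation before and during a Gen AI trial gives two sets of numbers to compare, which is the point of capturing the starting position first.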

2. The Process vector: Identify bottlenecks and pain-points

Once you have a handle on your current productivity metrics, you can start to identify if there are areas that need improvement. Skipping this step can waste considerable time and resources - for example, if you have accelerated your automation test scripting process by using Generative AI to write tests from your JIRA tickets, but in fact there was only one automated script being written per sprint and your actual delivery bottleneck is a lack of user experience feedback to redesign your front end, you are not going to have any impact on your end-to-end software delivery time.

There are other reasons to focus on a particular stage of the software development lifecycle. For example, if a step is particularly onerous then, even if it isn’t a time-based bottleneck, you may be wasting money on high attrition rates for that role! Again, having the right metrics to understand your current process is key.

3. The Security vector: Consider your Data

Data sharing considerations can be a major blocker when choosing Gen AI tooling. It is certainly easier, quicker, and probably cheaper, to access pre-trained large language models (LLMs) as a service via cloud vendors such as Microsoft, OpenAI or AWS. However, even if you do not share contextual files and data with the service, you will still need to send your search queries, and if you are required to completely abstract each query to avoid sharing sensitive data, the efficacy of the Gen AI tooling will be significantly reduced.

It is also possible to host your own LLM and avoid internet access entirely. It is not, however, straightforward to connect tools such as GitHub Copilot to your own LLM, so you may be restricted in how effectively you can use such tools if you go down this route.

Some industries may be limited by legal frameworks as to whether they can send their data over the internet to a cloud-hosted model, and some business leaders may be nervous about the implications of sharing data.

4. The Cultural vector: Train your Employee Base

Even Generative AI toolsets need user uptake in order to be effective. Installing the tools alone is not sufficient; users must be trained not only in HOW to use the tools, but also WHEN to use them (and when not to use them - for example, yes GitHub Copilot is quite good at writing unit tests to increase coverage of existing code bases, but you should not drop TDD as a practice because of this). Collecting and sharing best practice is key, and you may also need to encourage an “AI-First” approach to daily tasks.

If a company’s employees are going to use Generative AI, it is important that they have at least a high level understanding of how it works. The MOST IMPORTANT lesson is that answers returned from Generative AI cannot be considered “correct”, and so if correctness is required the AI output MUST be thoroughly evaluated. Most mainstream AI tools do present a warning upfront that responses must be analysed, and if employees understand a bit more about the word-salad aspect they will have a better understanding of why the warning is there.

There are ethical issues that people will want to discuss and evaluate - is the purpose of this AI tooling to replace enjoyable human roles? Usually the answer to this is “no”, but it is certainly something a company will want to be open about and provide training/coaching/support for employees.

There are also sustainability issues to debate - how expensive are AI calls, in monetary or carbon terms? This is a complex question to answer - for a start it depends on whether you host your tooling or if it’s cloud-hosted/shared; it depends on whether you have a bespoke trained model (training is compute-intensive and very expensive) or if you are using a pre-trained model. Companies should have an idea of this and should, again, make the information transparent to employees.

5. The Tools vector: Evaluation techniques

This vector isn’t specific to Gen AI for software engineering - for many processes there are competing software tools on the marketplace, and businesses need a strategy by which to identify the best one. The information to help evaluate available tools will come out of the other vectors mentioned here - cost and data considerations; and which process the tool needs to support.

Capgemini have significant expertise in the software selection process, across multiple industries, using a step-by-step evaluation process from identifying candidate solutions to finalising vendor negotiations. Contact us to find out more.

Is your company ready for AI? was originally published by Capgemini at Capgemini Software Engineering on May 16, 2025.

https://capgemini.github.io/ai/ai-capability-matrix
A Green Software Engineering Playbook
Development, Engineering, Infrastructure

Defra is a long-standing client for Capgemini, with both organisations sharing a commitment to sustainability. Recently we have been focussing on whether we can introduce “Green coding standards” to our teams. Whilst there can be benefits from developers focussing on making each line of code “greener”, much greater wins can be achieved from using best practices to minimise the infrastructure and resources used in the software development lifecycle. Listed below are the best practices that our teams follow at Defra to help make the application estate as sustainable as possible.

Prerequisites

Here we list a few common software engineering best practices that are essential to have in place before following this playbook.

  1. Automate all infrastructure configuration. Having infrastructure defined as code, so that it can be created and recreated repeatedly, not only benefits application stability but allows unused infrastructure to be torn down with confidence that it can be restored.
  2. Build in observability across the lifecycle. There are certain “red flags” that you can mark as risk factors to always watch out for, such as:
    • Test environment infrastructure costs are higher than production
    • Builds are taking longer than an hour
     If you can set up monitors / alerts for these flags, it gives you a good head start in addressing them when they occur.
  3. Prefer cloud deployments. If you can delegate hardware efficiencies to your cloud provider, they are most likely the specialist in using their infrastructure to maximum efficiency. This need not imply cloud vendor tie-in; we can abstract to layers such as Kubernetes-as-a-service and move, or even balance, applications between clouds without too much difficulty.

Software Architecture Best Practices

Thinking sustainable begins at application inception, and it is also at inception that the biggest efficiencies can be realised.

  1. Don’t solve problems that aren’t there. If there isn’t a performance issue, don’t architect as if there is one! You may not need that cache/queue/resilience level in the infrastructure.
  2. In the cloud, prefer function-as-a-service, then a PaaS that can scale-to-zero, then VMs. There are often good reasons not to use function-as-a-service, but even so you should be looking for the possibility to scale right down to zero for your hosted applications rather than having a constantly-running virtual machine (that you are constantly paying for).
  3. Avoid data duplication. Having a single store of data and making it readily accessible can be more sustainable than having multiple copies and having to keep them in sync.
  4. Minimise message traffic. Passing excessive data over interfaces can also reduce sustainability. Keep messages small and only pass data when necessary.
  5. Consider message and security protocols. Some can be more energy-efficient than others, depending on handshake complexity and encryption levels – although there is often a trade-off to consider. For example, creating and validating a JSON Web Token (JWT) is relatively expensive because of the cryptographic signing involved, but after the initial authentication call a service can validate it locally – whereas with an opaque OAuth access token you pass a simple token but must make regular calls to the authorisation server to validate it. Similarly, TLS (still often called SSL) uses expensive encryption but is essential for securing internet traffic, so can only be safely avoided for communications within a closed network.
  6. Use stateless services. Managing state introduces complexity and cost – avoid as much as possible. Event-based models can help.
  7. Set document retention policies for all supporting documents.

Software Development Best Practices

It is usually true that if you write quality code, you write the most sustainable code possible, because the best practices of both disciplines align. There are some development team best practices that we can highlight though:

  1. Keep It Simple. KISS is one of the acronyms that often appear in “clean code” guidebooks, and it plays directly into sustainability. Applications can become over-complicated by trying to code for scenarios that will never exist. Keep an “MVP” mindset and write as little code as possible.
  2. Keep the team informed. Our project recently calculated that to run a single branch build in the cloud and complete our branch integration test suite costs £40 per day. Once developers know this, they are far more likely to be conservative in the builds they run.
  3. Consider your programming language. Each language has a different energy efficiency profile – for instance, a study showed that a set of problems solved in JavaScript used more energy than the same problems solved with Java.
  4. Use appropriate data structures. Consider what your data is being used for and choose a data structure to match its use case.
  5. Write self-documenting code. Well-written code, stored in a repository in deltas with each commit linked (e.g. via naming convention) to an issue tracking system where requirements are captured, and BDD-style test scripts that copy the language of the requirements to prove delivery, can alleviate the need for a lot of expensive documentation.
  6. Keep tasks small. Successful backlog management and refinement leads to manageable development tasks where the developer can focus on quality code, and the code reviewers can focus on improving the sustainability of the code.

Software Testing Best Practices

Whilst automating tests is essential to enabling frequent low-risk releases, there is a danger that automation suites become overly large, repetitive and expensive to run. Tests should be reviewed and groomed throughout the application lifecycle, and test strategies should be practical and minimal.

  1. Keep to the Testing Pyramid. Unit tests are much cheaper to run as they do not require the entire application architecture to be up and running, the way that integration tests do. All functional testing should be confined to unit testing, with pact testing used to confirm dependencies and interfaces between systems.
  2. Fail early. If your unit tests fail, you don’t want to go on to run further tests. So any automated pipelines should stop on the first failure – even if stages are running in parallel.
  3. Only test on change. Tests should be cadenced with application changes, not with calendar dates. There is no need to re-run performance tests weekly if nothing in your environment has changed, for example!

Software Deployment Best Practices

Here we discuss all things Platform - assuming cloud deployments and full automation as per the prerequisites.

  1. Everything Ephemeral. If infrastructure isn’t being used, tear it down. This is particularly relevant for test environments - we often have many test or “live-like” environments which are left running overnight and at weekends. Review whether this is really necessary. Sometimes it is – you may have certain tests running out-of-hours, but often it is not. This applies to support infrastructure too – build pipeline machines should be created when needed and torn down afterwards. Capgemini can provide templates for completely ephemeral build environments, such as CREATE and, for Defra, ADP for Azure DevOps pipelines.
  2. Automated tear-down. Developer builds and other automated test environment builds should automatically take themselves down once their automation suites have completed.
  3. Review your cloud service plans regularly. Your initial best estimate of what “size” of hardware applications will need should be reviewed over time. Often, we only review when we want to scale up – consider regular reviews with a view to scaling down when hardware or services are not being utilised.

We will be evolving this list as we work with our clients, adding best practices that we have seen in use. As usual with any drive for change, awareness is key - having a focus on sustainability for everything we do will help draw out further ideas on how we can improve. Don’t hesitate to add further suggestions!

A Green Software Engineering Playbook was originally published by Capgemini at Capgemini Software Engineering on March 28, 2025.

https://capgemini.github.io/sustainability/green-se-playbook
Transparent borders for element states
Development, CSS

It’s quite a common design requirement to have a border on some states of an element but not others. For example a set of tabs where the currently active tab is highlighted with a top border, or a button where a border is added on hover.

The problem is that if you add a border to an element, that changes its size, so if you add a border on one state of a component, then the size will change when the state changes, as you can see from these examples:

One simple and elegant solution is to use a transparent border on the other versions of that element, and only change the border colour when the state changes.

This feels like a very old and quite basic piece of advice, so I assumed it must have been written about before, but the only example I’ve found is in this Stack Overflow answer.

There are other ways to approach this, like using an inset box-shadow, or changing the padding to offset the size of the border, but to me that feels unnecessarily complex, and unintuitive enough that I’d want a comment in the code to explain why.

Transparent borders for element states was originally published by Capgemini at Capgemini Software Engineering on March 07, 2025.

https://capgemini.github.io/frontend/transparent-borders
Creating a Retrieval Augmented generative AI chatbot
AI, Cloud, Bots

I’ve said before that, to a developer of a certain age, using the cloud feels like being a kid in a sweet shop. Talking to a colleague of a similar age recently about how he wrote his own message bus in C, we agreed the current generation don’t know how lucky they are to use this software as a service via a few clicks on a beautifully laid out portal! This week I was amazed again at what can be done in a few clicks, whilst taking part in a Capgemini hackathon. The brief was short and open – “build an app with AI”, and the toolset was $300 credit on the Microsoft Azure portal.

What my team decided to build was a RAG – a retrieval-augmented generative AI solution. We have an email group that’s many, many years old and contains a lot of company knowledge from people who have long-ago retired, so we decided for our solution to “resurrect” them by using the mailbox files as a data source for our RAG. We based our solution on this Azure sample application, and architected as below:

Azure architecture diagram for RAG

For a bit of fun, we also used Gen AI (Microsoft Copilot) to write our hackathon presentation – given a description of the architecture it came up with this masterpiece, which I kind of like:

AI-generated architecture diagram

Data source

First step was to get the mailbox file into JSON format so it could be uploaded to the Azure portal. Next we had to learn about “vectorisation”.
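
The mailbox-to-JSON step might look something like the sketch below. It assumes the archive is in mbox format, which the post does not actually state, and it uses only Python's standard `mailbox` and `json` modules. The function names and the choice of fields are illustrative, not the team's actual conversion script.

```python
import json
import mailbox


def plain_text_body(message):
    """Return the first text/plain part of a message, or "" if there is none."""
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        raw = part.get_payload(decode=True)  # bytes, or None for containers
        if raw is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return raw.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label in the email: fall back to UTF-8.
            return raw.decode("utf-8", errors="replace")
    return ""


def mbox_to_records(path):
    """Read an mbox file into a list of plain dictionaries (assumed schema)."""
    box = mailbox.mbox(path, create=False)
    try:
        return [
            {
                "from": str(message.get("From", "")),
                "date": str(message.get("Date", "")),
                "subject": str(message.get("Subject", "")),
                "body": plain_text_body(message),
            }
            for message in box
        ]
    finally:
        box.close()


def mbox_to_json(path):
    """Serialise the whole mailbox as a JSON array, ready to upload."""
    return json.dumps(mbox_to_records(path), indent=2)
```

Attachments and HTML-only emails are ignored here; a real archive that is many years old would probably need more care around encodings and quoted reply chains.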

Many of us are starting to become familiar with vector datastores and huge numbers of vectors being quoted – for example the “ada 2” embedding engine that we used produces vectors with 1,536 dimensions. I always describe vector dimensions as “ways to represent the word cat”, for example with one dimension a cat is a cute but annoying furry animal; with 1,536 dimensions it can also be a cat o’ nine tails; a shortcut for concatenation; any subspecies of the genus Felis; a play; a musical; the arch-enemy of a dog; a cartoon… OK I won’t list 1,536.

It turned out to be pretty easy to vectorise our data. We stored it in an Azure storage blob so the cloud could be granted read permissions on it, spun up an OpenAI instance and deployed our “ada 2” embedding model in it, created an Azure AI Search index (one of my favourite Azure products) and chose “load and vectorise data.” This wizard allowed us to specify our data source and our embedder, then it went off and did some magic, and we had vectorised data! As a test, I searched the AI Search index for the term “BBC”. I knew it wasn’t in the data source, but I knew there was a long debate in the data source about a Fawlty Towers TV series, and sure enough the vector search returned, with a suitable degree of confidence, the email chain in question. Fantastic.

What is the process carried out by the embedder? Well, it turns the words into numbers, based on how it has been trained, and there are calculations that can be performed on these numbers to figure out how “semantically similar” two words are – so for example in English the vectors for the words “coder” and “developer” should score as very similar in that calculation. When I look in my AI search index post-vectorisation, I can see my initial data broken down into chunks, and I can also see arrays of arrays of numbers for each chunk. There are different embedding models you can use to vectorise your data, which have been trained on different data sets to vectorise words – here, specialisms can come in, such as a model trained on a data set of medical terms so that it recognises how they link together. The language of the text is also presumably a very important factor in training the embedder! We now already have a very powerful machine-learning-augmented search on our data; all we really need to do is add an interface to it. As this is an AI hackathon, we’ll add a natural-language chatbot interface.
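To make the “semantically similar” calculation concrete, here is a minimal sketch using cosine similarity, one common choice for comparing embedding vectors. The article does not say which metric the index actually used, and the four-dimensional vectors below are invented for illustration; real “ada 2” vectors have 1,536 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, made up for this example only.
coder = [0.9, 0.1, 0.3, 0.0]
developer = [0.8, 0.2, 0.4, 0.1]
cat = [0.0, 0.9, 0.1, 0.7]

# "coder" should sit closer to "developer" than to "cat".
print(cosine_similarity(coder, developer) > cosine_similarity(coder, cat))
```

A vector search does essentially this comparison between the embedded query and every stored chunk, then returns the closest matches.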

Chatbot Interface

You can query the OpenAI instance that we’ve spun up for our hackathon in natural language by installing a “chat module” that will do this automatically, but that will put the question directly to the LLM, which I guess has been trained on a huge amount of internet data – and that’s not what we want. For a RAG, we want to restrict the queries to a data set that we know contains the answer we want for our domain. So we build an “orchestrator” application to sit in between the user and OpenAI, intercept queries and then use prompts to force the LLM, with extreme prejudice (and maybe some bribes), to only use the data we want and only reply in the way we expect.

Our user interface is a simple one-page application with a chat box you can type into and submit to our orchestrator layer. The first step for the orchestrator is to use OpenAI to translate the user’s question into a better search term for our AI search index. So when the user sits on the keyboard and writes a misspelt, badly phrased query like “wot r the main chacaraterts in that SOAP with John Cleese in it”, we can clean it up a bit before we search our vectorised data. OpenAI is good at this – it’ll return something like “Fawlty towers main characters” (although it does prefer to spell “Faulty” correctly!). So we build that API call first. The response, if there is one, is then fed into an API call to our AI Search index, which will return the data it has matched plus a score indicating how relevant each match is. There’s scope here to filter on that score if we’re getting inaccurate responses. Then we take this data and craft another OpenAI call, this time to the chat endpoint. We provide the data we’ve received from AI search, and a strongly worded prompt along the lines of “You are a helpful assistant, you will return a paragraph and bullet points of relevant data, you will ONLY use the provided data to generate your answer, if you cannot find the answer in the provided data then say ‘I don’t know’” and so on. Despite training courses telling you otherwise, prompt engineering is a bit of a dark art, so you may need to experiment with the prompt a bit to get better results.

The response from this query should be in natural language and contain answers only from your data set, so it can be directed straight back to the user, and voila! Your very own RAG application.
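The orchestrator flow described above can be sketched roughly as follows. This is not the hackathon code: the function name, the three injected callables (standing in for the OpenAI and AI Search clients), the shortened prompt and the `min_score` threshold are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical, shortened version of the strongly worded prompt.
SYSTEM_PROMPT = (
    "You are a helpful assistant. You will ONLY use the provided data to "
    "generate your answer. If you cannot find the answer in the provided "
    "data then say 'I don't know'."
)

def answer_question(
    question: str,
    rewrite_query: Callable[[str], str],
    search_index: Callable[[str], List[Tuple[str, float]]],
    chat: Callable[[str, str], str],
    min_score: float = 0.5,  # assumed cut-off; real score scales vary
) -> str:
    # Step 1: ask the LLM to turn the raw question into a cleaner search term.
    search_term = rewrite_query(question)
    if not search_term:
        return "I don't know"
    # Step 2: query the vector index; each hit is a (text, score) pair.
    hits = search_index(search_term)
    # Step 3: drop low-scoring hits to reduce inaccurate answers.
    relevant = [text for text, score in hits if score >= min_score]
    if not relevant:
        return "I don't know"
    # Step 4: ground the chat call in the retrieved data only.
    user_prompt = "Data:\n" + "\n---\n".join(relevant) + "\n\nQuestion: " + question
    return chat(SYSTEM_PROMPT, user_prompt)

# Stand-in implementations so the sketch runs without any cloud services.
print(answer_question(
    "wot r the main chacaraterts in that SOAP with John Cleese in it",
    rewrite_query=lambda q: "Fawlty Towers main characters",
    search_index=lambda term: [("Email thread about Fawlty Towers", 0.83)],
    chat=lambda system, user: "(the chat model's reply would appear here)",
))
```

Passing the clients in as plain callables keeps the flow testable without any Azure resources, and makes it easy to skip the query-rewriting step by supplying a function that returns the question unchanged.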

Cost Considerations

So what does it cost? From an Azure perspective, you need to host your static web front end and your orchestrator app. There are various ways to do this; a Function App for the orchestrator and a “Static Web App” for your one-page front end might be the most cost-efficient.

You only need to deploy the OpenAI embedding module whilst you vectorise your data, but you do need the OpenAI chat interface which is charged per API call so that would be 2 calls per application use. You could skip the “translate-my-question” call if you really wanted to reduce this – although the pricing model does not give much benefit for this.

There is a handy in-portal calculator to help you figure out your OpenAI costs based on the architecture you have created in Azure. For my hackathon estate, which included an OpenAI chat deployment, the estimate was $1,400 per month per PTU (provisioned throughput unit; this is based on using the chat feature 24 hours a day) and I’d apparently need 15 PTUs! Ouch. Lowering this to perhaps 7 hours a day for 20 working days is a slightly more acceptable $4,200 a month, but still – costs to be very wary of.

You also need your AI Search instance; this is charged per GB of data stored rather than per API call, so the cost really depends on the size of your RAG data set.
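The OpenAI estimate above can be reproduced with some rough arithmetic. This sketch assumes the cost scales linearly with hours of use and that a month has 30 days of 24 hours; the calculator's own assumptions are not stated in the article, so the result only lands in the same ballpark as the $4,200 quoted.

```python
PRICE_PER_PTU_MONTH = 1400  # USD, running 24 hours a day (figure quoted above)
PTUS_NEEDED = 15            # figure quoted above

def monthly_cost(hours_per_day, days_per_month, hours_in_month=24 * 30):
    """Scale the always-on price by the share of the month actually used."""
    full_price = PRICE_PER_PTU_MONTH * PTUS_NEEDED
    used_fraction = (hours_per_day * days_per_month) / hours_in_month
    return full_price * used_fraction

print(monthly_cost(24, 30))        # always on
print(round(monthly_cost(7, 20)))  # 7 hours a day, 20 working days
```

Running around the clock comes to $21,000 a month under these assumptions, and the reduced schedule to a little over $4,000.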

Security Considerations

Whenever you use Software as a Service, you need to be aware of what you are sending to the vendor and what they are contractually able to do with it. In this case, our AI search data is contractually secure in the index, but we also need to look at the contract for the OpenAI calls – for example, is Microsoft allowed to keep a copy of our query data? There’s a terms of service page to help you understand this, with lots of promises about things Microsoft won’t do, but whether this is acceptable will be very specific to your use case requirements, so get your reading glasses on.

Then, as for any cloud architecture, you want to be sure that each component has the minimum viable permissions to access each other component and that your internet-exposed APIs and websites are protected by login. For this architecture, we could make the Azure storage blob read-only, give the Azure OpenAI instance contributor permissions to the AI search service, and give the orchestrator service read permissions to the AI search and query permissions to the OpenAI chatbot.

Conclusion

It was pleasingly straightforward to create this RAG, although I am not certain that adding the chatbot front end increases the value of the product enough to justify its cost; perhaps just the vectorised data is enough to increase the usefulness of a data source by making it contextually searchable? I think I did succeed quite well in resurrecting all those retired employees though. All their conversations about Fawlty Towers and the value of the Capgemini car lease scheme have been brought back to life in a very convincing and accessible way!

Creating a Retrieval Augmented generative AI chatbot was originally published by Capgemini at Capgemini Software Engineering on February 28, 2025.

https://capgemini.github.io/ai/building-a-rag
Accelerating AWS Native Development with LocalStack & Docker: A Real-World Case Study
AWSCloudDevelopmentTesting

In this blog post, I’ll walk through how using LocalStack with Docker and Docker Compose helped us speed up development and testing, making the entire process more efficient and seamless.

When building modern cloud-native applications on AWS, having a reliable and cost-effective approach to development and testing is essential. For this project, we utilised LocalStack, a platform that simulates AWS services locally. This enabled us to work more efficiently, speed up development, cut down on costs, and quickly test integrations without constantly interacting with the actual AWS cloud. LocalStack, along with other components of the project, was deployed as Docker containers using Docker Compose for orchestrating the entire environment locally.

The Challenge: Cloud Dependency Slows Development

As many developers will attest, working directly with live AWS services can become costly and slow. Every API call, infrastructure change, or even basic test requires interaction with the cloud, which can introduce delays. On top of that, maintaining isolation between development and production environments can be tricky and costly when dealing with real AWS resources.

In particular, this project was fast-paced and required frequent iteration between components. This constant cycle of making changes, pushing updates, and testing in a live AWS environment was prone to introduce a lot of friction. A local alternative to AWS was essential for reducing the turnaround time between code changes and testing. The combination of LocalStack with Docker and Docker Compose provided the perfect solution.

The Solution: LocalStack with Docker

LocalStack: “A complete, localised AWS environment where developers can build, test, profile and debug infrastructure and code ahead of deployment to the cloud.” By deploying LocalStack in a Docker container, we were able to spin up a local version of AWS that was fast, reliable, and cost-free. Docker made it easy to manage and isolate the LocalStack environment, while Docker Compose orchestrated the entire environment locally, allowing us to run all components (front-end, back-end, mocks, and infrastructure) together in one seamless, local environment without the latency, cost, or need for an internet connection.

Project Overview

The project consisted of four key components:

  1. UI Component: The frontend / user interface of the application was developed with Node.js, with Nunjucks used as the templating engine.
  2. Backend Component: The APIs and business logic, built with Java and the Spring Boot framework.
  3. Mock Component: An HTTP mock service based on MockServer, used to simulate third-party or upstream APIs.
  4. Infrastructure/Environment: The AWS environment was defined in code following the Infrastructure as Code (IaC) paradigm, using Terraform to provision various AWS services.

Basic high-level illustration of the project setup with LocalStack, Docker and Docker Compose

In this project, we used SQS for messaging, S3 for object storage, ECS for container orchestration, Cognito for user management, and CloudWatch for monitoring and alerting, among other AWS services. LocalStack allowed us to simulate these services locally, cutting down on our cloud costs and speeding up development. With Docker and Docker Compose, all services were integrated into a single, local environment that closely mirrored the cloud environment. For instance:

  • Messaging flows were tested by sending and receiving messages through local SQS queues. This allowed us to validate the flow of messages from one component to another, ensuring that the message-handling logic worked as expected.
  • We uploaded and retrieved files from LocalStack’s simulated S3 buckets, ensuring our storage logic worked as expected without needing to interact with the actual AWS S3 service.
  • User authorisation and authentication were tested through local Cognito User Pools.
  • Our back-end API integrated with mock services simulated using MockServer running in Docker, allowing us to test data storage and retrieval workflows without needing the live services deployed on AWS.

All the code for the various components was managed with Git & GitLab version control, and we also utilised GitLab’s CI/CD for our build and deployment pipeline management.

Refer to AWS Service Feature Coverage for an overview of the AWS APIs and services implemented in LocalStack.

LocalStack and Docker enhanced our project in the following ways:

1. Containerised Deployment

By containerising LocalStack, our application’s Backend services, Frontend service, mock components, and other services, we could deploy the entire stack on any developer’s machine with minimal setup. All that was required was Docker and Docker Compose. This approach offered consistency across different environments and eliminated the “works on my machine” issues that often arise when different developers have different local setups.

2. Docker Compose for Orchestration

We used Docker Compose to orchestrate our multi-container environment. Each service—whether it was LocalStack simulating AWS or our front-end and back-end components—was defined in a single docker-compose.yml file. With one command, we could spin up or tear down the entire environment. This orchestration allowed us to develop, test, and debug the entire system on a local machine without having to interact with AWS cloud services.

Here’s an example of the Docker Compose setup. Please note, this is just an illustration and not the actual configuration.

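# Illustrative only: service names, build paths and ports are examples.
version: '3'
services:
  # LocalStack emulates the AWS APIs locally.
  localstack:
    image: localstack/localstack
    ports:
      - "4566:4566"  # edge port: single entry point for the emulated services
      - "4571:4571"
    environment:
      - SERVICES=s3,sqs,lambda,dynamodb  # AWS services to emulate
    volumes:
      # Docker socket, so LocalStack can start containers (e.g. for Lambda)
      - "/var/run/docker.sock:/var/run/docker.sock"

  backend:
    build: ./backend
    depends_on:
      - localstack  # start order only; does not wait for readiness
    ports:
      - "5000:5000"

  frontend:
    build: ./frontend
    ports:
      - "3000:3000"

  # Stands in for third-party / upstream APIs
  mock-service:
    build: ./mock-service
    ports:
      - "4000:4000"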

This setup allowed all components (LocalStack, the back-end, front-end, and mock services) to run in harmony, facilitating rapid development and easy integration testing.

3. Faster Development Cycle

By running services locally, we no longer had to wait for deployments or API interactions to happen in the cloud. We could iterate over changes rapidly, test out new features, and deploy code without worrying about cloud costs. For example, instead of sending messages through live SQS queues and interacting with S3 storage on AWS, LocalStack let us simulate these services locally. This allowed us to run and test our messaging workflows much faster.
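Switching an application between LocalStack and real AWS usually comes down to overriding the endpoint the SDK talks to. The sketch below shows one way that could look. The project's back-end was Java, so Python is used here only for brevity, and the `LOCALSTACK_ENDPOINT` variable name, the default region and the helper itself are assumptions rather than the project's actual configuration.

```python
import os

def aws_client_kwargs(env=None):
    """Build keyword arguments for an AWS SDK client.

    When LOCALSTACK_ENDPOINT is set (for example http://localstack:4566 inside
    the Compose network), point the client at LocalStack with dummy
    credentials. Otherwise return only the region, so the SDK talks to AWS.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("AWS_REGION", "eu-west-2")}  # assumed default
    endpoint = env.get("LOCALSTACK_ENDPOINT")  # hypothetical variable name
    if endpoint:
        kwargs.update(
            endpoint_url=endpoint,
            aws_access_key_id="test",       # LocalStack accepts dummy credentials
            aws_secret_access_key="test",
        )
    return kwargs

print(aws_client_kwargs({"LOCALSTACK_ENDPOINT": "http://localstack:4566"}))
# With boto3 installed, the result could be passed straight through, e.g.
#   sqs = boto3.client("sqs", **aws_client_kwargs())
```

Keeping the switch in one environment variable means the same application image can run unchanged against LocalStack in Docker Compose and against AWS once deployed.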

4. Easy Integration Testing

One of the biggest challenges in modern cloud applications is ensuring that various components (front-end, back-end, messaging, storage) are working harmoniously together. LocalStack made this process seamless. Since we could emulate all the AWS services locally, we could simulate how our front-end communicated with the back-end, how our back-end processed data using SQS and stored files in S3, and even how our infrastructure interacted with external services. This allowed us to test integrations between various parts of our system without needing to push anything to the cloud.

5. Reduced Costs

The ability to test AWS-dependent services locally means you’re not accumulating unnecessary cloud charges. In our project, running constant tests on live AWS services would have significantly increased the budget. LocalStack helped us minimise these costs by simulating the AWS environment on our local machines without any cloud interaction. This is particularly useful for small teams or projects with limited budgets.

6. Simplified Infrastructure Setup

LocalStack also simplifies the process of setting up and tearing down infrastructure. Since everything runs locally, we could quickly spin up and shut down environments for testing and development, reducing the overhead involved in managing cloud resources.

7. Parallel Development

With LocalStack in place, our team could develop multiple components simultaneously. The front-end devs didn’t need to wait for the back-end APIs to be fully functional in AWS; they could test against the locally mocked services. Similarly, the infrastructure devs could configure and test various AWS services locally before pushing changes to the cloud.

Downsides of LocalStack:

1. Limited Feature Support

Not all AWS services and features may be available or fully supported in LocalStack. Also, new or complex features may not be fully implemented, which could impact testing and development.

2. Compatibility Concerns

LocalStack may not always be fully compatible with the latest versions of AWS SDKs, which could lead to delays in adopting new features.

3. Configuration Complexity

Configuring and setting up LocalStack may require additional effort and technical know-how. It may not be as straightforward as using the actual cloud services, especially for beginners.

Conclusion

In summary, LocalStack significantly accelerated our project’s development and testing processes. By simulating a local AWS environment, it allowed us to quickly iterate on our front-end, back-end, and infrastructure components without incurring high cloud costs or waiting for slow cloud interactions.

I would highly recommend that anyone working on projects that rely on AWS services incorporate LocalStack into their workflow. Not only will it make your development process faster and more efficient, but it will also allow you to test AWS integrations with greater confidence, all while keeping your costs low. If you haven’t tried LocalStack yet, give it a shot - you’ll likely find it as invaluable as we did!

Accelerating AWS Native Development with LocalStack & Docker: A Real-World Case Study was originally published by Capgemini at Capgemini Software Engineering on November 08, 2024.

https://capgemini.github.io/cloud/accelerating-aws-native-dev-with-localstak-and-docker
Use Cases for AI
AIMachine LearningDevelopmentTesting
Use Cases for Generative AI

For the past year, almost every week the Software Engineering world has been asked about Artificial Intelligence (AI): “What should Capgemini’s AI offerings be?”, “Aren’t you worried about the role of Capgemini software engineers now that AI can write code?” Specifically in these cases, the questions are around Generative AI. To put that in context, here is a recap of some terms used to describe different implementations of AI.

Machine Learning

A nice dictionary definition of machine learning from Oxford Languages is: “the use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyse and draw inferences from patterns in data.” The difference between a basic hand-written decision tree and machine learning is that with machine learning, a person didn’t sit down and define a set of if-statements for the machine to follow to make decisions (as used, for instance, in the NHS Map of Medicine from 2007). Instead, they provided a set of inputs and expected outputs and let the machine’s algorithms figure out how to get from input to output. You could describe it as a bottom-up approach to problem solving rather than a top-down approach. This evokes the differences between Newton and Leibniz in their attempts to describe the physical world in the 17th century. Newton’s laws are built more like a decision tree, describing the results of forces on objects, whereas Leibniz’s “monads” are more like the algorithms in machine learning, where small, simple rules build up together to create the complex forces we see in action. They are both very useful tools in describing and learning about the physical world but may result in different predictions of varying accuracy.

Many machine learning algorithms are loosely based on the neurons in the brain. Each “artificial neuron” is a function that receives one or more inputs, applies weights to these inputs, and sums them (typically passing the sum through an activation function) to produce an output. Chained together into a “Neural Network”, these small functions can complete amazing tasks such as image recognition, text generation and predictive analytics. Machine learning is incredibly useful and can benefit a wide range of industries from healthcare to travel to finance.
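As a minimal sketch of the “artificial neuron” idea, the function below computes a weighted sum of its inputs and squashes it with a sigmoid, and three of them are chained into a toy network. The weights and biases are made up for illustration; in real machine learning they are learned from the example inputs and expected outputs rather than written by hand.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias, squashed into (0, 1) by a sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def tiny_network(inputs):
    """Two hidden neurons feeding one output neuron; all weights are invented."""
    h1 = neuron(inputs, [0.5, -0.6], bias=0.1)
    h2 = neuron(inputs, [-0.3, 0.8], bias=0.0)
    return neuron([h1, h2], [1.2, -0.7], bias=0.05)

print(neuron([0.0, 0.0], [0.5, -0.6]))
print(tiny_network([1.0, 2.0]))
```

Training a network means nudging those weights until the outputs for known inputs match the expected outputs closely enough.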

Generative AI

Dictionaries are not so hot on defining generative AI. Perhaps the best example is from Merriam Webster: “artificial intelligence that is capable of generating new content (such as images or text) in response to a submitted prompt (such as a query) by learning from a large reference database of examples”. The important difference here is the generation of new content. There are still machine-learning algorithms at play, but the ask is that they produce something new. The output of a generative AI query isn’t pulled from a list, it isn’t copied from an article. It is created by the neural network. You MAY get a text answer that is the same as an existing article, but it will have been probabilistically generated using an algorithm that, in simple terms, says “what word is most likely to come next in this phrase”, for example. Asking the same question twice will not necessarily produce the same output. With machine learning, the output you get is usually a probability – for example, a handwriting recognition system would give a probability percentage that a given input image contains certain words. As such, it isn’t ever completely wrong, because it doesn’t give a definitive answer. Generative AI will look at the various probable outputs and choose one by applying a separate decision algorithm. In this way, it is possible to get answers that are just plain wrong.
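The “choose one of the probable outputs” step can be sketched as weighted random sampling. This is a simplified illustration, not how any particular product is implemented: the candidate words and their scores are invented, and the softmax-with-temperature rule is one widely used way of turning scores into a random choice.

```python
import math
import random

def sample_next_word(scores, temperature=1.0, rng=random):
    """Pick one word at random, weighted by a softmax over the scores.

    Lower temperatures sharpen the distribution towards the top-scoring word;
    higher temperatures flatten it, making unlikely words more probable.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    words = list(scores)
    scaled = [scores[w] / temperature for w in words]
    biggest = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - biggest) for s in scaled]
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical scores for the word following "The cat sat on the".
scores = {"mat": 3.0, "sofa": 2.0, "keyboard": 1.0, "moon": -1.0}

rng = random.Random(42)  # seeded only so the demo is repeatable
print([sample_next_word(scores, temperature=1.0, rng=rng) for _ in range(5)])
print([sample_next_word(scores, temperature=0.05, rng=rng) for _ in range(5)])
```

At a very low temperature the top-scoring word wins almost every time, which is why reducing randomness makes answers more repeatable but also less “generative”.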

General AI

Wikipedia says General AI “is a type of AI that matches or surpasses human capabilities across a wide range of cognitive tasks.” To put it simply, General AI does not exist. We cannot currently define human capabilities accurately, so trying to decide whether a machine can exceed them is impossible!

Examining the Use Cases

As mentioned above, Machine Learning solutions are proving vastly useful in automating complex tasks that must make deductions based on huge quantities of data. A notable example is evaluating patient CT scan data in search of cancerous growths. The algorithm does not replace the oncologist, but assists to alleviate an otherwise “repetitive, tedious and burdensome” task. The outputs are not treated as gospel truth, but help the oncologist choose where to focus their time and efforts.

There has understandably been much buzz and excitement about the amazing capabilities of Generative AI this past year, with the broader uptake of ChatGPT and DALL-E, for example. Now that the dust has settled, a closer evaluation of Generative AI use cases is required.

Remember this diagram for blockchain use cases? Perhaps it’s time for something similar for Generative AI use cases?

Should You Use Blockchain? Flow chart

The key thing to remember about Generative AI is that the answers it generates are not necessarily correct, and are not even intended to be correct. When architecting a solution that involves Generative AI, the pertinent questions are:

  • Does it matter if this generated answer is wrong?
  • Do I require the same answer to the same question if I ask it twice?

If the answer to either of these questions is YES, then you cannot use a standard Generative AI solution without some significant tuning which in itself removes the “generative-ness” of your solution.

There are still plenty of scenarios where the answer to the above questions is NO. With a focus on Microservice Development, we will look at the use cases that have been put forward for enhancing the software development lifecycle:

User Interface Design

When designing a human-computer interface to improve the user experience, there are best practices but there is no wrong answer. As such, it is a viable candidate for applying Generative AI - the role of an Agile Software Engineer in interface design is to iteratively put suggestions forward to the system user for their consideration. AI tools could accelerate prototypes, and assist the designer in breaking out of their habitual designs and trying something new. Already, design tools such as Figma and V0 by Vercel have started to put this into practice.

Automated Testing

A software test suite should allow all routes through the system to be automatically checked when a change is added, to ensure that no previous logic has been broken by the new addition. Defining “all routes” isn’t as straightforward as you may think. There is a great testing joke:

A tester walks into a bar and orders a beer.
Then they order null beers.
Then they order 99999 beers.
Then they order ytugihoij^%&*( beers.
Then they order a bear.
A customer walks into a bar and asks where the toilets are. The bar explodes.

Turning up the “temperature” on Generative AI and asking it to generate test cases could be useful: the more random output may include cases which a tester might not have considered. A similar principle is applied to testing technical architectures in Barry O’Reilly’s Residuality Theory, whereby he suggests imagining random events to test the robustness of a software architecture. For example, if you architect for the scenario that giant lizards might crawl out of the River Thames and set fire to London, your application will consequently be able to survive events such as power blackouts or train strikes! Generative AI could be used to produce those off-the-wall architectural scenarios that help improve your overall application stability over time.

Writing Code

There has been a lot published around the ability of Generative AI to write code. With applications such as GitHub Copilot and Azure Copilot being very easy to integrate into a developer’s workflow (or to use standalone as a pseudo developer), this would appear to be a central use case for Generative AI, and some believe that Generative AI will replace the software engineer’s role imminently. What we have seen with Capgemini software projects does not align with this, however – it is not possible to provide a junior developer with a Copilot license and expect them to perform like a senior developer, let alone enabling businesses to create software themselves without a software engineer present at all. There are a few big and obvious reasons for this:

  1. It only saves time spent writing code
    Developers spend surprisingly small amounts of time actually writing code. Various studies show that between 9% and 67% of time is spent “coding”, depending on the definition, the seniority of the developer and the workplace environment. Either way, there is a lot of time that a developer spends on tasks that Generative AI code assistants are not designed to help with. GitHub proudly announced huge productivity gains of 55% when using Copilot, but saving 55% of the 9% of time you actually spend coding isn’t as impressive. Plus, most of the examples I have seen of using Copilot focus on reducing boilerplate or writing simple games, and these use cases are not particularly relevant to the typical task list of a senior software engineer.

  2. It takes value away from the development lifecycle
    An often-quoted example of Generative AI writing code is the creation of unit tests. The best practice when developing code is to employ Test-Driven Development (TDD), an architectural approach which forces the developer to consider their code interfaces before they even write their method implementations. Relying on Generative AI to write unit tests implies they will be written after the code, hence not following TDD practices. A counter-argument is where a team inherits a codebase which does not have high unit test coverage. Is it safe to allow Generative AI to create the tests? Again, let’s go back to the initial ask - Does it matter if this generated answer is wrong? Yes it does matter. The only thing worse than no tests is incorrect tests! It is possible to generate unit tests for an existing code framework using rules rather than Generative AI. In fact, for many Generative AI use cases this is true – a rules-based or machine learning result from a query is more useful than a generated result. What we find with many Generative AI use cases is that the useful component is very often the natural language “chatbot” interface and the way it can translate a natural language query into a search term, rather than the generated result.

  3. It’s most useful for the people who don’t want it
    It is worth considering who will find a Generative AI code assistant most useful. Senior developers are less likely to use it to generate much code, as they have already found ways to avoid boilerplate, and the difficult work of fine-tuning or bug-fixing a code base is less likely to bring useful suggestions from Copilot. It is still incredibly useful though and can save many hours of trawling the Internet. Below is a real-world example of a conversation held with Azure Copilot. The software engineer has identified an issue with the code, and uses Copilot to find a way to code a specific fix. Interestingly many of the most useful applications of Generative AI coding appear to be helping humans to write regular expressions!

Copilot writing regular expressions

As you can see from the prompt engineering, considerable knowledge is needed to get the code that is required to fix a problem in a code base. Further knowledge is needed to check whether the required answer is relevant. And user beware - if you lose your chat history in Copilot (as will happen with Microsoft Copilot in “Data Protection” mode) of course you cannot get the same answer a second time by asking the question again, and this can result in many frustrating lost hours trying to find what you briefly saw via an internet search instead! So much for time saving!

When interviewed, most senior software engineers will talk about their problem-solving mindset. They enjoy problem solving, and will actively look for problems to solve. As such, a tool which does the “fun bit” of problem solving for them is not going to have a great take-up. Regular expressions aside, it is junior developers who will be most likely to take up Generative AI and use it as a crutch. And here we must go back to our initial check: Does it matter if the code they generate is wrong? In our software supply chain pipelines we usually have built-in checks to ensure compilation, look at code quality and mandate test coverage, so some “wrong” code will be weeded out by the pipelines, but there is still a huge risk that a junior developer is shipping code the impact of which they do not really understand. Currently, hackers have to go to great lengths to get unknown code into production code-bases; now it seems we are embracing a tool that will do it for them. Care must be taken here.

  4. It’s only as good as the average of the code it learns from
    The data set used by Generative AI to produce code is often all open source code bases. This is a huge dataset and fairly comprehensive – many, many programmers have contributed to open source projects. Therefore, we have literally created the world’s most average programmer. This is rarely the target that a business sets for its IT divisions… And indeed, reports now suggest that a code base maintained with Generative AI degenerates in quality over time.

  5. It’s just another programming language to learn
    There have always been layers of abstraction between humans and machines, be it 10 strings on an abacus to represent decimal numbers, typewriter keys operating little letter printers, or an IDE auto-completing your code bracket pairs. The same goes for Java code compiling to JVM bytecode, which is turned into platform-specific instructions by the JVM, or NoCode/LoCode drag-and-drop integrations. Most of the text-based layers of abstraction have been called “programming languages” and in that way, is Generative AI very different? It is still part of the chain between the human ideation of the software, and the executable software itself. If we all become expert prompt engineers and lose the ability to write JavaScript code from scratch, is that any different from losing the ability to write C++ code because we write JavaScript and let Google’s V8 engine (itself written in C++) compile it to machine code for us? Software engineering is still the same skill, but the language used is now the language of prompt engineering. The only difference is in the fact that the Generative AI answer varies over time and hence the process of debugging could get pretty interesting.
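To put rough numbers on the first reason in the list above, the sketch below works out the overall time saved if only the coding portion of the job gets faster. It takes the article's reading of the 55% figure (55% of coding time saved) at face value, which is an assumption about how that headline number should be interpreted.

```python
def overall_time_saved(coding_share, coding_speedup=0.55):
    """Fraction of total working time saved if only coding time shrinks."""
    return coding_share * coding_speedup

# The two ends of the range of time spent coding quoted in the studies.
for share in (0.09, 0.67):
    saved = overall_time_saved(share)
    print(f"coding share {share:.0%}: about {saved:.1%} of total time saved")
```

At the low end that is just under 5% of a developer's total time, and even at the high end it is about 37%, well short of the headline 55%.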

In conclusion, there are fewer use cases than one might think for Generative AI, and whilst it will be a great tool for developers and will probably be as game-changing as IDEs were in the 1990s, it isn’t quite the end to software engineering that people are predicting - developers, don’t go hanging up your boots just yet.

Content Generation

One of the often-touted use cases of Generative AI, within and without the Software Development community, is its ability to generate content for blogs and articles. (Perhaps it is also able to read said content and “like” it?) As an example, we can compare the content of this article with the content generated when we ask Microsoft Copilot to write it, using this prompt:

“Write 2000 words on the use cases for Generative AI in the software development lifecycle. Include a definition of machine learning, generative AI and general AI in the first section. Use a formal but friendly tone. Write with the persona of a cautious software engineer.”

Copilot

Certainly! Let’s delve into the fascinating world of Generative AI and its impact on the software development lifecycle. As a cautious software engineer, I’ll guide you through the nuances while maintaining a friendly tone.

Before we explore the use cases, let’s clarify some terms:

  1. Machine Learning (ML): ML is a subset of artificial intelligence (AI) that enables systems to learn from data and improve their performance over time. It involves algorithms that automatically learn patterns and make predictions based on examples.
  2. Generative AI (GenAI): GenAI goes beyond traditional ML by creating new content rather than just predicting existing patterns. It generates novel outputs, such as text, images, or code, based on input data or prompts.
  3. General AI (AGI): AGI refers to an AI system with human-like cognitive abilities, including reasoning, problem-solving, and creativity. While AGI remains an aspiration, GenAI represents a step toward achieving it.
Code Generation and Assistance:
  • GenAI tools like ChatGPT and GitHub Copilot assist developers by suggesting code snippets, improving productivity, and reducing manual coding efforts.
  • However, it’s essential to recognize that GenAI isn’t a silver bullet. It complements human expertise rather than replacing it entirely.
Content Creation:
  • GenAI can create documentation, blog posts, and other textual content. It helps developers communicate effectively and maintain consistent documentation.
Test Script Generation:
  • Beyond user stories and test cases, GenAI can generate test scripts and scenarios.
  • It identifies edge cases, executes automated tests, and logs defects, enhancing software quality.
Unit Test Cases:
  • GenAI assists in writing unit test cases, ensuring robust code coverage.
  • Developers benefit from faster test case creation and improved code reliability.
Source Code Generation:
  • While GenAI isn’t writing entire applications (yet!), it can generate boilerplate code, saving time during initial setup.
  • Developers remain responsible for critical design decisions.
  1. Quality Control:
    • Always review GenAI-generated content critically. It’s a tool, not a replacement for human judgment.
    • Verify that the code aligns with best practices, security guidelines, and project requirements.
  2. Ethical Implications:
    • GenAI learns from existing data, which may contain biases. Be cautious about perpetuating bias in your code.
    • Regularly assess the ethical impact of GenAI-generated content.
  3. Balancing Speed and Quality:
    • GenAI accelerates development, but don’t sacrifice quality for speed.
    • Maintain a balance—use GenAI where it enhances productivity without compromising reliability.

Generative AI holds immense promise for software development. By understanding its capabilities, limitations, and ethical considerations, cautious engineers can leverage it effectively. Remember, GenAI is a valuable ally, but it’s our collective wisdom that truly drives innovation in the software development lifecycle. As a cautious software engineer, I’ve emphasized practical insights while maintaining a friendly tone. If you have any further questions or need clarification, feel free to ask! 😊

Source: Conversation with Copilot, 19/06/2024
References given:

  1. Generative AI and the software development lifecycle
  2. Automating the Software Development Life Cycle with Generative AI
  3. How Generative AI is Reshaping the Software Development Lifecycle: From …
  4. IMPACT OF GENERATIVE AI ON THE SOFTWARE DEVELOPMENT LIFE CYCLE (SDLC)

Not bad, Copilot! However it’s only 500 words. No amount of re-prompting and scolding would generate more. Seems you still can’t beat humans for their sheer ability to vacillate!

Use Cases for AI was originally published by Capgemini at Capgemini Software Engineering on September 06, 2024.

https://capgemini.github.io/ai/use-cases-for-ai
AWS Simple Email Service Security
SES, AWS, Email, Security

In the evolving landscape of digital communication, email spoofing poses a significant threat to the integrity of online identities. As a hobby project, and to practice AWS and software design, I have been developing an application called MovieBooker. This application streamlines the process of booking movie tickets and managing movie programs for both moviegoers and staff.

[Image: architecture]

Email Integration with AWS SES

As part of the infrastructure for the MovieBooker application, I implemented Amazon Simple Email Service (SES) to send emails when a user pays for a ticket. However, I encountered a challenge where the messages contained a banner with a “Be careful with this message” warning. This marked the beginning of my quest to unravel the secrets of email security.

[Image: Be careful with this message]

Strengthening Email Security with SPF, DKIM, and DMARC

In order to get rid of the “Be careful with this message” warning, I had to set up Sender Policy Framework (SPF), DomainKeys Identified Mail (DKIM) and Domain-based Message Authentication, Reporting, and Conformance (DMARC).

SPF is an email authentication standard that helps protect senders and recipients from spam, spoofing, and phishing. DKIM is a standard email authentication method that adds a digital signature to outgoing messages. DMARC helps mail administrators prevent hackers and other attackers from spoofing their organization and domain. Spoofing is a type of attack in which the From address of an email message is forged.

Determined to make my application’s emails as trustworthy as possible, I delved into SPF, DKIM, and DMARC. These three tools promised an end-to-end authentication mechanism that would transform my emails into beacons of trust.

What You Will Learn

I aim to share the valuable insights I gained while implementing a secure end-to-end email authentication system. In this article, you will learn how to harness the power of three essential tools:

  1. SPF (Sender Policy Framework)
  2. DKIM (DomainKeys Identified Mail)
  3. DMARC (Domain-based Message Authentication, Reporting, and Conformance)

These lessons were acquired through practical application in fortifying the email communication of my application. Join me as I guide you through the process of implementing these robust security measures.

Understanding the Threat Landscape

Insecure email communication poses various risks, including phishing attacks that trick recipients into divulging sensitive information, email spoofing for deceptive purposes, and business email compromise (BEC) threats involving unauthorized access to legitimate accounts. Additionally, insecure emails may damage the sender’s domain reputation, result in financial losses, and lead to regulatory compliance violations.

To mitigate these risks, it’s crucial to implement robust email security measures such as encryption, multi-factor authentication, and the adoption of authentication protocols like SPF, DKIM, and DMARC. Educating users on recognizing and avoiding phishing attempts further enhances overall email security.

Overview of SPF, DKIM, and DMARC

SPF - Sender Policy Framework:
  • SPF acts as a set of rules and regulations for email servers, authorizing them to send emails on behalf of a specific domain.
  • It enables the receiver to determine whether an incoming email is genuinely sent from an authorized server associated with the claimed domain.
DKIM - DomainKeys Identified Mail:
  • DKIM adds a digital signature to outgoing emails, creating a unique identifier that can be verified by the recipient.
  • The recipient checks the signature against the public key in the DNS records, ensuring the email is unaltered and genuinely originated from the claimed sender’s domain.
DMARC - Domain-based Message Authentication, Reporting, and Conformance:
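  • DMARC acts as a protocol that instructs the email receiver on how to handle emails that fail authentication.
  • The domain owner publishes one of three policies, none (take no action, only report), quarantine (treat as suspicious, typically by delivering to spam), or reject, which the receiver applies based on SPF and DKIM verification results.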
Benefits of SPF, DKIM, and DMARC:
  • Mitigates the risk of email spoofing and phishing attacks: Ensures only authenticated emails are accepted, reducing the likelihood of malicious activities.

  • Enhances the reputation of your domain: Implementing industry-standard email authentication practices contributes to a positive domain reputation.

  • Provides a clear framework for email handling: Reduces the chances of legitimate emails being marked as spam, improving communication reliability.

  • Aids in achieving and maintaining regulatory compliance: Particularly beneficial in industries with stringent data protection regulations, helping organizations avoid legal consequences and financial penalties.

  • DMARC protects the reputation of the sender’s domain: Provides visibility into email authentication results, preventing unauthorized use of the domain in phishing attempts and strengthening overall brand trust.

  • Contributes to the mitigation of tampering: A DKIM signature lets the recipient detect whether the signed parts of a message were altered during transit. Note that none of these protocols encrypts the message content.

Establishing a Secure Communication Channel

By implementing SPF, DKIM, and DMARC, we establish a formidable defense against unauthorized email activities, offering a secure and reliable communication channel that fosters trust and confidence among users.

What is AWS SES?

Amazon Simple Email Service (AWS SES) is a cloud-based email sending service designed to simplify the process of sending emails from applications and websites. Its role extends beyond just delivering emails; AWS SES serves as a comprehensive solution for ensuring secure and reliable email communication.

Key Features:
  • Comprehensive Email Solution: AWS SES goes beyond basic email delivery, providing a robust solution for secure communication.

  • Easy Integration with AWS Services: Designed for easy integration with other AWS services, making it straightforward to implement and manage SPF, DKIM, and DMARC alongside complementary AWS solutions.

  • Seamless Integration with Authentication Protocols: AWS SES seamlessly integrates with SPF, DKIM, and DMARC authentication protocols, providing a strong framework to verify sender identity and prevent email spoofing or phishing attempts.

  • Configurable Email Sending Settings: Allows users to configure and customize email sending settings, enabling the implementation of specific security measures tailored to the organization’s needs.

With AWS SES, you can ensure that your email communication is not only efficient but also adheres to industry-leading security practices.

Implementing DKIM with AWS SES

To enhance the security of your email communication, you can implement DKIM (DomainKeys Identified Mail) with Amazon Simple Email Service (AWS SES). Use this link to set up DKIM: Setting up Easy DKIM

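Set Up Public Key in DNS: Once DKIM signatures are enabled, you need to publish the DKIM records in your DNS.

  • Retrieve the DKIM records from the AWS SES console. With Easy DKIM, SES generates CNAME records for you to publish; if you bring your own key pair (BYODKIM), you publish the public key yourself in a TXT record.
  • Add the records to your DNS. A DKIM record always lives under a selector, at <selector>._domainkey.<your domain>.

Example DNS Record (TXT form, with a placeholder selector):

Name: selector._domainkey.yourdomain.com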
Type: TXT
Value: "v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBA..."

[Images: DNS record, Custom email, Record sets]

These steps will guide you through the process of setting up DKIM for your AWS SES configuration, adding an extra layer of authenticity to your outgoing emails.

Setting up SPF

Before we configure SPF, it’s essential to understand the key components of an SPF (Sender Policy Framework) record. SPF is a crucial authentication mechanism that helps prevent email spoofing and phishing attacks. Below is a detailed breakdown of the components in an SPF record.

Version (v):

The SPF version is specified using the “v” tag, followed by the version number. For example, “v=spf1” indicates SPF version 1.

Mechanisms:

Mechanisms are the core components of an SPF record that define the rules for email server authorization. Common mechanisms include:

  • a: Allows the domain’s A record (IPv4 address) to send mail.
  • mx: Allows the domain’s MX record (mail exchange) to send mail.
  • include:domain.com: Permits the specified domain to send mail on behalf of the current domain.
  • ip4:192.168.0.1: Authorizes the specific IPv4 address to send mail.
  • exists:domain.com: Matches if a DNS A-record lookup for the specified domain returns any result, allowing for conditional SPF processing.
  • all: Always matches, so it is placed last to set the default result for servers not covered by previous mechanisms. Its qualifier decides the outcome: “-all” (fail), “~all” (softfail), “?all” (neutral) or “+all” (pass).
Modifiers:

Modifiers provide additional information or change how the record is evaluated. The two defined modifiers are:

  • redirect: Redirects the SPF check to another domain’s SPF record when no mechanism has matched.
  • exp: Names a domain whose TXT record holds an explanation to return in case of an SPF failure.
Qualifiers:

Qualifiers are prefixed to a mechanism and define the result when that mechanism matches. If no qualifier is given, + is assumed. There are four possible qualifiers:

  • + (Pass): The server is authorized, and the SPF check passes.
  • - (Fail): The server is not authorized, and the SPF check fails.
  • ~ (SoftFail): The server is probably not authorized. Receivers usually accept the message but treat it with suspicion (often used for testing or gradual SPF deployment).
  • ? (Neutral): No explicit authorization or denial; the SPF check is inconclusive.

Now, armed with an understanding of these components, you can proceed to configure SPF for your domain, strengthening email authentication and security.

Example SPF Record

Below is an example SPF (Sender Policy Framework) record that demonstrates the use of different mechanisms and qualifiers:

v=spf1 include:_spf.example.com ip4:192.168.0.1 -all
Name: yourdomain.com
Type: TXT
Value: "v=spf1 include:_spf.example.com ip4:192.168.0.1 -all"

An SPF record follows the format "v={spf version} {mechanisms, each with an optional qualifier} {qualifier}all", optionally with modifiers such as redirect or exp.
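To make the structure of the record concrete, here is a minimal Python sketch that splits an SPF record into its mechanisms (with the result each qualifier implies) and its modifiers. The function name `parse_spf` and the returned tuple layout are my own illustration, not part of any library, and the parser only handles the simple forms discussed above; it does not evaluate the record against a sending server.

```python
# Result name for each SPF qualifier; "+" is assumed when none is written.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_spf(record):
    """Split an SPF record into (mechanisms, modifiers).

    mechanisms is a list of (result, name, value) tuples in record order;
    modifiers is a dict such as {"redirect": "example.com"}.
    Hypothetical helper for illustration only.
    """
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF version 1 record")

    mechanisms, modifiers = [], {}
    for term in terms[1:]:
        name, sep, value = term.partition("=")
        # A modifier is name=value, with the "=" before any ":".
        if sep and ":" not in name:
            modifiers[name] = value
            continue
        qualifier = "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mech, _, arg = term.partition(":")
        mechanisms.append((QUALIFIERS[qualifier], mech, arg or None))
    return mechanisms, modifiers


mechanisms, modifiers = parse_spf(
    "v=spf1 include:_spf.example.com ip4:192.168.0.1 -all"
)
for result, mech, arg in mechanisms:
    print(result, mech, arg)
```

Reading the output top to bottom mirrors how a receiver evaluates the record: the first mechanism that matches the sending server decides the result, and the trailing -all fails everything else.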

Follow this link for more information on DNS settings: DNS settings. I added this TXT entry in Route 53 for SPF support:

[Image: Simple type routing]

Configuring DMARC Policies

Now, let’s walk through the steps to configure DMARC (Domain-based Message Authentication, Reporting, and Conformance) policies for your domain in Amazon Web Services (AWS) using Route 53:

  1. Log in to Amazon Web Services: Go to https://aws.amazon.com/ and log in to your AWS account.

  2. Navigate to Route 53:
    • Under “Services,” find and click on “Route 53” under “Network & Content Delivery.”
  3. Access Hosted Zones:
    • In the “DNS Management” section, select “Hosted Zones.”
  4. Select Your Domain:
    • From the list of hosted zones, find and click on the domain you want to configure DMARC for.
  5. Create a Record Set:
    • Click on the “Create Record Set” button.
  6. Configure DMARC Record:
    • In the “Name” field, type “_dmarc”.
    • Under “Type,” select “TXT” from the drop-down menu.

Example DMARC Record:

Name: _dmarc.yourdomain.com
Type: TXT
Value: "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com; ruf=mailto:dmarc@example.com"
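Before publishing a record like this, it can help to sanity-check it. The following is a small Python sketch, assuming a hypothetical helper named `parse_dmarc`, that splits the record into its tags and checks the two required ones: the version and the policy. It is not a full validator.

```python
VALID_POLICIES = {"none", "quarantine", "reject"}


def parse_dmarc(record):
    """Parse a DMARC TXT value into a dict of tags (illustrative helper)."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[key.strip()] = value.strip()

    if tags.get("v") != "DMARC1":
        raise ValueError("record must declare v=DMARC1")
    if tags.get("p") not in VALID_POLICIES:
        raise ValueError("p must be none, quarantine or reject")
    return tags


tags = parse_dmarc(
    "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com; "
    "ruf=mailto:dmarc@example.com"
)
print(tags["p"], tags["rua"])
```

Here rua is the address for aggregate reports and ruf the address for failure reports, so a typo in either silently costs you the visibility DMARC is meant to provide.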

You can generate a value for the DMARC record set on the PowerDMARC site:

[Images: Power DMARC 1, Power DMARC 2]

Add the DMARC record to the DNS record in Route 53:

[Image: DNS Record DMARC]
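If you would rather script this step than click through the console, the record can be described as a Route 53 change batch. The sketch below only builds that data structure; the helper name `txt_change_batch` and the TTL of 300 seconds are my own assumptions. As far as I know, the resulting dictionary is what boto3's Route 53 client expects as the ChangeBatch argument of change_resource_record_sets, but the call itself is left as a comment because it needs credentials and your hosted zone ID.

```python
import json


def txt_change_batch(name, value, ttl=300):
    """Build a Route 53 change batch that creates or updates one TXT record.

    Hypothetical helper; the TTL default is an arbitrary choice.
    """
    return {
        "Comment": f"Upsert TXT record for {name}",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "TXT",
                    "TTL": ttl,
                    # Route 53 expects TXT values wrapped in double quotes.
                    "ResourceRecords": [{"Value": f'"{value}"'}],
                },
            }
        ],
    }


batch = txt_change_batch(
    "_dmarc.yourdomain.com",
    "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com",
)
print(json.dumps(batch, indent=2))

# Assumed usage with boto3 (not run here):
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<your hosted zone id>", ChangeBatch=batch)
```

The same helper would work for the SPF record, since it is also a TXT record on the domain.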

Conclusion

The implementation of Sender Policy Framework (SPF), DomainKeys Identified Mail (DKIM), and Domain-based Message Authentication, Reporting, and Conformance (DMARC) within the framework of Amazon Simple Email Service (SES) presents a robust and multifaceted approach to fortifying email security. By embracing these authentication protocols, organizations and individuals can significantly mitigate risks associated with phishing, email spoofing, and unauthorized access.

  • SPF as the Gatekeeper: SPF acts as a gatekeeper, specifying authorized email servers and reducing the likelihood of malicious actors impersonating trusted sources.

  • DKIM for Extra Security: DKIM adds an extra layer of security by providing a unique digital signature, ensuring that the integrity of the email remains intact during transmission.

  • DMARC as the Decision-Maker: DMARC serves as the wise decision-maker, guiding email receivers on how to handle emails that fail authentication, based on SPF and DKIM verification results.

  • AWS SES Integration: AWS SES, as a reliable and scalable cloud-based email sending service, seamlessly integrates these authentication mechanisms. Its features, such as scalability, deliverability, and compatibility with industry standards, make it a suitable choice for organizations looking to enhance their email security posture. Real-time analytics and monitoring tools, combined with cost-effectiveness, ease of integration, and a developer-friendly interface, contribute to a comprehensive solution for secure email delivery.

In adopting SPF, DKIM, and DMARC with AWS SES, organizations not only safeguard their digital identities but also bolster user trust by ensuring that emails are genuine, secure, and delivered with integrity. This strategic combination lays the foundation for a secure communication channel, fostering a resilient defense against the evolving landscape of email-based threats. As the digital realm continues to advance, this approach stands as a testament to proactive and effective email security practices.

The best part is you don’t get the “Be careful with this message” warning anymore 🙂.

AWS Simple Email Service Security was originally published by Capgemini at Capgemini Software Engineering on July 12, 2024.

https://capgemini.github.io/cloud%20native/ses-email-security
Automation Scripting with Bash
Development, Scripting, Automation, Bash

The need for automation in software engineering, especially in Cloud and Platform roles, is apparent. Manual, repetitive tasks consume time, introduce the possibility of human error and can become tedious for team members. In this blog post, I wish to document my journey of automation using Bash. We’ll look at the process and explore how automation can positively impact our workflows and our team.

Automating Simple Commands with Bash Scripts

For my first project at Capgemini, I have been working as a Platform Engineer within Cloud & Custom Applications. I quickly recognised the importance of automation in streamlining workflows. I took on the challenge of automating repetitive tasks, in the hopes of saving time and reducing human error.

Within my role, I find myself frequently using the same set of commands. For example, I often type vault read --field=value <path> | openssl x509 -noout -text, a command commonly used for reading and inspecting certificates stored in HashiCorp Vault. The symbol | is called a “pipe” in Bash scripting. It allows the output from the command on its left side to be used as input for the command on its right side. For example, in the command above the output of vault read --field=value <path> is passed to openssl x509 -noout -text for further processing.
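For readers who think more easily in Python, the same idea of a pipe can be sketched with the standard subprocess module: the first process writes to a pipe, and the second reads from it. The function name `pipe` is my own, and the demo uses two tiny Python commands as stand-ins so that it runs anywhere; in practice the two lists would be the vault and openssl commands.

```python
import subprocess
import sys


def pipe(left_cmd, right_cmd):
    """Run `left_cmd | right_cmd` and return the right-hand command's stdout."""
    left = subprocess.Popen(left_cmd, stdout=subprocess.PIPE)
    right = subprocess.Popen(
        right_cmd, stdin=left.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so the left process is notified
    # if the right process exits early.
    left.stdout.close()
    output, _ = right.communicate()
    left.wait()
    return output


# Stand-in commands: the left one prints a line, the right one upper-cases
# whatever arrives on its standard input.
producer = [sys.executable, "-c", "print('hello')"]
consumer = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().upper(), end='')",
]
print(pipe(producer, consumer), end="")
```

The shell version is clearly shorter, which is exactly why Bash is such a good fit for this kind of glue work.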

This got me thinking: why not create a basic bash script that takes the vault path as an argument, which I could then alias as a command. That way, I could simply run <alias> <path> to fetch the certificate. Not only would this save me time, but it would be easier on the fingers.

So I set about doing this as below:

  1. The first step in creating a script is to create a file ending in .sh. Here we will create a file called vaultcert.sh.

  2. Next, open the file in a text editor and define its functionality: The script accepts an argument for the vault path. This argument is passed as the path to the command vault read --field=value <path> | openssl x509 -noout -text.

#!/bin/bash

# Function to read certificate from Vault
read_certificate() {
    # Check if argument is provided
    if [ -z "$1" ]; then
        echo "Error: Vault path argument is missing."
        exit 1
    fi

    # Execute the command to read certificate from Vault
    vault read --field=value "$1" | openssl x509 -noout -text
}

# Invoke the function with the provided argument
read_certificate "$1"

3. We must set executable permissions to ensure the script is able to execute. We can do this by running in Linux:

chmod +x vaultcert.sh

4. An alias allows the script to be executed from a single command. We can do this by adding the following to our shell configuration file (e.g. .bashrc or .bash_profile):

alias vaultcert='/path/to/vaultcert.sh'

5. Now we must reload the shell configuration to apply the changes. We can do this by sourcing: source ~/.bashrc or source ~/.bash_profile.

6. Finally, confirm that the alias works by running the following command in the terminal:

vaultcert <path>

And that’s it, you have your first automated bash script!

Automating More Complex Commands with Bash Scripts

Once I had my first bash script up and running, I challenged myself to automate more complex tasks. With Bash scripting, we can make tasks such as managing files much easier. The possibilities of streamlining workflows with Bash scripting and automating complex tedious tasks are endless.

For example, the next automation script I worked on expanded upon the first one, but allowed the user to copy the file stored in vault to their local machine and then write to a further vault path. All we have to do is follow the steps above and create a new script:

#!/bin/bash

copy_write_certificate() {
    # Check if arguments are provided
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "Error: Arguments are missing."
        exit 1
    fi

    # Copy the certificate from Vault to a local file.
    # Test vault's own exit status: the redirect creates the file even
    # when the read fails, so checking that the file exists is not enough.
    if vault read --field=value "$1" > /path/to/local/file; then
        echo "Certificate copied successfully to /path/to/local/file"
    else
        echo "Error: Certificate copy failed." >&2
        exit 1
    fi

    # Prompt user to confirm writing the file to the new destination vault path
    read -r -p "Do you want to proceed with writing the file to $2 (y/n): " choice
    if [ "$choice" = "y" ] || [ "$choice" = "Y" ]; then
        # Only report success if the write itself succeeded
        if vault write --field=value "$2" value=@/path/to/local/file; then
            echo "File successfully written to the new vault path."
        else
            echo "Error: Writing to vault failed." >&2
            exit 1
        fi
    else
        echo "Cancelled writing to vault."
        exit 1
    fi
}

# Invoke the function with the provided arguments
copy_write_certificate "$1" "$2"

As demonstrated, Bash scripting opens up endless possibilities. We can increase the efficiency of our team by automating frequently used commands and sharing this with our team. Adding docs and sharing demos helps facilitate collaboration and boost productivity across the entire team.

We have explored my journey of bash automation on my first project here at Capgemini. In my next blog post, we will look at taking this one step further by introducing Python scripting, also discussing the use cases for Bash vs Python.

Thanks for reading!

Automation Scripting with Bash was originally published by Capgemini at Capgemini Software Engineering on May 24, 2024.

https://capgemini.github.io/development/automation-scripting-bash
DCX React Library v1.0.0 is with us and we’re proud of it
Engineering, React, Library

In 2022 we announced a new style-agnostic React library called dcx-react-library and today, after almost 3 years of hard work and 14 releases, we are happy to share that version 1.0.0 is finally available.

What’s new

We’ve built more than 35 components organised in 3 different categories: forms, utilities and typography, keeping in mind our main principles:

  • fewer dependencies
  • small bundles
  • fully tested
  • AA accessible

I’m pretty sure the main question you’ll be asking yourself is: “why did you decide to make a major release version?” After using the library on our projects, we realised that we had sped up the development process as promised but, the library being style agnostic by nature, our development team was spending too much time making our components look pixel perfect and exactly how our UX/UI team designed them. We asked ourselves: “now that we have got the foundations, can we do better?” Yes we can! So we decided to implement a design system for most of our components. These components have been redesigned to use tokens, without introducing any breaking changes. We also improved our documentation by adding a new section called design system: Documentation

If the use of tokens has piqued your interest and you are curious to understand how it works, we have created a quick video that illustrates how to start using it:

What’s next?

Now that version 1.0.0 is finally available we will focus on implementing the design system for our remaining components, and we’ll continue to implement new components (if your project needs components that we don’t have yet, please don’t be shy, create a request on our GitHub repository). Stay tuned… something else is cooking in the Capgemini labs.

You want to know more?
  • If you’d like to know more about the library, feel free to contact Daniele Zurico.
  • If you are curious to give it a try, it’s publicly available on npm.
  • If you’d like to contribute, you will need to request access from our public repository.
  • If you want to familiarise yourself with all the components we built, take a look at our storybook documentation.

DCX React Library v1.0.0 is with us and we’re proud of it was originally published by Capgemini at Capgemini Software Engineering on May 24, 2024.

https://capgemini.github.io/development/dcx-react-library-v1-is-here
Devoxx 2024
Learning, Culture, Architecture, AI

Do queues in the ladies’ toilets signify success at an IT conference? Is a half-hour sit-down worth more than a T shirt? These and other deep questions pondered.

Capgemini’s Cloud Development team are returning from our annual pilgrimage to Devoxx UK, the best and biggest annual IT developer conference in the UK. As is becoming usual, a train strike limited my attendance on the Wednesday, but even in a couple of days there has been much to inspire and mull over. This year, instead of a full booth, Capgemini sponsored a corner filled with super-comfy bean bags, and it was the most popular our area has ever been! As always it felt like a very inclusive conference, with a wide range of age groups and nationalities, and a good balance of genders and races present.

[Image: Dreaming their own future]

AI Again

As with last year there was a strong theme of AI across many of the conference talks, but this year the emphasis was more on developer involvement - less “Wow look what this can do”, and more “This is how you can build things”. For some angles it was difficult to see a production use-case - for example, Developing Intelligent Apps with Semantic Kernel showcased how, with the Semantic Kernel Java API, you can allow generative AI to call your Java functions whenever it considers them appropriate. Having recently taken Capgemini’s “Responsible AI” training course, I appreciate that generative AI’s decision-making process cannot be considered “correct”, so you could only use this approach when it didn’t matter if the AI got the decision to make the Java call wrong. This really limits usage. Deciding when to delete development resources, perhaps? Or rotating a “thought for the day” message? Nothing more important, please! The speaker’s (inevitable) example of roleplaying games seemed safe enough.

There were more AI concerns in Harel Avissar’s talk “The State of Malicious ML Models Attacks”. There are now popular repositories such as HuggingFace where people can share their pre-trained large language models (LLMs), which once trained are, as Harel pointed out, basically deployable units the same as any other software artifact. As such, they are open to attack in the same way as other pipeline artifacts and need to be signed and secured in the same way. This is quite a new field for attacks, but when scanning the LLMs available to download they did find some examples where LLM pickle (.PKL) files, which can run Python code when you load them, contained malicious “reverse shell attack” code. Something to be aware of.
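To see why a pickle file is effectively executable, here is a deliberately harmless Python sketch (my own illustration, not taken from the talk). A pickled object can tell the loader “call this function with these arguments”, and the loader obliges. The class name `Payload` is hypothetical, and the callable used here is the built-in int; a real attack would name something far less friendly.

```python
import pickle


class Payload:
    """Stand-in for a model object whose pickle has been tampered with."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" and
        # replays that call at load time. An attacker would point this at
        # something like os.system; the demo uses the harmless int().
        return (int, ("42",))


blob = pickle.dumps(Payload())  # the bytes that would be shared as a .pkl file
loaded = pickle.loads(blob)     # loading runs int("42") on our machine

# We asked for our object back, but got the result of the call instead.
print(type(loaded).__name__, loaded)
```

Nothing in the loading code looks suspicious, which is the point: the behaviour lives in the data, so model files from public repositories deserve the same scrutiny as any other downloaded artifact.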

AI and Vectors

Quite a few talks focussed on looking at vector databases, commonly used with LLMs, and the algorithms needed to search them. Vector types can represent the relationships between concepts across multiple dimensions. Mary Grygleski’s talk on Friday covered how ChatGPT’s vectors have 1,536 dimensions. Loosely speaking, this means the word “cat” is positioned along 1,536 different axes of meaning, so words used in similar contexts end up close together. Handy. But not brand new, and not specific to generative AI - Elasticsearch for example has been using a vector database and the HNSW search algorithm to great effect in its fuzzy searches for years. In fact, as I discovered in my last blog post, the powers of fuzzy search and natural language processing are often more real-world useful than generative AI. You don’t usually want a computer system to come up with its own answer to a question, you want it to go away and find a definitive answer!
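As a rough illustration of what “searching vectors” means, here is a toy Python sketch using three-dimensional vectors that I have made up by hand; real embeddings have hundreds or thousands of dimensions and come from a model. It ranks stored words by cosine similarity to a query vector using a brute-force scan. Algorithms such as HNSW exist precisely to avoid comparing the query against every stored vector, trading a little accuracy for a lot of speed.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index, k=1):
    """Return the k stored names most similar to the query (brute force)."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hand-made toy vectors, purely for illustration.
index = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
kitten = [0.95, 0.7, 0.05]

print(nearest(kitten, index, k=2))
```

With these numbers the query lands nearest “cat”, then “dog”, with “car” a distant third.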

Thoughtful Architecture

Conference regular Andrew Harmel-Law was back with what some have said is his best talk yet - “How we Decide”. I didn’t attend as I had recently seen Dave Snowden’s talk on “being human in an age of technology”, in which he went into detail on the biological human process of decision-making - for example, did you know that groups of 5 or fewer will always reach consensus? Possibly because this is a common family group size, allowing ancient humans to move quickly in family units. And did you know that only 4 in 5 of your decisions are made by the conscious brain, with the other one coming pretty much straight from the body’s senses? Things like pheromones have a big impact on decision-making when face to face. I was still processing this information so skipped Andrew’s talk for a deep dive into container-based IDEs, but I feel I missed out and am looking forward to the video!

I did attend Barry O’Reilly’s joyous “An Introduction to Residuality Theory”. Any talk that begins with a Douglas Adams quote and goes on to arm the attending developers with some REALLY DIFFICULT questions to put to enterprise architects (especially difficult if you don’t allow them to use the word “magic”) is always going to go down well, and O’Reilly went on to confirm my suspicions that the way we approach software architecture is less science and more art. He pointed out that our architecture diagrams don’t capture time, change or uncertainty - hugely important factors in software engineering - and that good software architects must be really comfortable with the concept of uncertainty. He had lots of great quotes about how Agile is a reaction to the realisation that requirements don’t work for complex systems, but that using Agile and reacting to change when problems arise leads to flaky architectures. His revelation, and the topic of his Ph.D thesis, is that software architecture could be a science, if we use methods from complexity theory.

He went on to outline some really excellent ideas about how to robustify your candidate architecture using random simulation and ideas based on Kauffman’s networks and attractors. Plus, what to do if fire-breathing lizards should happen to climb out of the Thames. Forewarned is forearmed!

More microservices

Microservices haven’t gone away from Devoxx, and this year there were a couple of talks debating / re-debating the monolith v microservice argument. I especially enjoyed Chris Simon’s “Modular Monoliths & Microservices - A 4+1 View Continuum”, in which he advised us all not to bother with the debate as both terms were becoming meaningless! Even a monolith is probably a distributed system. And if you view your client browsers as a scaled-out cluster of your application front end, which you probably should, even the simplest static website becomes a vast distributed network. He recommended moving away from the terms microservice / monolith and instead directly using terms such as “process” (a thing you can start and stop), and “node” for distributed compute, and teams and repositories for code ownership. He talked about the power of Domain Driven Design (DDD) when building up the logical view of your system, and how mapping processes onto bounded contexts can give you a candidate physical architecture, although don’t shy away from multiple processes in a bounded context if, for example, there are asynchronous steps between multiple processes (i.e. a queue). He also gave the quote of the conference when he said people often found they had a service and “distributed the wazoo out of it and we don’t know why”. He talked about the development view of an architecture and how your code repository should always align with the logical view. If this doesn’t happen you end up with inter-team coupling - if you need behaviours such as “Scrum of Scrums”, or it is becoming very hard to get decisions made, that is a clue that your development view isn’t in alignment.
He also talked about how we should reconsider our assumptions on the physical view of the architecture - we often jump to the conclusion that a microservice needs its own infrastructure but this may not be the case, and we often assume that a repository must be deployed as a monolith when in fact you can use path filters on your monorepo to deploy separate sections at different times.

Fear of Rust

For something new, I went to see Ayoub Alouane talk about “My discovery of Rust: Why is it a Game Changer?” where he showed an example of Discord’s “Message Marker” app showing regular spikes in CPU that corresponded with garbage collection. The corresponding Rust application had no such spikes. Why? Rust, announces Ayoub, does its memory management at compile time! He went on to demonstrate with a simple loop application how, for each variable, you had to specify the memory details of the variable:

  • Is it mutable?
  • Is it usable in multiple threads?
  • Is it lockable?
  • Should you clone it into threads?

All this can be expressed through the syntax of the language. A bit of a pain to learn, but well worth it. A lot of end-of-life C applications are being migrated to Rust, and after this talk I can understand why. He gave another example of rewriting a node.js app in Rust and using 95% less memory and 75% less CPU! OK, game-changing point taken! This talk of application efficiency fit well with our own Shirley Tarboton’s talk on Decarbonising Code. She gave some examples of easy-to-implement coding practices that can make your application consume fewer resources. It was a well-received talk in the huge auditorium theatre - quite a feat to get people to attend during lunch!

Decarbonising code with Shirley

Crashing and burning

Last thing Thursday I popped into a talk called “Mayday Mark 2! More Software Lessons From Aviation Disasters”. As speaker Adele Carpenter pointed out, the Venn diagram of software engineers and aviation geeks is “basically a circle”. Previous aviation disaster talks I’ve been to focus on ways of working we can both benefit from - for example, focussing on psychological safety, failing fast and failing openly. Adele’s talk, however, was a rather harrowing blow-by-blow account of some major air crashes from the past few decades. There were some important lessons about how much information humans can cope with, how humans react unexpectedly in stressful situations, and the importance of familiarity and expectations in UI design, but overall it was a rather sobering affair. Which is just as well, since we all poured straight out of the talk and into the local pub for the famous Devoxx party, complete with IT-branded beers. Cheers for another year!

Continuous Change

On Friday I went to a talk very relevant to my current project - Chris Simon’s “Winning at Continuous Deployment with the Expand/Contract Pattern”. My project has a fairly typical Continuous Integration pipeline, but we still do production releases via manual intervention on a business schedule. One of the reasons is schema changes which cause backward incompatibility between services. These can cause outages if our server-side (message recipient) services are updated before the client-side (message sender) services, because even if the servers manage to process the old-style message, they may send a response the old client isn’t expecting. So whilst we have eventual consistency, a few requests may get lost along the way during the service restart. The Expand/Contract pattern addresses this by putting the onus on developers to create an “interim” server-side application which can accept both old and new client messages, and sends both old and new response formats. So if you used to have an application that took a username and returned OK like this:

{
 "name": "Sarah Saunders"
}

{
 "response": "OK"
}

and the new version would take first name / last name and return an ID:

{ "name": {
 "first": "Sarah",
 "last": "Saunders"
}}

{
 "id": "2134-ker-438052u"
}

In this scenario, the “interim” application would accept both inputs, and would return an output like this:


{
 "response": "OK",
 "id": "2134-ker-438052u"
}

This should allow client applications to talk to the server both before and after they are migrated to the new version. Then, once all clients are migrated, the “Contract” stage of the release occurs and the server can be moved to only accept/respond in a V2 style.
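To make the interim stage concrete, here is a minimal sketch in JavaScript. It is not code from the talk or from my project: the function names (normaliseName, handleInterim, createUser) are hypothetical, and the rule for splitting a V1 name into first and last parts is an assumption for illustration only.

```javascript
// Hypothetical "interim" (expand-phase) handler; all names are illustrative.

// Accepts either the V1 shape { name: "Sarah Saunders" }
// or the V2 shape { name: { first: "Sarah", last: "Saunders" } }.
function normaliseName(body) {
    if (typeof body.name === 'string') {
        // V1: split on whitespace; a real system would need a better rule.
        const [first, ...rest] = body.name.trim().split(/\s+/);
        return { first, last: rest.join(' ') };
    }
    if (body.name && typeof body.name === 'object') {
        return { first: body.name.first, last: body.name.last };
    }
    throw new Error('Unrecognised request shape');
}

// createUser stands in for the real business logic; it returns an ID.
function handleInterim(body, createUser) {
    const name = normaliseName(body);
    const id = createUser(name);
    // Send both response formats: V1 clients read "response", V2 clients read "id".
    return { response: 'OK', id };
}

// Usage: old and new clients receive the same backward-compatible reply.
const fakeCreateUser = () => '2134-ker-438052u';
console.log(handleInterim({ name: 'Sarah Saunders' }, fakeCreateUser));
console.log(handleInterim({ name: { first: 'Sarah', last: 'Saunders' } }, fakeCreateUser));
```

In the “Contract” stage, the V1 branch of normaliseName and the "response" field would simply be deleted.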

I really like this pattern and I don’t think developers would mind writing the extra code for the interim release. It does spell out though, that deployment must be a consideration right from the design stage of an application, and Chris did raise this point at the start - mentioning his other talk goes into more detail about how Test Driven Development (TDD) helps this by defining strong contractual interfaces between components, and is a strong enabler for continuous deployment. I have long been a strong advocate of TDD, and here is yet another reason why getting ChatGPT to write your tests for you is no substitute at all for proper TDD!

In conclusion, a really great conference for affirming that we (Capgemini Cloud Development team) are on the right lines with the software and architectures we are currently building for our clients, our use and understanding of AI aligns well with the global community (or is sometimes a little bit ahead) and we have lots of great lessons to learn to improve our architecture and deployment practices even further. Looking forward to the next year.

Devoxx 2024 was originally published by Capgemini at Capgemini Software Engineering on May 17, 2024.

https://capgemini.github.io/architecture/devoxx-2024
Exploring MacOS Development: Creating a Menu Bar App with Swift & SwiftUI
DevelopmentMacOSSwiftSwiftUI

Hijri Menu Bar App Cover Image
For a first foray into macOS development, I built a custom menu bar application. The Hijri Menu Bar allows the user to view the current Hijri date. It is convenient, minimal and easy to use. Below, we’ll highlight the development process, looking at the code, build process and challenges faced.

Development

The development process began with setting up the project in Xcode. Swift and SwiftUI were chosen as the foundations for the app. Essential components such as the AppDelegate, which manages the app’s startup and background tasks, were incorporated, alongside a custom StatusItemView to handle the UI and what the user would see.

App Delegate

NSStatusBar Swift Code Snippet
The AppDelegate class is responsible for setting up the status bar item, managing the popover and handling user interactions. It initialises a clickable item in the menu bar with a calendar icon.

NS Popover Swift Code Snippet
An NSPopover is essentially a holder to display the contentView (a SwiftUI view). This popover will utilise a custom SwiftUI View to display the current Hijri date information.

Now we have a clickable item in the menu bar and a popover. We need to ensure when the item is clicked, the popover will appear on the screen.

Button Swift Code Snippet
To achieve this, the Button action and target lines establish the target-action pattern. Clicking the button will now trigger the togglePopover(_:) function (defined below).
This function decides whether to show or hide the popover, based upon its current state.

Toggle Popover Function Code Snippet
The togglePopover(_:) is a custom function. It first checks for the validity of both the button and popover. If the popover is already shown, it will close it.
If the popover is hidden, it fires a custom notification .fetchHijriDates. This will be received later in our StatusItemView (The SwiftUI view responsible for displaying the dates).

This function also displays the popover relative to the menu bar icon’s bottom edge.

NSApp Activation Policy Code Snippet
Applications in macOS can have different activation policies that determine how they behave in the user interface. Setting the activation policy to accessory means that the application does not appear in the Dock and has no main menu bar of its own, although it can still be activated programmatically or by clicking one of its windows. This suits menu bar applications, which need to keep running quietly without a Dock icon or constant user interaction.

User Interface

After configuring the AppDelegate, the focus shifted to building the user interface. A new file is created called the StatusItemView. This will be responsible for displaying the current Hijri date in the popover created above.

Status Item View Code Snippet
This view fetches the dates and presents them to the user. It starts with state variables which store the current Hijri dates in Arabic and English.

Text View Code Snippet
Next, we utilise Text views to display the information stored in the state variables (i.e. the current Hijri date in Arabic and English with appropriate formatting.)

HStack Code Snippet
SwiftUI offers a horizontal stack HStack which is used here to display a copyright message and an information icon. Tapping the information icon toggles the visibility of an instructions menu (which is coded in a separate InstructionsView).

VStack Code Snippet
The app fetches the latest Hijri date in two ways. First, when the view appears on the screen, it grabs the latest date using the onAppear method. On its own, however, this did not seem to refresh the dates subsequently (say, after midnight), hindering the functionality of the app.

Second, we use the NotificationCenter class. A custom notification, fetchHijriDates, is posted from the AppDelegate every time the popover is shown. The StatusItemView listens for this notification and, when it is received, calls the custom fetchHijriDates function (explained next), which handles fetching the current date.

Fetch Dates Function Code Snippet
Behind the scenes, a custom fetchHijriDates function is responsible for fetching the current Hijri dates. It formats them according to Arabic and English locales and updates the state variables created earlier with those formatted dates. It is called both when the view appears and in response to the fetchHijriDates notification, discussed earlier.

Here we are able to leverage Apple’s built-in Date and Calendar functionality, using Calendar(identifier: .islamicUmmAlQura). This (satisfying) pre-built functionality saves us time and ensures accuracy and uniformity in our date handling.
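The app’s own Swift code is on GitHub and is not reproduced here. As a rough analogue only, the same Umm al-Qura calendar and the two-locale formatting can be sketched in JavaScript using the standard Intl.DateTimeFormat API. The function name formatHijri and the choice of date fields are assumptions for illustration, and the output depends on the calendar data shipped with your runtime.

```javascript
// Not the app's Swift code: a JavaScript analogue that formats a date
// in the Umm al-Qura Islamic calendar for a given locale.
function formatHijri(date, locale) {
    return new Intl.DateTimeFormat(`${locale}-u-ca-islamic-umalqura`, {
        day: 'numeric',
        month: 'long',
        year: 'numeric',
        timeZone: 'UTC', // fixed zone so the example is reproducible
    }).format(date);
}

// 11 April 2024 (noon UTC), the day this article was published.
const sample = new Date(Date.UTC(2024, 3, 11, 12));
console.log(formatHijri(sample, 'en')); // English month name
console.log(formatHijri(sample, 'ar')); // Arabic month name
```

As with the Swift Calendar API, the conversion itself is left entirely to the platform, which is the main point: no hand-written calendar arithmetic is needed.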

This is the end of our discussion on the code. The full code can be found on the GitHub repo.
Below we will look at the rest of the development & build process.

Build Process & Challenges

Building the app in Xcode involved compiling the code, resolving errors and creating a distributable package. This process also involves setting the minimum macOS version, which is set to 12 for this app.

App Icon

App Icon
Creating a visually appealing icon was important. A custom icon was designed and added to Xcode in various sizes to ensure it adapts to different screen resolutions. I used this free online tool to achieve this.

Distribution

Distributing through the Mac App Store requires a paid Apple Developer account, so I chose to make the app freely available on GitHub. Subsequent releases were uploaded via GitHub Releases.

Contemplating Further Features

Features like automatic launch at login were initially considered. Launch at login ensures the menu bar icon is always visible (i.e. that the app is started after the user logs in/restarts their machine). However the focus remained on core functionality and a simple user experience. Instructions are provided to users during installation to manually set launch at login.

Creating DMG File

To ensure smooth installation, I created a DMG file, using the create-dmg tool. Clear and concise instructions were included to guide users through the installation process. Using the downloaded DMG file, users can drag the app to their Applications folder, and launch it from there.

Conclusion

This journey, from concept to distribution, has been a valuable learning experience in macOS development. I hope to continuously improve the Hijri Menu Bar app and provide users with a seamless way to view Hijri dates on their Macs.

If you’re interested in trying out the app or contributing to its development, you can find it on GitHub.

Exploring MacOS Development: Creating a Menu Bar App with Swift & SwiftUI was originally published by Capgemini at Capgemini Software Engineering on April 11, 2024.

https://capgemini.github.io/development/macos-development-with-swift
How to (maybe) replace your HR department in 3 easy steps
AIChatbotCloudAzure

I recently had the privilege of judging an internal Capgemini hackathon. It was an open brief, but the focus was to be on technology and its application to solve a real-world business problem. The entries were varied and excellent, from a dashboard to assess how warm/busy/accessible the office was so you could decide whether or not it was worth going in, to gamification of training, to improvements for mountain search and rescue teams. One of the major commonalities across many of the entries was the use of “AI”, where, given our common use of the Azure platform, AI tended to be defined as Azure Cognitive Search (recently renamed Azure AI Search) indexing a set of business documents, with a natural language processing unit on top to act as a chatbot. This made me want to have a go myself, and see what I could build!

Is it AI?

This architecture, for me, isn’t really using the “AI” bits of AI, although if you use ChatGPT (which is available as a product in Azure through Microsoft’s partnership with OpenAI) there can be some non-deterministic, generative functionality. But it sure is useful, and it could probably ease the burden on the HR and support departments of many organisations - and could possibly even replace a lot of the staff in these departments. I set out to see if I could build an HR chatbot to handle the kinds of queries a typical HR department employee might need to deal with.

Why HR? Just because everybody hates them?? No…! It’s because of the remit of HR: dealing with the employee lifecycle, and needing to prove that a company acts without bias, it must be a heavily process-driven department. These processes must be documented, and most of the workload of the department is in dealing with queries regarding the process. The incoming questions are probably not phrased in the same way as the process documentation, so some sort of fuzzy search is required in order to automate the question-answering process; for example, translating “how much paid time off do I get when my baby is born?” to “paternity leave allowances” is not a straightforward mapping.

This is the reason that previous attempts to automate such departments have failed. Language is too complex for simple mappings and decision trees to replace a person on the end of a line - as anyone who has tried to navigate an automated telephone call will tell you. Who hasn’t ended up shouting “I WANT TO SPEAK TO A PERSON” down the line? But at the end of the day, the workload is simply regurgitating content from a document repository, and the hard bit is finding the relevant sections - a process that is better automated, as acting as a knowledge base for people who can’t be bothered to read swathes of documentation is a pretty unrewarding job.

The Architecture

As mentioned, the hackathon had been playing with the Azure cloud, so we’ll keep to this and use Azure’s concepts. These are pretty simply translated to any hyperscaler though, or to open-source alternatives if you want to host your own. For example, on AWS you could use Textract and Amazon Comprehend, and in the OSS world you’d perhaps use NLTK and Lucene.

Azure AI Search is a nice tool - a little more than document search, a little less than AI. It can be a bit clunky to get used to, and the price policy is per GB storage which is pretty bizarre - but this can be beneficial if you have query-intensive applications and a small-ish data set of documents. We use it as an exotic database view for one of our applications, and it took us a while to get used to the fuzzy query syntax - it’s not really designed for logical queries, it’s much better at giving you best-guess matches for loose search terms - and as such is well positioned to be the back-end of our HR chatbot.

The Method

I found a couple of tutorials and quick-starts for creating chatbots over my documentation: “Azure Search OpenAI demo” and “Query your documentation using Langchain”.

The issue I found is that it’s all moving quite fast - faster than the tutorials can keep up with. All mention of Langchain has now gone from the Azure portal (although you can still write your own Langchain chatbot), and QnA Maker has been superseded by Azure AI Language Studio, where you can add your documents via a “Custom Question Answering” project. This is a type of Azure “Language” resource and can be created via the low-code/no-code Language Studio homepage. The tutorial speedily guides you through a simply-configured web form, although it’s not quite clear what you are actually going to create. Looking at what was deployed after the configuration steps, it sets up an Azure Cognitive Search (AI Search) repository and then enables custom text classification / custom named entity recognition on the repository. The default behaviour for this appears to be breaking down the content in your referenced documents into paragraphs and pulling out likely titles/subjects. You can then modify this classification by adding in new questions and answers, or choosing the best answer for given terms.

The free trial only allows you to upload three sources into your AI search repository. So, for our HR example, I’ve downloaded three HR policy documents from this handy online repository and added them into my Custom Question Answering repository. This generates a “Knowledge Base” that I can then publish.

Upload documents into your language knowledge base

Here we can see the way that the content has been divided up into major terms and paragraphs that may address those terms. I can edit here, and once it’s published I can generate a Bot to act as the user interface to it.

knowledge base parsed from documents

OK so now onto creating this Bot. As Bots go, OpenAI’s ChatGPT is the real deal. Generative AI, pre-trained to recognise vast arrays of English language. For most use cases we would have to “turn off” all the fun, generative stuff for our application (see Guardrails below) and it’s probably overkill to use ChatGPT for this demo - plus, it isn’t included in the Azure Free Trial tier so I will be experimenting with the Azure AI Bot Service instead. It should be sufficient for this fairly small and simple demo. Cost-wise, the Azure AI Bot has a free tier, but it must be hosted via an Azure Web App whose service plan is defaulted to S1 (Standard). This plan, at £75/month to keep it running, is eating rapidly into my free credit!

Configuring the Bot online is pretty straightforward. The web GUI provides you with a customised template for creating the resources that you will need, creating an App Service Plan to launch an Azure WebApp that will host your Bot. The only config you have to do is enter the key of your Language Resource so that you can create a secure connection between the AI service knowledge base and the chatbot. This isn’t documented, but you can find the key by going back to the Azure Portal home and clicking the green squares “All Resources” view, then selecting your Language resource (the resource where Type = Language) and then selecting the Keys and Endpoint menu item. (There are two keys, so that you can refresh them by rotating them individually and hence avoid downtime. Either one is fine.)

Testing

Once your Bot is deployed, you can test it by finding it under All Resources and choosing “Test in Web Chat” from the right hand menu.

I tried with a simple question, that I know is answerable with the content in the documents: Trial question

So far so good. The Bot has successfully found the right bit of my documentation and returned a comprehensive and understandable answer. How about another:

Second question

Oh dear. “cannot” is not exactly a strong English sentence! Although it has found the relevant section of the documentation, it has not been able to pull out a contextual answer. I am not sure if it is the Language Service or the Bot which is struggling with this question. Enabling and examining the logs on the Bot Service isn’t that helpful - it just shows HTTP POST requests going to the Bot framework. The Bot framework should be responsible for breaking down the user’s entered text into logical “intentions” that the back-end question-answerer can respond to, and then delivering the back-end response in a human readable form. I eventually figure out how to enable logging on my Language Service and discover the query and response that the Bot has sent to the language service:

Bot query to language service

I can see that the language service has actually done a reasonable job. It’s identified the right paragraph for the query, but returned just a 38.97% certainty rating that this is the right data. Fair enough. So it seems that the issue is with the Bot being able to pull the right piece of text out of the response. This makes me start to wonder about the “Bot” I have deployed. What is it actually based on? There isn’t much documentation I can find, but you can download the source code, which shows that I have deployed something created by the BotBuilder SDK. I should be able to run this locally, but weirdly the Bot JavaScript code in my download seems totally out-of-date with the latest Language Studio API. I have to go back to the drawing board and use one of the later samples and update the code to correctly declare a method asynchronous to get the Bot running locally using the Bot Framework Emulator.

To get it to work using Node.js v18.16.0 and restify ^11.1.0, I had to edit the sample code index.js line 91 to declare the method async or it would not start:

// Listen for incoming requests.
server.post('/api/messages', async (req, res) => {
    // Await the adapter so any failure rejects this handler rather than being lost.
    await adapter.processActivity(req, res, async (turnContext) => {
        // Route the message to the bot's main handler.
        await bot.run(turnContext);
    });
});

I was then able to run the Bot locally connecting to my Azure-hosted Language via the Azure Bot Framework Emulator. And of course as luck would have it, the latest sample doesn’t return such a poor response! It’s still not perfect, but it’s at least a sentence. See below.

Local Bot Service running in emulator

It does also prove that the poor response here was the chatbot interpreting the data from the Language Service. The Language Service will return a field called an AnswerSpan which lists, with a confidence score, the section of the documentation it considers most relevant to the question. In the case of my “dismissal” question, the AnswerSpan returned was:

An employee whose appointment is terminated for any reason will be provided with a written statement of the reasons for the dismissal

This text was paired with a confidence score of 0.2880999999999997, or circa 29%. Fair enough. So how the cloud-deployed bot extracted the answer “cannot” from this is a bit of a mystery! The new version of my Bot prints the whole AnswerSpan and is, whilst still not exactly accurate, at least better. So how do I fix it?
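One way a Bot could avoid replying with a low-confidence fragment is to fall back to the full matched paragraph whenever the AnswerSpan score is weak. The sketch below is not the BotBuilder sample code. The response shape (answers[0].answer, answerSpan.text, confidenceScore) is my assumption of how the Language Service reply is laid out and should be checked against the real payload, and the 0.5 threshold is an arbitrary, hypothetical value.

```javascript
// Sketch only: field names are assumptions about the Language Service reply,
// and MIN_SPAN_CONFIDENCE is a made-up threshold to tune through testing.
const MIN_SPAN_CONFIDENCE = 0.5;

function pickReply(result) {
    const top = result.answers && result.answers[0];
    if (!top) {
        return "Sorry, I couldn't find anything on that.";
    }
    const span = top.answerSpan;
    // Only trust the short extracted span when the service is confident in it.
    if (span && span.text && span.confidenceScore >= MIN_SPAN_CONFIDENCE) {
        return span.text;
    }
    // Otherwise fall back to the full matched paragraph.
    return top.answer;
}

// Usage with the scores quoted in this article; the paragraph text is a placeholder.
const dismissal = {
    answers: [{
        answer: '(full matched paragraph from the policy document)',
        confidenceScore: 0.3897,
        answerSpan: {
            text: 'An employee whose appointment is terminated for any reason will be provided with a written statement of the reasons for the dismissal',
            confidenceScore: 0.2880999999999997,
        },
    }],
};
console.log(pickReply(dismissal)); // the span scores below 0.5, so the paragraph is used
```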

Customisation

It seems the way to fix up these simple Bots is to go and add a custom question/answer into the Language Service knowledge base. I try adding a specific answer to the question, “Can I appeal against my dismissal?”. I re-publish the knowledge base and try again.

Adding a custom question

This looks much better. But it does imply that quite a lot of user testing and customisation will have to take place before this Bot is ready to replace its human counterparts.

Guardrails

One of the things that surprised people about ChatGPT, particularly in its earlier iterations, was that it was not trained to be accurate. It was trained to please the user. This meant it would return inaccurate answers rather than telling you that it didn’t know the answer, as it had gauged higher satisfaction from “lying”! You don’t want your HR chatbot to lie, so you must use the guardrail settings to ensure that it does not. With ChatGPT, guardrails can be set using natural language, for example you can state:

{"role": "system", "content": "Assistant is an intelligent chatbot designed to help users answer their tax related questions.

Instructions:
- Only answer questions related to taxes.
- If you're unsure of an answer, you can say "I don't know" or "I'm not sure" and recommend users go to the IRS website for more information. "},
{"role": "user", "content": "When are my taxes due?"}

This configuration should discourage the chatbot from “making up” an answer if it cannot find a decent response in its repository. When configuring Azure’s ChatGPT chatbot via the GUI, you can also turn the setting known as “temperature” down to 0. The temperature controls how much randomness, or creativity, the chatbot uses when generating an answer. A low temperature results in more “I’m sorry I don’t know” type answers, but increases the chances you’ll get an accurate answer, and that you’ll get the same answer when you ask the same question twice!
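Put together for the HR case, a chat request might look something like the sketch below. It only builds the request body and does not call any service. The system prompt wording is my own adaptation of the tax example above, and buildChatRequest is a hypothetical helper name; the messages and temperature fields follow the chat format shown above.

```javascript
// Builds a chat request body only; nothing is sent anywhere.
// The HR wording is hypothetical, adapted from the tax example above.
function buildChatRequest(question) {
    return {
        temperature: 0, // minimise randomness so repeated questions get the same answer
        messages: [
            {
                role: 'system',
                content: [
                    'Assistant is a chatbot that answers HR policy questions.',
                    'Instructions:',
                    '- Only answer questions related to HR policy.',
                    '- If you are unsure of an answer, say "I don\'t know".',
                ].join('\n'),
            },
            { role: 'user', content: question },
        ],
    };
}

console.log(JSON.stringify(buildChatRequest('Can I appeal against my dismissal?'), null, 2));
```

Building the body as a plain object and serialising it with JSON.stringify also avoids the quoting problems that come from hand-writing nested quotation marks inside a JSON string.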

The Cost

So what does this cost to run in Azure? Depending on your Bot type, the cost can vary wildly. As mentioned, I am running my Language instance and my Bot instance in the free trial tier, so I am only paying for the app service to host them and this is around £75/month. If you were to use an enterprise ChatGPT Bot, costs are over £800/month fixed rate for 40 users, plus 80p per “usage unit” and £20 for any extra users over and above the plan. Still considerably cheaper than making your HR staff deal with these queries, I suppose. As mentioned, Azure AI Search is priced by storage: the free tier runs up to 50 MB, and the Standard tier gives you 25 GB for 27p/hour.
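Hourly prices are hard to compare with monthly ones, so here is the arithmetic for the 27p/hour figure as a small sketch. It assumes the service runs continuously for a 30-day month; monthlyCostPence is a hypothetical helper, and the sum is done in pence to avoid floating-point rounding.

```javascript
// Rough monthly cost of an hourly-billed tier, worked in whole pence.
// Assumes the service runs 24 hours a day for a 30-day month.
function monthlyCostPence(pencePerHour, days = 30) {
    return pencePerHour * 24 * days;
}

const standardSearch = monthlyCostPence(27); // 27p/hour, as quoted above
console.log(`£${(standardSearch / 100).toFixed(2)} per month`); // prints "£194.40 per month"
```

On those assumptions the Standard search tier alone would cost more than twice the £75/month app service plan.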

In Conclusion

I am impressed with the Azure AI search offering - it’s powerful and useful - there are so many scenarios whereby we end up awash with documentation and cannot find the content we need. The chatbots are a varied bunch but I liked the way you could download the code and run/edit it locally with relative ease. In all, I feel this will be a very common architecture for the business problems of the next year or so.

How to (maybe) replace your HR department in 3 easy steps was originally published by Capgemini at Capgemini Software Engineering on January 19, 2024.

https://capgemini.github.io/cloud/create-ai-bot-in-azure