2025.09.13 : I’m Kevin Hubbard, Electrical Engineer. I’ve spent my 30+ year career designing embedded systems using ASICs, FPGAs, and embedded CPUs. It’s been an amazing journey that I hope others will pursue. I’m giving back now in writing this “BML Designing RISC-V SoCs with FPGAs” series which starts here.
The previous chapter enhanced the simple three-instruction Femto CPU by adding two more RISC-V instructions for reading (“lw” or “Load Word”) and writing to memory (“sw” or “Store Word”). A short four-DWORD assembly language program then looped, reading the value of switches mapped to RISC-V high memory space ( 0x10000004 ) and writing those values to LEDs mapped to RISC-V high memory space ( 0x10000008 ).
This chapter will move past assembly language and demonstrate using the C programming language to compile a simple “Blinky” program to flash the LEDs. The Verilog files from the previous chapter will be re-used and enhanced for this.
Step one is to install a cross-compiler. What’s a cross-compiler? It allows an engineer to write software on one computer and target a completely different CPU architecture. Oftentimes this means running GCC on an 80x86 Linux workstation and targeting an embedded CPU like the RISC-V. Someday we may all have RISC-V Linux workstations on our desks, but not today.
Installing GCC for RISC-V on my Ubuntu 22.04 LTS Linux workstation was super simple.
%sudo apt install gcc-riscv64-unknown-elf binutils-riscv64-unknown-elf
Compiling C code to a binary file takes a couple of steps:
Step-1 : Build a linker load file ( also called a linker script ). The RISC-V CPU has a 32-bit memory space ( ignoring the 64-bit RISC-V for the moment ). The linker needs to know which portions of that memory space are populated with actual memory. An embedded system, as an example, might have Read-Only Flash memory for machine code instructions and SRAM for storing variables and large memory allocations ( malloc() calls in C parlance ).
[ link.ld ]
MEMORY {
FLASH (rx) : ORIGIN = 0x10000000, LENGTH = 256K
RAM (rwx): ORIGIN = 0x20000000, LENGTH = 64K
}
SECTIONS {
.text : {
*(.text)
} > FLASH
.data : {
*(.data)
} > RAM
.bss : {
*(.bss)
} > RAM
}
The “.text” section refers to executable code. Confusing, right? The section gets its name from historical Unix and compiler conventions, where “text” referred to executable code, not readable ASCII characters as we might assume today.
The “.data” section refers to initialized data – meaning variables.
The “.bss” section refers to uninitialized data, meaning variables with no explicit initial value, which C startup code normally clears to zero.
The “.heap” section is for memory allocations – aka mallocs().
The “.stack” section is, well, the Stack. The Stack is a linear data structure that follows the Last In, First Out (LIFO) principle, meaning the last item added is the first one removed. It’s used for local variables and return addresses for function calls.
A final section ( not shown ) is “.rodata”, or Read-Only data: constants, or variables which cannot be changed.
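As a rough illustration of how C source maps onto these sections, the sketch below annotates a few made-up objects. The names boot_count, tick_total, led_mask and next_pattern are hypothetical and are not part of the Femto design. The placements in the comments are what GCC typically does; on RISC-V it may use the small-data variants ( .sdata, .sbss, .srodata ) for small objects instead, as the .sdata line for led_ptr in the main.s listing later in this chapter suggests.

```c
#include <stdint.h>

/* Hypothetical objects, only here to show typical section placement. */
uint32_t boot_count = 3;            /* initialized global   -> .data (or .sdata)     */
uint32_t tick_total;                /* no initial value     -> .bss  (or .sbss)      */
const uint32_t led_mask = 0xFFFFu;  /* read-only constant   -> .rodata (or .srodata) */

/* The machine code for this function goes to .text. */
uint32_t next_pattern(uint32_t step)
{
    /* A local variable lives in a register or on the stack. */
    uint32_t scratch = boot_count + step;

    tick_total += 1;                /* updates the .bss variable */
    return scratch & led_mask;
}
```

Compiling a file like this with the -S switch and looking for the .section lines in the output is a quick way to check where the compiler actually put each object.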
For this Femto-CPU design, the linker load file is fairly simple, with a single very small RAM.
[ link.ld ]
ENTRY(_start)
MEMORY {
RAM (rwx) : ORIGIN = 0x00000000, LENGTH = 64 - 0
}
SECTIONS {
.stack (NOLOAD) :
{
_stack_end = ORIGIN(RAM) + LENGTH(RAM);
_stack_start = _stack_end - 0x0010; /* 16 Byte */
} > RAM
. = ORIGIN(RAM); /* Reset counter to start of RAM */
.text : { *(.text.init) *(.text) } > RAM
.rodata : { *(.rodata) } > RAM
.data : { *(.data) } > RAM
.bss (NOLOAD) : { _bss_start = .; *(.bss) _bss_end = .; } > RAM
.heap (NOLOAD) : { _heap_start = .; } > RAM
PROVIDE(_sp = _stack_end);
}
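The _bss_start and _bss_end symbols exist so that startup code can clear the .bss region before main() runs. When a program is built without the standard startup files, nothing does that automatically. Below is a minimal sketch of such a routine, assuming the region boundaries are word aligned ( the linker file above does not force that ). The function name zero_words is my own invention for illustration. The Blinky program in this chapter has no .bss variables, so it does not need this.

```c
#include <stdint.h>

/* Clear every 32-bit word in the range [start, end). */
void zero_words(volatile uint32_t *start, volatile uint32_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}

/*
 * Hypothetical use from startup code on the target, where the linker
 * load file provides the two symbols:
 *
 *   extern uint32_t _bss_start, _bss_end;
 *   zero_words(&_bss_start, &_bss_end);
 */
```

The pointers are declared volatile so the compiler does not replace the loop with a call to memset(), which is not available when linking with -nostdlib.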
The C code is quite simple. Count an unsigned integer named cnt in a forever loop and write the value to the LED peripheral at 0x10000008 using the C pointer led_ptr.
[ main.c ]
typedef unsigned int uint32_t;// stdint.h not available
volatile uint32_t* led_ptr = (uint32_t*)0x10000008;
int main()
{
uint32_t cnt = 0;
while (1)
{
cnt++;
*led_ptr = cnt;
}
return 0;
}
Compiling is just a single command line. To check things, we will compile to assembly first. Using -O2 optimization keeps the generated code simple enough that it does not need the stack pointer.
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -O2 -T link.ld -S main.c -o main.s
Which reveals a definite problem. The compiled program uses RISC-V registers “a4” and “a5”, not the temporary registers “t0” and “t1” that the Femto-CPU implements.
.file "main.c"
.option nopic
.attribute arch, "rv32i2p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.section .text.startup,"ax",@progbits
.align 2
.globl main
.type main, @function
main:
lui a5,%hi(led_ptr)
lw a4,%lo(led_ptr)(a5)
li a5,0
.L2:
addi a5,a5,1
sw a5,0(a4)
j .L2
.size main, .-main
.globl led_ptr
.section .sdata,"aw"
.align 2
.type led_ptr, @object
.size led_ptr, 4
led_ptr:
.word 0x10000008
.ident "GCC: () 10.2.0"
No need to panic though. This just leads to the next lab assignment: enhance the femto_cpu.v Verilog file so that the existing instructions also work with registers “a4” (x14) and “a5” (x15). Note that x14 and x15 are decimal register numbers 14 and 15, not hexadecimal 0x14 and 0x15.
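The names “a4” and “t0” are just ABI aliases for the numbered registers x0 through x31. A small lookup table makes the mapping explicit. This is only a host-side helper sketch; rv_abi_names and rv_reg_number are names I made up for illustration.

```c
#include <string.h>

/* Standard RISC-V ABI register names, indexed by register number x0..x31. */
static const char *const rv_abi_names[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

/* Return the decimal register number for an ABI name, or -1 if unknown. */
int rv_reg_number(const char *abi_name)
{
    for (int i = 0; i < 32; i++) {
        if (strcmp(abi_name, rv_abi_names[i]) == 0) {
            return i;
        }
    }
    return -1;
}
```

Here rv_reg_number("t0") and rv_reg_number("t1") return 5 and 6, while rv_reg_number("a4") and rv_reg_number("a5") return 14 and 15. Those are the 5-bit values that appear in the register fields of the machine code.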
Compiling the C code to a ROM *.bin file takes a couple of steps.
[ go.sh ]
# Compile *.C to *.elf
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -O2 -nostartfiles -nostdlib -ffreestanding -T link.ld -o main.elf main.c
# Convert ELF to Raw Binary (Optional)
riscv64-unknown-elf-objcopy -O binary main.elf main.bin
Once that first task is completed, update femto_wrom.v to contain the compiled C program. There are two ways to get the six DWORDs ( five instructions plus the led_ptr data word ). Option-1 is to copy and paste the assembly code from main.s into the online assembler.

After compiling the assembly code, click on the “Disassembly” tab to see the six DWORDs of machine code in RAM starting at address 0x00000000.

Option-2 is to hexdump the main.bin file that GCC generated. Unfortunately, the hexdump utility by default dumps 16-bit WORDs instead of 32-bit DWORDs.
0000000 2703 0140 0793 0000 8793 0017 2023 00f7
0000010 f06f ff9f 0008 1000
By passing a custom format string, it is possible to get just a dump of DWORDs in proper order.
%hexdump -e '8/4 "%08x "' -e '"\n"' main.bin
01402703 00000793 00178793 00f72023 ff9ff06f 10000008
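The two dumps differ only in how the same bytes are grouped. RISC-V stores each 32-bit instruction least significant byte first, so the first four bytes in main.bin are 03 27 40 01, which the 16-bit dump shows as “2703 0140” and the 32-bit dump shows as “01402703”. The sketch below does the same regrouping in C; the function name bytes_to_dwords is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assemble little-endian bytes into 32-bit DWORDs.
 * Returns the number of complete DWORDs written to out[].
 */
size_t bytes_to_dwords(const uint8_t *bytes, size_t nbytes, uint32_t *out)
{
    size_t ndwords = nbytes / 4;   /* ignore any trailing partial word */

    for (size_t i = 0; i < ndwords; i++) {
        const uint8_t *b = &bytes[4 * i];
        out[i] = (uint32_t)b[0]
               | ((uint32_t)b[1] << 8)
               | ((uint32_t)b[2] << 16)
               | ((uint32_t)b[3] << 24);
    }
    return ndwords;
}
```

Feeding it the bytes 03 27 40 01 93 07 00 00 gives 0x01402703 and 0x00000793, matching the first two DWORDs in the dump above.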
The final part of the lab assignment is to simulate femto_core.v with the modified femto_cpu.v and femto_wrom.v files included.
%vsim femto_core
The force file doesn’t do much other than provide a clock and toggle reset.
[ force_femto_core.do ]
force clk 0 5 ns, 1 10 ns -repeat 10 ns
force reset 1; run 20 ns;
force reset 0; run 300 ns;
The wave file lists the memory interface and the internal CPU registers.
[ wave_femto_core.do ]
add wave /femto_core/clk
add wave /femto_core/reset
add wave -radix hex /femto_core/led
add wave -divider {Memory Bus}
add wave -radix unsigned /femto_core/u_femto_cpu/bus_addr
add wave -radix hex /femto_core/u_femto_cpu/bus_wr_en
add wave -radix hex /femto_core/u_femto_cpu/bus_wr_d
add wave -radix hex /femto_core/u_femto_cpu/bus_rd_d
add wave -divider {Memory Cells}
add wave -radix hex {/femto_core/u_femto_wrom/wrom_array[0]}
add wave -radix hex {/femto_core/u_femto_wrom/wrom_array[1]}
add wave -radix hex {/femto_core/u_femto_wrom/wrom_array[2]}
add wave -radix hex {/femto_core/u_femto_wrom/wrom_array[3]}
add wave -radix hex {/femto_core/u_femto_wrom/wrom_array[4]}
add wave -radix hex {/femto_core/u_femto_wrom/wrom_array[5]}
add wave -radix hex {/femto_core/u_femto_wrom/wrom_array[6]}
add wave -radix hex {/femto_core/u_femto_wrom/wrom_array[7]}
add wave -divider {CPU Registers}
add wave -radix unsigned /femto_core/u_femto_cpu/pc
add wave -radix hex /femto_core/u_femto_cpu/t0
add wave -radix hex /femto_core/u_femto_cpu/t1
add wave -radix hex /femto_core/u_femto_cpu/a4
add wave -radix hex /femto_core/u_femto_cpu/a5
add wave -radix binary /femto_core/u_femto_cpu/opcode
add wave -radix binary /femto_core/u_femto_cpu/funct3
add wave -radix hex /femto_core/u_femto_cpu/rd
add wave -radix hex /femto_core/u_femto_cpu/rs1
add wave -radix hex /femto_core/u_femto_cpu/rs2
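To know what to expect on the opcode, funct3, rd, rs1 and rs2 waveforms, it helps to decode the machine code by hand. The fields sit at fixed bit positions in every RV32I instruction. Here is a small host-side sketch; the rv_* function names are my own, and I am assuming the Verilog signals of the same names carry these same bit fields.

```c
#include <stdint.h>

/* Fixed bit fields of a 32-bit RV32I instruction word. */
uint32_t rv_opcode(uint32_t insn) { return insn & 0x7Fu; }          /* bits  6:0  */
uint32_t rv_rd(uint32_t insn)     { return (insn >> 7) & 0x1Fu; }   /* bits 11:7  */
uint32_t rv_funct3(uint32_t insn) { return (insn >> 12) & 0x7u; }   /* bits 14:12 */
uint32_t rv_rs1(uint32_t insn)    { return (insn >> 15) & 0x1Fu; }  /* bits 19:15 */
uint32_t rv_rs2(uint32_t insn)    { return (insn >> 20) & 0x1Fu; }  /* bits 24:20 */

/* Sign-extended 12-bit immediate of I-type instructions (addi, lw). */
int32_t rv_imm_i(uint32_t insn)
{
    int32_t imm = (int32_t)(insn >> 20);   /* bits 31:20, range 0..4095 */
    if (imm & 0x800) {
        imm -= 0x1000;                     /* bit 11 set means negative */
    }
    return imm;
}
```

For the DWORD 0x00178793 ( addi a5,a5,1 ) this gives opcode 0x13, rd 15, rs1 15 and immediate 1. For 0x00f72023 ( sw a5,0(a4) ) it gives opcode 0x23, funct3 2, rs1 14 and rs2 15. With the hex radix used in the wave file, registers 14 and 15 show up as 0e and 0f.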
If your modifications to femto_cpu.v and femto_wrom.v were correct, the simulation should show the LED array mapped to high address 0x10000008 incrementing by +1 every six clock cycles.
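If the waveform does not look right, a software model of the program can help tell a Verilog bug from a ROM content bug. The sketch below is a hypothetical host-side model, not the Femto CPU's implementation. It interprets only the four opcodes this program uses ( lw, addi, sw, jal ), treats the six DWORDs as the only memory, and counts instructions rather than clocks. The names run_blinky and LED_ADDR are made up for illustration.

```c
#include <stdint.h>

#define LED_ADDR   0x10000008u
#define ROM_DWORDS 6u

/* The six DWORDs from the hexdump of main.bin. */
static const uint32_t rom[ROM_DWORDS] = {
    0x01402703u, 0x00000793u, 0x00178793u,
    0x00f72023u, 0xff9ff06fu, 0x10000008u
};

/* Sign-extend the low 'bits' bits of v (v must have no higher bits set). */
static uint32_t sext(uint32_t v, int bits)
{
    uint32_t sign = 1u << (bits - 1);
    return (v ^ sign) - sign;
}

/* Run 'steps' instructions; return the last value stored to LED_ADDR. */
uint32_t run_blinky(unsigned steps, unsigned *led_writes)
{
    uint32_t x[32] = {0};          /* register file, x[0] stays zero */
    uint32_t pc = 0, led = 0;

    *led_writes = 0;
    while (steps-- > 0 && pc / 4 < ROM_DWORDS) {
        uint32_t insn = rom[pc / 4];
        uint32_t rd  = (insn >> 7) & 31;
        uint32_t rs1 = (insn >> 15) & 31;
        uint32_t rs2 = (insn >> 20) & 31;
        uint32_t next = pc + 4;

        switch (insn & 0x7Fu) {
        case 0x13: /* addi */
            x[rd] = x[rs1] + sext(insn >> 20, 12);
            break;
        case 0x03: { /* lw, reads only from the ROM array */
            uint32_t addr = x[rs1] + sext(insn >> 20, 12);
            x[rd] = (addr / 4 < ROM_DWORDS) ? rom[addr / 4] : 0;
            break;
        }
        case 0x23: { /* sw, only the LED register is modeled */
            uint32_t imm = ((insn >> 25) << 5) | ((insn >> 7) & 31);
            if (x[rs1] + sext(imm, 12) == LED_ADDR) {
                led = x[rs2];
                (*led_writes)++;
            }
            break;
        }
        case 0x6F: { /* jal */
            uint32_t imm = ((insn >> 31) << 20) | (((insn >> 12) & 0xFFu) << 12)
                         | (((insn >> 20) & 1u) << 11) | (((insn >> 21) & 0x3FFu) << 1);
            x[rd] = pc + 4;
            next = pc + sext(imm, 21);
            break;
        }
        default:   /* anything else is outside this tiny model */
            return led;
        }
        x[0] = 0;
        pc = next;
    }
    return led;
}
```

The first LED write happens on the fourth instruction, and after that there is one write for every three instructions ( addi, sw, j ). That lines up with one LED update every six clocks if each instruction takes two clock cycles, which is my assumption about femto_cpu.v rather than something stated here.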

If your simulation looks good, go ahead and build an FPGA bitfile and watch the LEDs flash in real hardware. The following files will build the design targeting the Digilent Artix-7 BASYS3 board.
[ go.sh ]
vivado -mode batch -source go.tcl
[ go.tcl ]
set design_name top
set device xc7a35tcpg236-1
set_part $device
set rep_dir ./reports ; file mkdir $rep_dir
set tmp_dir ./temp ; file mkdir $tmp_dir
source top_rtl_list.tcl
read_xdc ./${design_name}_timing.xdc
synth_design -top $design_name -part $device -fsm_extraction off
report_timing_summary -file post_synth_timing_summary.rpt
read_xdc ./${design_name}_physical.xdc
opt_design
place_design
route_design
report_timing -file $rep_dir/post_route_timing_worst.rpt
report_timing_summary -file $rep_dir/post_route_timing_summary.rpt
report_utilization -file $rep_dir/post_route_util.rpt
report_power -file $rep_dir/post_route_pwr.rpt
set_property BITSTREAM.GENERAL.COMPRESS TRUE [current_design]
write_bitstream -force ${design_name}.bit
exit
[ top_rtl_list.tcl ]
read_verilog ../src/top.v
read_verilog ../src/femto_core.v
read_verilog ../src/femto_cpu.v
read_verilog ../src/femto_wrom.v
[ top_timing.xdc ]
create_clock -period 10.000 -name clk_100m -waveform {0.000 5.000} [get_ports clk_100m_pin]
[ top_physical.xdc ]
set_property -dict { PACKAGE_PIN W5 IOSTANDARD LVCMOS33 } [get_ports clk]
create_clock -add -name sys_clk_pin -period 10.00 -waveform {0 5} [get_ports clk]
## Switches
set_property -dict { PACKAGE_PIN V17 IOSTANDARD LVCMOS33 } [get_ports {sw[0]}]
set_property -dict { PACKAGE_PIN V16 IOSTANDARD LVCMOS33 } [get_ports {sw[1]}]
set_property -dict { PACKAGE_PIN W16 IOSTANDARD LVCMOS33 } [get_ports {sw[2]}]
set_property -dict { PACKAGE_PIN W17 IOSTANDARD LVCMOS33 } [get_ports {sw[3]}]
set_property -dict { PACKAGE_PIN W15 IOSTANDARD LVCMOS33 } [get_ports {sw[4]}]
set_property -dict { PACKAGE_PIN V15 IOSTANDARD LVCMOS33 } [get_ports {sw[5]}]
set_property -dict { PACKAGE_PIN W14 IOSTANDARD LVCMOS33 } [get_ports {sw[6]}]
set_property -dict { PACKAGE_PIN W13 IOSTANDARD LVCMOS33 } [get_ports {sw[7]}]
set_property -dict { PACKAGE_PIN V2 IOSTANDARD LVCMOS33 } [get_ports {sw[8]}]
set_property -dict { PACKAGE_PIN T3 IOSTANDARD LVCMOS33 } [get_ports {sw[9]}]
set_property -dict { PACKAGE_PIN T2 IOSTANDARD LVCMOS33 } [get_ports {sw[10]}]
set_property -dict { PACKAGE_PIN R3 IOSTANDARD LVCMOS33 } [get_ports {sw[11]}]
set_property -dict { PACKAGE_PIN W2 IOSTANDARD LVCMOS33 } [get_ports {sw[12]}]
set_property -dict { PACKAGE_PIN U1 IOSTANDARD LVCMOS33 } [get_ports {sw[13]}]
set_property -dict { PACKAGE_PIN T1 IOSTANDARD LVCMOS33 } [get_ports {sw[14]}]
set_property -dict { PACKAGE_PIN R2 IOSTANDARD LVCMOS33 } [get_ports {sw[15]}]
## LEDs
set_property -dict { PACKAGE_PIN U16 IOSTANDARD LVCMOS33 } [get_ports {led[0]}]
set_property -dict { PACKAGE_PIN E19 IOSTANDARD LVCMOS33 } [get_ports {led[1]}]
set_property -dict { PACKAGE_PIN U19 IOSTANDARD LVCMOS33 } [get_ports {led[2]}]
set_property -dict { PACKAGE_PIN V19 IOSTANDARD LVCMOS33 } [get_ports {led[3]}]
set_property -dict { PACKAGE_PIN W18 IOSTANDARD LVCMOS33 } [get_ports {led[4]}]
set_property -dict { PACKAGE_PIN U15 IOSTANDARD LVCMOS33 } [get_ports {led[5]}]
set_property -dict { PACKAGE_PIN U14 IOSTANDARD LVCMOS33 } [get_ports {led[6]}]
set_property -dict { PACKAGE_PIN V14 IOSTANDARD LVCMOS33 } [get_ports {led[7]}]
set_property -dict { PACKAGE_PIN V13 IOSTANDARD LVCMOS33 } [get_ports {led[8]}]
set_property -dict { PACKAGE_PIN V3 IOSTANDARD LVCMOS33 } [get_ports {led[9]}]
set_property -dict { PACKAGE_PIN W3 IOSTANDARD LVCMOS33 } [get_ports {led[10]}]
set_property -dict { PACKAGE_PIN U3 IOSTANDARD LVCMOS33 } [get_ports {led[11]}]
set_property -dict { PACKAGE_PIN P3 IOSTANDARD LVCMOS33 } [get_ports {led[12]}]
set_property -dict { PACKAGE_PIN N3 IOSTANDARD LVCMOS33 } [get_ports {led[13]}]
set_property -dict { PACKAGE_PIN P1 IOSTANDARD LVCMOS33 } [get_ports {led[14]}]
set_property -dict { PACKAGE_PIN L1 IOSTANDARD LVCMOS33 } [get_ports {led[15]}]
## Configuration options, can be used for all designs
set_property CONFIG_VOLTAGE 3.3 [current_design]
set_property CFGBVS VCCO [current_design]
## SPI configuration mode options for QSPI boot, can be used for all designs
set_property BITSTREAM.GENERAL.COMPRESS TRUE [current_design]
set_property BITSTREAM.CONFIG.CONFIGRATE 33 [current_design]
set_property CONFIG_MODE SPIx4 [current_design]

This ends the chapter on using the C programming language to program the Femto CPU core. The minimal Femto CPU core has been a valuable educational tool in learning RISC-V assembly and C programming. With only five RISC-V instructions implemented, Femto CPU is of little practical use beyond these educational tutorials. The next chapter in this series will introduce the open-source Hazard3 RISC-V core, which supports all 47 instructions of the base RV32I instruction set.




































