Building a Cog

The P2 contains 8 separate “cogs” (or cores) and a shared “hub”.

P2 Overview

To the Elixir programmers amongst us, this diagram screams (at least initially) for a CogWorker process and a HubWorker process.

Let’s start with the CogWorker.

Implementing a minimal CogWorker

Since I don’t actually know what Event Trackers, 3 Levels of Interrupts, or the Hidden Debug Functions are, let’s start by ignoring them ;-)

Let’s build a basic GenServer to simulate a cog with the following steps:

Fill the Register RAM with 512 zeroed 32-bit longs (2 KB)
Fill the Lookup RAM with 512 zeroed 32-bit longs (2 KB)
Set the PC (Program Counter) to 0. This is the location of the next instruction to execute

defmodule P2Dasm.Cog.Worker do
  use GenServer

  # Register the process under the cog's id (e.g. :cog0) so it can be messaged by name.
  def start_link(cogid) do
    GenServer.start_link(__MODULE__, %{id: cogid}, name: cogid)
  end

  @impl true
  def init(state) do
    newstate =
      state
      |> Map.put(:reg, genmem())
      |> Map.put(:lut, genmem())
      |> Map.put(:pc, 0)

    {:ok, newstate}
  end

  # 512 longs * 4 bytes, all zero, stored as one flat binary.
  def genmem() do
    Range.new(1, 512 * 4)
    |> Enum.reduce([], fn _, acc -> [0 | acc] end)
    |> :binary.list_to_bin()
  end
end

Let’s test and inspect it:

iex(1)> {:ok, pid} = P2Dasm.Cog.Worker.start_link(:cog0)
{:ok, #PID<0.119.0>}
iex(2)> :sys.get_state(pid)
%{
  id: :cog0,
  lut: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    ...>>,
  pc: 0,
  reg: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    ...>>
}
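
As a quick sanity check on the sizes above, 512 longs of 4 bytes each should come to 2048 bytes per memory. The snippet below is only a sketch: MemCheck is a made-up module name, and :binary.copy/2 is used as a shorter stand-in for the reduce in genmem/0.

```elixir
defmodule MemCheck do
  # Hypothetical stand-in for genmem/0: 512 longs * 4 bytes per long, all zero.
  def genmem, do: :binary.copy(<<0>>, 512 * 4)
end

mem = MemCheck.genmem()

# Size in bytes of one cog memory.
IO.inspect(byte_size(mem), label: "bytes")

# Same content as a list of 2048 zeros turned into a binary.
IO.inspect(mem == :binary.list_to_bin(List.duplicate(0, 2048)), label: "all zero")
```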

CPUs, how do they work?

In our emulator today we’re going to write a “fetch”/“execute” loop for testing. Before we start, a note about memory on the P2.

The P2 can read and execute machine instructions from the register memory, the LUT memory, or the HUB memory. The cog knows which type of memory to read from because addresses are mapped like so:

PC Address       Instruction Source         Word Size     PC Increment
$00000..$001FF     cog register RAM    32 bits (long)                1
$00200..$003FF       cog lookup RAM    32 bits (long)                1
$00400..$FFFFF              hub RAM     8 bits (byte)                4

… because I don’t want to deal with these variations all over my codebase, I’m going to go ahead and write a function to take care of it.

If PC is between 0x000 and 0x1ff (511), it’s in the Register RAM.
If PC is between 0x200 and 0x3ff (1023), it’s in the LUT RAM.
If PC is between 0x400 and 0xFFFFF (1048575), it’s in HUB RAM.
- Yes, that is 1Meg, not 512k. It’s intentional and we’ll get back to it later.
- Yes, that means you can’t execute code from HUB RAM below 0x400 (but you can read/write to it - more later)
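
The mapping above can be written down as a small lookup. This is a sketch only: PcMap and region/1 are hypothetical names, not part of the emulator, and the PC increments are copied straight from the table.

```elixir
defmodule PcMap do
  # Hypothetical helper: maps a PC value to {memory_region, pc_increment}.
  def region(pc) when pc in 0x00000..0x001FF, do: {:reg, 1}
  def region(pc) when pc in 0x00200..0x003FF, do: {:lut, 1}
  def region(pc) when pc in 0x00400..0xFFFFF, do: {:hub, 4}
end

IO.inspect(PcMap.region(0x1FF))
IO.inspect(PcMap.region(0x200))
IO.inspect(PcMap.region(0x400))
```

Any PC outside 0..0xFFFFF matches no clause and raises a FunctionClauseError, which is a reasonable way for an emulator to fail loudly.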

First we need a function to read a generic instruction from a generic location:

  def fetch_instruction_word(cs=%{pc: pc}) when (pc < 0x200), do: read_cog_mem(cs)
  def fetch_instruction_word(cs=%{pc: pc}) when (pc > 0x3ff), do: read_hub_mem(cs)
  def fetch_instruction_word(cs),                             do: read_lut_mem(cs)

The “when” guards ensure that the correct cog/hub/lut function is called depending on the value of pc (the program counter). Clauses are tried top to bottom and the first one that matches wins, so the catch-all LUT clause has to come last.

Next, we write the functions that access and retrieve the values for that specific type of memory:

  def read_cog_mem(cs), do: :binary.part(cs.reg, {cs.pc*4, 4})
  def read_hub_mem(_),  do: false # Not Supported Yet, ** Crash Hard
  def read_lut_mem(cs), do: :binary.part(cs.lut, {(cs.pc - 0x200)*4, 4})

The cog register and LUT memories are stored in the cog state under :reg and :lut as flat binaries of 8-bit values. Because cog and LUT reads always fall on a 4-byte (32-bit) boundary, the byte offset is the long index multiplied by 4, and we read 4 bytes from there. For the register RAM the long index is simply the PC; for the LUT it is the PC minus 0x200, since LUT addresses start at 0x200.
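
The offset arithmetic can be tried on its own with a tiny binary. WordRead and its two functions are hypothetical names used only for this sketch; :binary.part/3 is the three-argument form of the same call used above.

```elixir
defmodule WordRead do
  # Hypothetical helper: read the 4-byte word at long index `index`.
  def word_at(mem, index), do: :binary.part(mem, index * 4, 4)

  # LUT addresses start at 0x200, so rebase the PC before indexing.
  def lut_word_at(lut, pc), do: word_at(lut, pc - 0x200)
end

mem = <<0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11>>

# Long index 2 starts at byte offset 8.
IO.inspect(WordRead.word_at(mem, 2), binaries: :as_binaries)

# PC 0x201 is long index 1 of the LUT, i.e. byte offset 4.
IO.inspect(WordRead.lut_word_at(mem, 0x201), binaries: :as_binaries)
```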

Let’s fake a cogstatus with pc=0. This should return the first four octets from the reg: field.

iex(1)> fakecogstatus = %{pc: 0, reg: <<0,1,2,3,4,5,6,7>>, lut: <<0,10,20,30,40,50,60,70>>}
%{
  lut: <<0, 10, 20, 30, 40, 50, 60, 70>>,
  pc: 0,
  reg: <<0, 1, 2, 3, 4, 5, 6, 7>>
}
iex(2)> P2Dasm.Cog.Worker.fetch_instruction_word(fakecogstatus)
<<0, 1, 2, 3>>

Next, let’s set pc=1, which should return the next four octets from the reg: field

iex(3)> fakecogstatus1 = %{pc: 1, reg: <<0,1,2,3,4,5,6,7>>, lut: <<0,10,20,30,40,50,60,70>>}
%{
  lut: <<0, 10, 20, 30, 40, 50, 60, 70>>,
  pc: 1,
  reg: <<0, 1, 2, 3, 4, 5, 6, 7>>
}
iex(4)> P2Dasm.Cog.Worker.fetch_instruction_word(fakecogstatus1)
<<4, 5, 6, 7>>

Next, let’s set pc = 0x200, which should give us the first four octets of the lut: field

iex(8)> fakecogstatus0x200 = %{pc: 0x200, reg: <<0,1,2,3,4,5,6,7>>, lut: <<0,10,20,30,40,50,60,70>>}
%{
  lut: <<0, 10, 20, 30, 40, 50, 60, 70>>,
  pc: 512,
  reg: <<0, 1, 2, 3, 4, 5, 6, 7>>
}
iex(9)> P2Dasm.Cog.Worker.fetch_instruction_word(fakecogstatus0x200)
<<0, 10, 20, 30>>

Next, let’s set pc = 0x201, which should give us the next four octets of the lut: field

iex(10)> fakecogstatus0x201 = %{pc: 0x201, reg: <<0,1,2,3,4,5,6,7>>, lut: <<0,10,20,30,40,50,60,70>>}
%{
  lut: <<0, 10, 20, 30, 40, 50, 60, 70>>,
  pc: 513,
  reg: <<0, 1, 2, 3, 4, 5, 6, 7>>
}
iex(11)> P2Dasm.Cog.Worker.fetch_instruction_word(fakecogstatus0x201)
"(2<F"

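… and it does. The "(2<F" is simply how IEx prints <<40, 50, 60, 70>>, because all four bytes happen to be printable ASCII.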
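
To see the raw bytes rather than the string form, inspect/2 accepts a binaries: :as_binaries option. A small check, separate from the emulator code:

```elixir
word = <<40, 50, 60, 70>>

# The string form and the byte-list form are the same four bytes.
IO.inspect(word == "(2<F")

# Force the byte-list form regardless of printability.
IO.puts(inspect(word, binaries: :as_binaries))
```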

Getting a wiggle on

Let’s make an assumption that all instructions take one clock cycle to execute. In order to test our fetch/execute loop we’re going to need to stimulate it from the outside, much like real hardware. Real hardware needs a clock signal; we’ll simulate that with a “:tick” message.

  ## A plain message is asynchronous and needs no reply, hence :noreply.
  #    fetch_execute returns the new cogstate value.
  def handle_info(:tick, state), do: {:noreply, fetch_execute(state)}

  def fetch_execute(state) do
    <<instr_32bits::size(32)>> = fetch_instruction_word(state)

    textinstr = P2Dasm.Sandbox.decode_instr(&P2Dasm.Sandbox.dis_instr/1, <<instr_32bits::size(32)>>)
    IO.puts("pc: #{state.pc} -> #{textinstr}")

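    # Bind the current state into the executor so decode_instr can call it with just the instruction.
    exefunction = fn(instr) -> P2Dasm.Sandbox.exe_instr(instr, state) end
    P2Dasm.Sandbox.decode_instr(exefunction, <<instr_32bits::size(32)>>)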
  end
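
The clock idea can also be tried in isolation, without the disassembler. A minimal sketch, assuming a stand-in TickDemo process (not part of the emulator) whose only job is to count the ticks it receives:

```elixir
defmodule TickDemo do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, %{ticks: 0}, name: name)

  @impl true
  def init(state), do: {:ok, state}

  # Each :tick stands in for one clock cycle; here it only bumps a counter.
  @impl true
  def handle_info(:tick, state), do: {:noreply, %{state | ticks: state.ticks + 1}}
end

{:ok, pid} = TickDemo.start_link(:tick_demo)
for _ <- 1..3, do: send(pid, :tick)

# :sys.get_state/1 is queued behind the ticks, so all three have been handled by now.
IO.inspect(:sys.get_state(pid))
```

Once sending ticks by hand gets tedious, something like :timer.send_interval(100, pid, :tick) should deliver the same message periodically, at the cost of making runs harder to reproduce.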

Now, let’s test it!

iex(1)> P2Dasm.Cog.Worker.start_link(:cog0)
{:ok, #PID<0.118.0>}
iex(2)> send(:cog0, :tick)
pc: 0 -> NOP
:tick
iex(3)> send(:cog0, :tick)
pc: 0 -> NOP
:tick
iex(4)> send(:cog0, :tick)
pc: 0 -> NOP
:tick
iex(5)> send(:cog0, :tick)
pc: 0 -> NOP
:tick

Aha! A bug. We’re not incrementing pc, so we keep executing the instruction at register 0.

At this point we have a choice: should the fetch_execute function increment pc, or should the emulated instruction do it? The simplest answer has to be letting the emulated instruction do it. An instruction that changes program flow needs to be able to set pc to whatever value it wants, so every instruction may as well be responsible for pc.

To do this, we change exe_instr for NOP from:

  def exe_instr(:NOP, cogstate),   do: cogstate

to:

  def exe_instr(:NOP, cogstate),   do: Map.put(cogstate, :pc, cogstate.pc+1)
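
To illustrate why the instruction itself owns pc, here is a sketch with a second, made-up instruction. ExeSketch and the {:JMP, target} shape are assumptions for illustration only; the real P2 jump encoding is not modeled.

```elixir
defmodule ExeSketch do
  # NOP: nothing to do except move on to the next instruction.
  def exe_instr(:NOP, cogstate), do: %{cogstate | pc: cogstate.pc + 1}

  # Hypothetical jump with an absolute target: it overwrites pc outright.
  def exe_instr({:JMP, target}, cogstate), do: %{cogstate | pc: target}
end

state = %{pc: 5}
IO.inspect(ExeSketch.exe_instr(:NOP, state))
IO.inspect(ExeSketch.exe_instr({:JMP, 0x10}, state))
```

Going by the address table earlier, an instruction fetched from hub RAM would need to advance pc by 4 rather than 1; this sketch ignores that case.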

Let’s hot-reload the code in our already running processor and try again:

iex(6)> r P2Dasm.Sandbox
warning: redefining module P2Dasm.Sandbox (current version loaded from _build/dev/lib/p2_dasm/ebin/Elixir.P2Dasm.Sandbox.beam)
  lib/p2_dasm/sandbox.ex:1

{:reloaded, P2Dasm.Sandbox, [P2Dasm.Sandbox]}
iex(7)> send(:cog0, :tick)
pc: 0 -> NOP
:tick
iex(8)> send(:cog0, :tick)
pc: 1 -> NOP
:tick
iex(9)> send(:cog0, :tick)
pc: 2 -> NOP
:tick
iex(10)> send(:cog0, :tick)
pc: 3 -> NOP
:tick
iex(11)> send(:cog0, :tick)
pc: 4 -> NOP
:tick

… and there we have it. We now have a cog that’s working and executing NOPs.

Ladies and Gentlemen, our process is now NOPSLED compliant ;-)