feat: update project tt_um_levenshtein from peter-noerlund/tt09-levenshtein

Commit: 335c793535c384ab37ef0727ebcf1e9118ee2db1
Workflow: https://github.com/peter-noerlund/tt09-levenshtein/actions/runs/11111357389
TinyTapeoutBot authored and urish committed Sep 30, 2024
1 parent 9464f3b commit c4090ab
Showing 8 changed files with 6,918 additions and 7,732 deletions.
8 changes: 4 additions & 4 deletions projects/tt_um_levenshtein/commit_id.json
@@ -1,9 +1,9 @@
{
"app": "Tiny Tapeout tt09 90075702",
"app": "Tiny Tapeout tt09 30dbb0cd",
"repo": "https://github.com/peter-noerlund/tt09-levenshtein",
"commit": "f3151539beb6438e41c9b7fb47263c3d555c137a",
"workflow_url": "https://github.com/peter-noerlund/tt09-levenshtein/actions/runs/10984285815",
"commit": "335c793535c384ab37ef0727ebcf1e9118ee2db1",
"workflow_url": "https://github.com/peter-noerlund/tt09-levenshtein/actions/runs/11111357389",
"sort_id": 1727039658434,
"openlane_version": "OpenLane2 2.1.5",
"openlane_version": "OpenLane2 2.1.7",
"pdk_version": "open_pdks bdc9412b3e468c102d01b7cf6337be06ec6e9c9a"
}
143 changes: 83 additions & 60 deletions projects/tt_um_levenshtein/docs/info.md
@@ -13,113 +13,136 @@ tt09-levenshtein is a fuzzy search engine which can find the best matching word

Fundamentally it's an implementation of the bit-vector Levenshtein algorithm from Heikki Hyyrö's 2022 paper titled *A Bit-Vector Algorithm for Computing Levenshtein and Damerau Edit Distances*.

#### UART
### SPI

The device is organized as a wishbone bus which is accessed through commands on the UART.
The device is organized as a wishbone bus which is accessed through commands on an SPI bus.

Each command consists of 4 input bytes and 1 output byte:
The maximum SPI frequency is 25% of the master clock.

**Input bytes:**

| Byte | Bit | Description |
|------|-----|-------------------------------------------|
| 0 | 7 | READ=`0` WRUTE=`1` |
| 0 | 7 | READ=`0` WRITE=`1` |
| 0 | 6-0 | Address bit 22-16 |
| 1 | 7-0 | Address bit 15-8 |
| 2 | 7-0 | Address bit 7-0 |
| 3 | 7-0 | Byte to write if WRITE, otherwise ignored |

**Output byts:**
**Output bytes:**

| Byte | Bit | Description |
|------|-----|------------------------------------------|
| 0 | 7-0 | Byte read if READ, otherwise just `0x00` |
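As a rough illustration of the input byte layout above, a host could pack one command like this. This is only a sketch: the function name `encode_command` and its parameters are made up for this example and are not part of the client, and sending `0x00` as the fourth byte of a READ is an arbitrary choice since that byte is ignored.

```python
def encode_command(address: int, write: bool = False, data: int = 0) -> bytes:
    """Pack one command into the 4 input bytes from the table above.

    Hypothetical helper, not taken from the project's client.
    """
    if not 0 <= address < (1 << 23):
        raise ValueError("address must fit in 23 bits")
    if not 0 <= data <= 0xFF:
        raise ValueError("data must fit in one byte")
    # Byte 0: bit 7 is the READ/WRITE flag, bits 6-0 are address bits 22-16.
    first = (0x80 if write else 0x00) | ((address >> 16) & 0x7F)
    return bytes([
        first,
        (address >> 8) & 0xFF,      # address bits 15-8
        address & 0xFF,             # address bits 7-0
        data if write else 0x00,    # ignored by the device on READ
    ])
```

For example, `encode_command(0x000400, write=True, data=0x5A)` yields the bytes `80 04 00 5A`.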

Since the SPI bridges to a wishbone bus which is shared with another master, and because registers and SRAM have different latencies, the response time is variable.

#### Memory Layout
While the bus is working, the output bits will be zero. The final output byte will be preceded by a one-bit.

As indicated by the UART protocol, the address space is 23 bits.
Note that this means that the value `0x5A` can appear 8 different ways on the SPI bus:

The lower half of the memory space is used for registers and the upper half of the memory space is accessing an external SPI PSRAM.
```
01 5A 0000000 1 01011010
02 B4 000000 1 01011010 0
05 68 00000 1 01011010 00
0A D0 0000 1 01011010 000
15 A0 000 1 01011010 0000
2B 40 00 1 01011010 00000
56 80 0 1 01011010 000000
AD 00 1 01011010 00000000
```
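One way a host might recover the output byte from such a stream is to skip the leading zero bits, find the first one-bit, and take the eight bits that follow it. The sketch below assumes the bits arrive most significant bit first, as in the listing above; `decode_response` is a hypothetical name, not a function from the client.

```python
def decode_response(miso: bytes):
    """Return the byte following the first one-bit, or None if it has not fully arrived.

    Hypothetical helper; assumes MSB-first bit order on the bus.
    """
    # Flatten the received bytes into a list of bits, MSB first.
    bits = []
    for byte in miso:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    try:
        start = bits.index(1)   # the marker bit preceding the output byte
    except ValueError:
        return None             # bus still working: only zero bits so far
    payload = bits[start + 1:start + 9]
    if len(payload) < 8:
        return None             # marker seen, but the byte is incomplete
    value = 0
    for bit in payload:
        value = (value << 1) | bit
    return value
```

Fed with any of the eight byte pairs listed above (for example `bytes([0x2B, 0x40])`), this returns `0x5A`.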

The address space is basically as follows

| Address | Usable Size | Description |
|-------------------|-------------|----------------|
| 0x000000-0x3FFFFF | 6B | Registers |
| 0x400000-0x5FFFFF | 512B | Bitvectors |
| 0x600000-0x7FFFFF | 2MB | Dictionary |
### Memory Layout

The registers have a different layout for read and write.
As indicated by the SPI protocol, the address space is 23 bits.

**Write:**
| Address | Size | Description |
|----------|------|------------------|
| 0x000000 | 1 | Control register |
| 0x000001 | 1 | Word length |
| 0x000002 | 2 | Mask |
| 0x000004 | 2 | Initial VP value |
The address space is basically as follows:

**Read:**
| Address | Size | Description |
|----------|------|-----------------|
| 0x000000 | 1 | Status register |
| 0x000001 | 1 | Distance |
| 0x000002 | 2 | Word index |
| Address | Size | Access | Identifier |
|----------|------|--------|-------------|
| 0x000000 | 1 | R/W | `CTRL` |
| 0x000001 | 1 | R/O | `DISTANCE` |
| 0x000002 | 2 | R/O | `INDEX` |
| 0x000200 | 512 | R/W | `VECTORMAP` |
| 0x000400 | 8M | R/W | `DICT` |

#### Operation
**CTRL**

##### Initialization
The control register is used to start the engine and see when it has completed.

Before doing anything, the bitvector memory needs to be filled with `0x00`. That is the 512 bytes from `0x400000` to `0x4001FF`. This is only necessary to do once after power up.
The layout is as follows:

##### Store dictionary
| Bits | Size | Access | Description |
|------|------|--------|-------------------------------------------------------------|
| 0-4  | 5    | R/W    | Word length                                                 |
| 5-6 | 2 | R/O | Not used |
| 7 | 1 | R/O | Is set to `1` while the engine runs and `0` when it is done |

Next, you need to store a dictionary in the SRAM. The dictionary needs to be stored at address `0x600000`. Each word must be encoded using 1 bit character, cannot use `0xFE` and `0xFF` and must not exceed 255 characters. Each word is terminated with the byte value `0xFE` and the dictionary itself is terminated by the byte value `0xFF`. In total there can be no more than 65535 words and the whole list must not exceed 2MB.
When data is written to this address, the engine automatically starts.
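A host-side start-and-wait loop could look roughly like the sketch below. It assumes the caller supplies two hypothetical callables, `write_byte(address, value)` and `read_byte(address)`, that perform single SPI commands; neither name comes from the client. The word length is treated as a 5-bit field, following the bits 0-4 row of the table.

```python
import time

CTRL = 0x000000
BUSY = 0x80          # bit 7: set while the engine runs
LENGTH_MASK = 0x1F   # bits 0-4: word length

def run_engine(write_byte, read_byte, word_length, poll_interval=0.0, max_polls=100000):
    """Write the word length to CTRL (which starts the engine) and wait for bit 7 to clear.

    write_byte and read_byte are hypothetical bus accessors supplied by the caller.
    """
    if not 0 < word_length <= LENGTH_MASK:
        raise ValueError("word length does not fit in bits 0-4")
    write_byte(CTRL, word_length)
    for _ in range(max_polls):
        if not read_byte(CTRL) & BUSY:
            return
        time.sleep(poll_interval)
    raise TimeoutError("engine did not finish")
```

Once this returns, `DISTANCE` and `INDEX` hold the result of the search.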

##### Perform fuzzy matching
**DISTANCE**

To perform a fuzzy search, you first need to generate a map of 16-bit vectors based on the input word.
When the engine has finished executing, this address contains the levenshtein distance of the best match.

For each character in the word, you produce a bit vector representing which position in the word holds the character.
**INDEX**

Example:
When the engine has finished executing, this address contains the index of the best word from the dictionary.

```verilog
word = application
**VECTORMAP**

a = 16'b00000000_01000001; // a_____a____
p = 16'b00000000_00000110; // _pp________
l = 16'b00000000_00001000; // ___l_______
i = 16'b00000001_00010000; // ____i___i__
c = 16'b00000000_00100000; // _____c_____
t = 16'b00000000_10000000; // _______t___
o = 16'b00000010_00000000; // _________o_
n = 16'b00000100_00000000; // __________n
```
The vector map must contain the corresponding bitvector for each input byte in the alphabet.

You then store each bitvector at address `0x400000 + char * 2`. The bitvectors is stored in bit endian byte order.
If the search word is `application`, the bit vectors will look as follows:

You then need to store the length in the word length register (address `0x000001`)
| Letter | Index | Bit vector |
|--------|--------|-----------------------------------------|
| `a` | `0x61` | `16'b00000000_01000001` (`a_____a____`) |
| `p` | `0x70` | `16'b00000000_00000110` (`_pp________`) |
| `l` | `0x6C` | `16'b00000000_00001000` (`___l_______`) |
| `i` | `0x69` | `16'b00000001_00010000` (`____i___i__`) |
| `c` | `0x63` | `16'b00000000_00100000` (`_____c_____`) |
| `t` | `0x74` | `16'b00000000_10000000` (`_______t___`) |
| `o` | `0x6F` | `16'b00000010_00000000` (`_________o_`) |
| `n` | `0x6E` | `16'b00000100_00000000` (`__________n`) |
| * | * | `16'b00000000_00000000` (`___________`) |

And a mask with the length-th bit set to 1 (`1 << (length - 1)`) in the 16-bit mask register (address `0x000002`) using bit endian byte order.
Since each vector is 16-bit, the corresponding address is `0x200 + index * 2`
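The table can be reproduced with a few lines of host code. The following is a minimal sketch, assuming the search word is already a sequence of byte values of at most 16 characters; `build_vector_map` is an illustrative name, and the order in which the two bytes of each vector are written is left out because it is not stated here.

```python
VECTORMAP_BASE = 0x200

def build_vector_map(word: bytes) -> dict:
    """Map VECTORMAP addresses to 16-bit vectors for the given search word.

    Bit n of a character's vector is set when the character occurs at position n.
    Only non-zero vectors are returned; all other entries must be zero on the device.
    """
    if len(word) > 16:
        raise ValueError("word does not fit in a 16-bit vector")
    vectors = {}
    for position, char in enumerate(word):
        address = VECTORMAP_BASE + char * 2
        vectors[address] = vectors.get(address, 0) | (1 << position)
    return vectors
```

For `b"application"` the entry for `a` lands at `0x200 + 0x61 * 2` with the value `0b01000001`, matching the first row of the table.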

And a VP value which is simly the first length bits set to 1 (`(1 << length) - 1`) in the 16-bit vp mast register (address `0x000004`) using big endian byte order.
**DICT**

Finally, you store a `1` in the control register at address `0x000000`.
The word list.

The accelerator will now scan through the dictionary to find matches.
The word list is stored as a sequence of words, each encoded as a sequence of 8-bit characters and terminated by the byte value `0x00`. The list itself is terminated with the byte value `0x01`.

To know when the algorithm is done, you poll the status register (address `0x000000`) at a regular interval until the 0th bit is 0.
Note that the algorithm doesn't care about the particular characters; it only cares whether they are identical or not. So even though the algorithm doesn't support UTF-8 and is limited to a character set of 254 characters, a list of words usually doesn't contain more than 254 distinct characters (ignoring Asian alphabets), so you can practically just map letters to a value between 2 and 255.
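The encoding and the letter mapping could be sketched as follows. This is an illustration only: `encode_dictionary` is a made-up name, and assigning values in order of first appearance is just one possible mapping.

```python
def encode_dictionary(words):
    """Encode a word list as described above and return (blob, mapping).

    Each distinct character is assigned a value from 2 to 255 in order of first
    appearance (an arbitrary choice for this sketch). Words end with 0x00 and
    the list ends with 0x01.
    """
    mapping = {}
    blob = bytearray()
    for word in words:
        for char in word:
            if char not in mapping:
                if len(mapping) >= 254:
                    raise ValueError("more than 254 distinct characters")
                mapping[char] = len(mapping) + 2
            blob.append(mapping[char])
        blob.append(0x00)   # word terminator
    blob.append(0x01)       # list terminator
    return bytes(blob), mapping
```

The returned mapping also has to be applied to the search word before building the vector map, so that both sides use the same character values.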

You can then read out the levenshtein distance at address `0x000001` and the index of the word in the dictionary which was the best match at `0x000002` (big endian).
## How to test

Finally, you need to clear the bitvectors before the next search. Instead of filling the entire 512 bytes with `0x00`, you simply clear the bitvector positions you set earlier (in the example that would be `a`, `p`, `l`, `i`, `c`, `t`, `o`, and `n`)
You can compile the client as follows:

## How to test
```sh
mkdir -p build
cmake -G Ninja -B build .
cmake --build build
```

TODO
Next, you can run the test tool:

```sh
./build/client/client --device tt09 --test
```

This will load 1024 words of random length and characters into the SRAM and then perform a bunch of searches, verifying that the returned result is correct.

## External hardware

List external hardware used in your project (e.g. PMOD, LED display, etc), if any
To operate, the device needs an SPI PSRAM PMOD. The design is tested with the QQSPI PSRAM PMOD from Machdyne, but any memory PMOD will work as long as it supports:

* WRITE (`0x02`) with no latency
* READ (`0x03`) with no latency
* 24-bit addresses
* Uses pin 0 for `SS#`.

Note that this makes the SRAM/Flash PMOD from mole99 incompatible, but the spi-ram-emu project for the RP2040 can be used if it is changed to 24-bit addressing (it can just ignore the eight most significant bits).
22 changes: 11 additions & 11 deletions projects/tt_um_levenshtein/info.yaml
@@ -20,7 +20,7 @@ project:
- tt_um_levenshtein.v
- levenshtein_controller.sv
- spi_controller.sv
- uart_wishbone_bridge.sv
- spi_wishbone_bridge.sv
- wb_arbiter.sv
- wb_interconnect.sv

@@ -30,27 +30,27 @@ pinout:
ui[0]: ""
ui[1]: ""
ui[2]: ""
ui[3]: "UART RxD"
ui[4]: ""
ui[5]: ""
ui[6]: ""
ui[3]: ""
ui[4]: "SPI SS#"
ui[5]: "SPI SCK"
ui[6]: "SPI MOSI"
ui[7]: ""

# Outputs
uo[0]: ""
uo[1]: ""
uo[2]: ""
uo[3]: ""
uo[4]: "UART TxD"
uo[4]: ""
uo[5]: ""
uo[6]: ""
uo[7]: ""
uo[7]: "SPI MISO"

# Bidirectional pins
uio[0]: "PMOD SPI SS#"
uio[1]: "PMOD SPI MOSI"
uio[2]: "PMOD SPI MISO"
uio[3]: "PMOD SPI SCK"
uio[0]: "SRAM SPI SS#"
uio[1]: "SRAM SPI MOSI"
uio[2]: "SRAM SPI MISO"
uio[3]: "SRAM SPI SCK"
uio[4]: ""
uio[5]: ""
uio[6]: ""