271 changes: 269 additions & 2 deletions crates/lsh/README.md
@@ -7,8 +7,6 @@ At a high level:
* The compiler lowers them into bytecode
* The runtime executes the bytecode on the input text line by line

To understand the definition language itself, read [definitions/README.md](definitions/README.md).

For debugging and optimizing language definitions use `lsh-bin`.
To see the generated assembly, for example:
```sh
@@ -24,3 +22,272 @@ Or to render a file:
```sh
cargo run -p lsh-bin -- render --input assets/highlighting-tests/html.html crates/lsh/definitions
```

## Language

See [definitions/README.md](definitions/README.md).

## Instruction Set

### Registers

The virtual machine has 16 32-bit registers, named `r0` to `r15`.
`r0` to `r2` currently have a fixed meaning:
* `r0` is `off`, which is the text input offset
* `r1` is `hs`, the start of the next highlight range; the range is emitted by a `yield` statement, which corresponds to a `flush` instruction
* `r2` is `pc`, the program counter, aka instruction offset

`call` does not save `r0` and `r1`, so they keep their values across calls and returns. `r2` to `r15` are saved by `call` and restored by `ret`.

> [!NOTE]
> `pc` is pre-incremented when processing instructions.
> For instance, `mov r15, pc` saves the address of the _next_ instruction.
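
The register layout and the pre-increment rule can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `Registers` type, the constant names, and `exec_mov_from_pc` are hypothetical, and the sketch assumes that `pc` counts bytes.

```rust
/// Hypothetical register file: 16 registers of 32 bits each.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Registers(pub [u32; 16]);

// Indices of the registers with a fixed meaning.
pub const OFF: usize = 0; // r0: text input offset
pub const HS: usize = 1; // r1: start of the next highlight range
pub const PC: usize = 2; // r2: program counter

/// Models `mov dst, pc` for an instruction that is `len` bytes long.
/// ASSUMPTION: `pc` is a byte offset into the bytecode.
pub fn exec_mov_from_pc(regs: &mut Registers, dst: usize, len: u32) {
    // `pc` is advanced past the instruction first ...
    regs.0[PC] += len;
    // ... so `dst` receives the address of the next instruction.
    regs.0[dst] = regs.0[PC];
}
```

With `pc` at 10 and a two-byte `mov r15, pc`, this sketch leaves both `pc` and `r15` at 12.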

### Instruction: mov, add, sub

`mov` assigns `src` to `dst`.
`add` and `sub` compute `dst += src` and `dst -= src`, respectively.

Mnemonic:
```
mov dst, src
add dst, src
sub dst, src
```

Encoding:
```
 0               1
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+---------------+-------+-------+
|    opcode     |  dst  |  src  |
+---------------+-------+-------+
mov = 0x00
add = 0x01
sub = 0x02
```
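
As a rough illustration of the two-byte layout above, the following sketch packs and unpacks a register-to-register instruction. The function names are hypothetical, and the nibble order is an assumption: the diagram does not say whether bit 0 is the most or least significant bit, so the sketch treats it as the least significant one.

```rust
pub const MOV: u8 = 0x00;
pub const ADD: u8 = 0x01;
pub const SUB: u8 = 0x02;

/// Packs `opcode dst, src` into two bytes.
/// ASSUMPTION: bit 0 is the least significant bit, so `dst` is the low nibble.
pub fn encode_rr(opcode: u8, dst: u8, src: u8) -> Option<[u8; 2]> {
    // Reject unknown opcodes and register indices outside r0..=r15.
    if opcode > SUB || dst > 15 || src > 15 {
        return None;
    }
    Some([opcode, dst | (src << 4)])
}

/// Unpacks two bytes into `(opcode, dst, src)`.
pub fn decode_rr(bytes: [u8; 2]) -> Option<(u8, u8, u8)> {
    if bytes[0] > SUB {
        return None;
    }
    Some((bytes[0], bytes[1] & 0x0f, bytes[1] >> 4))
}
```

Under these assumptions, `add r3, r15` encodes to the bytes `0x01 0xf3`.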

### Instruction: movi, addi, subi

`movi`, `addi`, and `subi` are immediate variants of `mov`, `add`, and `sub`.
The `src` parameter is replaced with a fixed 32-bit constant.

Mnemonic:
```
movi dst, imm
addi dst, imm
subi dst, imm
```

Encoding:
```
 0               1               2               3
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+---------------+-------+-------+-------+-------+-------+-------+
|    opcode     |  dst  |       |              imm              |
+---------------+-------+-------+-------+-------+-------+-------+
movi = 0x03
addi = 0x04
subi = 0x05
```

### Instruction: call

`call` pushes `r2` to `r15` on the stack and jumps to `tgt`.

Mnemonic:
```
call tgt
```

Encoding:
```
call:
 0               1               2
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+---------------+-------+-------+-------+-------+
|    opcode     |              tgt              |
+---------------+-------+-------+-------+-------+
call = 0x06
```

### Instruction: ret

`ret` restores and pops the last bundle of registers (`r2` to `r15`).
When the call stack is empty, `ret` resets the VM to its entrypoint and clears registers `r2` to `r15`.

Mnemonic:
```
ret
```

Encoding:
```
ret:
 0
 0 1 2 3 4 5 6 7
+---------------+
|    opcode     |
+---------------+
ret = 0x07
```
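
The interplay of `call` and `ret` can be sketched like this. The `Vm` struct and its field names are hypothetical, and the sketch assumes that "clearing" a register means setting it to zero and that resetting to the entrypoint means loading an entrypoint address into `pc`.

```rust
/// Hypothetical VM state; only the parts relevant to `call` and `ret`.
pub struct Vm {
    pub regs: [u32; 16],
    /// Saved copies of `r2..=r15`, one bundle per active call.
    pub stack: Vec<[u32; 14]>,
    /// Address at which execution restarts (assumed name).
    pub entrypoint: u32,
}

impl Vm {
    /// Saves `r2..=r15` and jumps to `tgt`. `r0` and `r1` are not saved.
    pub fn call(&mut self, tgt: u32) {
        let mut saved = [0u32; 14];
        saved.copy_from_slice(&self.regs[2..16]);
        self.stack.push(saved);
        self.regs[2] = tgt; // r2 is `pc`
    }

    /// Restores the last saved bundle, or resets the VM if there is none.
    pub fn ret(&mut self) {
        match self.stack.pop() {
            Some(saved) => self.regs[2..16].copy_from_slice(&saved),
            None => {
                // ASSUMPTION: "clear" means zero.
                self.regs[2..16].fill(0);
                self.regs[2] = self.entrypoint;
            }
        }
    }
}
```

Because `pc` is part of the saved bundle and is pre-incremented, restoring it resumes execution at the instruction after the `call`. Changes the callee makes to `r0` and `r1` stay visible to the caller.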

### Instruction: jeq, jne, jlt, jle, jgt, jge

Jumps to `tgt` if the two given registers fulfill the comparison.
* `jeq`: jump if `lhs == rhs`
* `jne`: jump if `lhs != rhs`
* `jlt`: jump if `lhs < rhs`
* `jle`: jump if `lhs <= rhs`
* `jgt`: jump if `lhs > rhs`
* `jge`: jump if `lhs >= rhs`


Mnemonic:
```
jeq lhs, rhs, tgt
jne lhs, rhs, tgt
jlt lhs, rhs, tgt
jle lhs, rhs, tgt
jgt lhs, rhs, tgt
jge lhs, rhs, tgt
```

Encoding:
```
 0               1               2               3
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+---------------+-------+-------+-------+-------+-------+-------+
|    opcode     |  lhs  |  rhs  |              tgt              |
+---------------+-------+-------+-------+-------+-------+-------+
jeq = 0x08
jne = 0x09
jlt = 0x0a
jle = 0x0b
jgt = 0x0c
jge = 0x0d
```
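
The mapping from opcode to comparison can be sketched as a single function. `jump_taken` is a hypothetical helper, and the sketch assumes that register values are compared as unsigned 32-bit integers, which the text above does not state.

```rust
/// Returns whether the conditional jump with the given opcode is taken,
/// or `None` if the opcode is not one of `jeq` to `jge`.
/// ASSUMPTION: registers are compared as unsigned 32-bit values.
pub fn jump_taken(opcode: u8, lhs: u32, rhs: u32) -> Option<bool> {
    Some(match opcode {
        0x08 => lhs == rhs, // jeq
        0x09 => lhs != rhs, // jne
        0x0a => lhs < rhs,  // jlt
        0x0b => lhs <= rhs, // jle
        0x0c => lhs > rhs,  // jgt
        0x0d => lhs >= rhs, // jge
        _ => return None,
    })
}
```

When the function returns `Some(true)`, a VM built along these lines would set `pc` to `tgt`; otherwise execution continues with the next instruction.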

### Instruction: jeol

Jumps to `tgt` if the input offset has reached the end of line.

Mnemonic:
```
jeol tgt
```

Encoding:
```
 0               1               2
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+---------------+-------+-------+-------+-------+
|    opcode     |              tgt              |
+---------------+-------+-------+-------+-------+
jeol = 0x0e
```

### Instruction: jc (JumpIfMatchCharset)

Jumps to `tgt` if at least the next `min` characters are found in the charset at `idx`.
Consumes no more than `max` characters.
On success, the `off` register is incremented by the number of matched characters.

Mnemonic:
```
jc idx, min, max, tgt
```

Encoding:
```
 0               1               2               3               4               5               6               7               8
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+---------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
|    opcode     |              idx              |              min              |              max              |              tgt              |
+---------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
jc = 0x0f
```
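
The matching rule of `jc` can be sketched as follows. The `Charset` representation and the `match_charset` helper are hypothetical, and the sketch assumes that a "character" is a single byte of the current line.

```rust
/// Hypothetical charset: one membership flag per byte value.
pub type Charset = [bool; 256];

/// Models the matching part of `jc`.
/// Returns the number of consumed bytes if the jump is taken, `None` otherwise.
/// ASSUMPTION: characters are bytes, and `off` indexes into `line`.
pub fn match_charset(
    line: &[u8],
    off: usize,
    set: &Charset,
    min: usize,
    max: usize,
) -> Option<usize> {
    let rest = line.get(off..)?;
    // Count the leading bytes that are in the set, but never more than `max`.
    let matched = rest
        .iter()
        .take(max)
        .take_while(|&&b| set[b as usize])
        .count();
    // The jump is only taken if at least `min` bytes matched.
    (matched >= min).then_some(matched)
}
```

On success, a caller would add the returned count to `off` and set `pc` to `tgt`. With a digit charset and the line `x=1234;`, matching at offset 2 with `min = 1` and `max = 3` consumes three bytes.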

### Instruction: jp (JumpIfMatchPrefix)

Jumps to `tgt` if the next characters in the input match the given prefix string at `idx`.
On success, the `off` register is incremented by the length of the prefix string.

Mnemonic:
```
jp idx, tgt
```

Encoding:
```
 0               1               2               3               4
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+---------------+-------+-------+-------+-------+-------+-------+-------+-------+
|    opcode     |              idx              |              tgt              |
+---------------+-------+-------+-------+-------+-------+-------+-------+-------+
jp = 0x10
```

### Instruction: jpi (JumpIfMatchPrefixInsensitive)

Jumps to `tgt` if the next characters in the input match the given prefix string at `idx` using an ASCII-case-insensitive comparison.
On success, the `off` register is incremented by the length of the prefix string.

Mnemonic:
```
jpi idx, tgt
```

Encoding:
```
 0               1               2               3               4
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+---------------+-------+-------+-------+-------+-------+-------+-------+-------+
|    opcode     |              idx              |              tgt              |
+---------------+-------+-------+-------+-------+-------+-------+-------+-------+
jpi = 0x11
```
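
The prefix comparison behind `jp` and `jpi` can be sketched with one helper that differs only in how bytes are compared. `match_prefix` is a hypothetical name, and the sketch assumes the prefix string has already been looked up via `idx`.

```rust
/// Models the matching part of `jp` (`ignore_ascii_case == false`)
/// and `jpi` (`ignore_ascii_case == true`).
/// Returns the new value of `off` if the jump is taken, `None` otherwise.
pub fn match_prefix(
    line: &[u8],
    off: usize,
    prefix: &[u8],
    ignore_ascii_case: bool,
) -> Option<usize> {
    // Take exactly `prefix.len()` bytes starting at `off`, if that many remain.
    let candidate = line.get(off..)?.get(..prefix.len())?;
    let equal = if ignore_ascii_case {
        candidate.eq_ignore_ascii_case(prefix)
    } else {
        candidate == prefix
    };
    equal.then_some(off + prefix.len())
}
```

For the line `<!DOCTYPE html>` and the prefix `doctype`, the case-insensitive variant matches at offset 2 and advances `off` to 9, while the case-sensitive variant does not match.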

### Instruction: flush

Tells the runtime that the range between `hs` and `off` should be highlighted with the color stored in the register at index `kind`.
The runtime will then set `hs` to `off`.

> [!NOTE]
> Always highlighting the range from `hs` to `off` is a flaw in the current design, because it's not flexible enough.
> Ideally, the instruction would be a "color the range from point A to point B with color C".

Mnemonic:
```
flush kind
```

Encoding:
```
 0               1
 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+---------------+-------+-------+
|    opcode     | kind  |       |
+---------------+-------+-------+
flush = 0x12
```
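
The effect of `flush` on the registers can be sketched as follows. The `Highlight` struct and the output vector are hypothetical stand-ins for whatever the runtime actually records.

```rust
/// Hypothetical record of one highlighted range.
#[derive(Debug, PartialEq)]
pub struct Highlight {
    pub start: u32,
    pub end: u32,
    pub kind: u32,
}

/// Models `flush kind`: emits the range from `hs` (r1) to `off` (r0),
/// colored with the value of the register at index `kind`.
pub fn flush(regs: &mut [u32; 16], kind: usize, out: &mut Vec<Highlight>) {
    out.push(Highlight {
        start: regs[1],
        end: regs[0],
        kind: regs[kind],
    });
    // The next highlight range starts where this one ended.
    regs[1] = regs[0];
}
```

Note that `kind` is a register index, not the color itself, so the color has to be loaded into a register before the `flush`.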

### Instruction: await

Pauses execution if the input offset has reached the end of line.
The runtime will resume execution with the next line of input at the next instruction.

Mnemonic:
```
await
```

Encoding:
```
 0
 0 1 2 3 4 5 6 7
+---------------+
|    opcode     |
+---------------+
await = 0x13
```