revisiting EXAPUNKS: part 2: oh god this is still the tutorial

my genuine estimation of this part is that it will exclusively be me going back to TWN 4 and trying to optimize the M register and see what I can do, but who know, maybe this will be quicker than that.

RETURNING TO TWN #4

just for a quick refresher, here is what we are working with, im not gonna explain it here, but my previous post went fairly in depth with how this is solved.

Also quick note on notation: X[letter] is used to reference the starting EXAs, XA, XB, XC, and so on. When an EXA makes a repl, that is denoted with [name]:[index] where index is the number of repls that EXA has made previously. Therefore, if one of your EXAs, XB, makes 2 repls, they will be called XB:0 and XB:1. This makes it easy to notate which EXAs are doing what at various points in the algorithm.

my basic plan for using the M register to write faster goes pretty much as follows:

move to outbox

make a file

read in from the m register to the new file

will eventually be killed by something from xb

grab file 200, copy the value, and delete file 200

make a bunch of repls, each of which will decrement the value read and write to the m register

once below 0, move into the outbox and kill xa

this is probably going to be way harder than it sounds, especially with syncing up the xb repls to write in the correct order (because they will have no communication, only timing), but here is a little "proof of concept" program

LINK 800
LINK 800
MAKE
MARK WRITE
@REP 2 #Repeat 2 times for 2 EXAs, so the loops are synced
COPY M F
@END
JUMP WRITE

LINK 800
GRAB 200
COPY F X
WIPE
REPL SUBTRACT #make a copy to write in parallel with XB
SUBI X 1 X #Subtract 1 from X so that XB is starting from a different value

MARK SUBTRACT
COPY X M
SUBI X 2 X # Subtract 2 for the 2 EXAs, that way the values they write are unique
TEST X > -1
TJMP SUBTRACT
LINK 800 #As soon as any EXA is below 0, move to outbox and kill XA (any subsequent EXAs will move to outbox, kill nothing, and die)
KILL

this genuinely works a lot better than i expected for such a simple program, and even works if i scale it up to 3 EXAs, but then the timings starts to fall apart.

With 2 EXAs writing, the cycles drop from ~300 to 259. With 3, it drops again to 178, only ~40 cycles off from top percentile! But once I try adding 4, there becomes an issue where XB:0 finishes its first loop and writes its second value to the M register before XB can write its first, causing the numbers to come in out of order. How can we fix this? Well, we need to look at the length of the loop (many such cases for these problems).

MARK SUBTRACK

COPY X M # 2 cycles!
SUBI X 4 X
TEST X > -1
TJMP SUBTRACT

So if we count up this loop, this is 4 instructions + 1 M register write = 5 total cycles/loop. This is useful for multiple reasons: First of all, 3 EXAs writing once every 5 cycles = 3/5 writes per cycle. This can be used as a benchmark later. Additionally, knowing that it is 5 cycles helps us sync up the code with other things, such as the initial repl instructions.

REPL SUBTRACT
SUBI X 1 X

This code is very brief, only 2 instructions, but since it is copied multiple times to make multiple repls causes issues. Firstly, the fact that it is 2 instructions throws a lot off, as that means each repl is 2 cycles behind the one before it. With this being the case, we can never write more than once every 2 cycles (except for cases like our current one, where we shave off 1 cycle on the loop, cause it "should" be 3/6)

my plan is to try and make some "waiting" mark, that will (at the cost of some cycles on the frontend), allow for the repls in the SUBTRACT loop to be offset by only 1 cycle. how am i going to do this? idk, but heres my basic idea

# stuff

REPL WAIT

SUBI X 1 X

REPL SUBTRACT

SUBI X 1 X

MARK WAIT

NOOP

MARK SUBTRACT

# more stuff

so this works...once. after looping on this again, then we end up where that next batch of EXAs is offset from this batch by 3, so it basically doesnt help. But I got an idea...

COPY 3 T # 3 for the 3 total EXAs
@REP 2
REPL WAIT
SUBI X 1 X
SUBI T 1 T
@END
NOOP # Simulate the extra cycle it takes for a repl to enter WAIT, keep it synced
MARK WAIT
SUBI T 1 T
TJMP WAIT
MARK SUBTRACT

Instead of a constant wait, I can have the EXAs sit in a loop that syncs them up, and the loop will last for less time the more repls it makes! With this setup, XB only makes a repl every 3 cycles, but the WAIT loop lasts for 2 cycles per exa remaining, leaving them perfectly synced to write every cycle (or close to it). This scales up easily, as long as you chance the starting T value and the REP amount, it will make any amount of EXAs and they will wait long enough to enter SUBTRACT just 1 cycle apart.

With a bit of tweaking and adjusting for cycles to keep it synced, this works! And well too, but I think it could be improved further. Before getting to that, though, here is a solid solution I have come to:

LINK 800
LINK 800
MAKE

MARK WRITE
@REP 6 # 6 for 6 repls
COPY M F
@END
JUMP WRITE # This loop is 7 cycles long (6 COPY and then 1 JUMP)

LINK 800
GRAB 200
COPY F X

COPY 6 T # 6 for 6 repls
@REP 5 # 5 for 6 total repls - 1 for itself
REPL WAIT
SUBI X 1 X
SUBI T 1 T
@END

NOOP # Wait 1 cycle so it keeps the pattern of waiting

MARK WAIT
SUBI T 1 T
TJMP WAIT

MARK SUBTRACT
COPY X M
SUBI X 6 X
@REP 2 # Extra NOOPs to sync with WRITE in XA (must be equal to number of repls - 4)
NOOP
@END
TEST X > -1
TJMP SUBTRACT # 7 total cycles when the NOOPs are factored in

WIPE #All repls other than XB will error and die here, XB will continue and kill XA
LINK 800
KILL # XA is in an infinite loop so it needs to be killed

This gives a very respectable 141 cycles and fits neatly within the 50 size limit at 44 lines, getting us a pretty solid solution! With 6 total EXAs writing, this writes 6 times every 7 cycles, a steep improvement over 3/5, and it starts writing on cycle 18, which is not too much overhead setup. This setup works with any about of repls (up to 9, the max in a given host), as long as you adjust the timings as needed, but after some experimentation, 6 repls seems to be the fastest solution due to the extra cycles needed to create and delete more repls. Its not a large difference (141 vs 142/3/4), but still a difference and this was the fastest.

But, I'm still not satisfied. I have seen the light now and I think a 1 write/cycle setup is possible, albeit there is 1 major hurdle (and a few other notes I need to share with you).

The main slowdown I am targeting now are the NOOP instructions in SUBTRACT, which serve no purpose other than making sure the SUBTRACT loop in XB lasts just as long as the WRITE loop in XA, because if they are not synced then during the "empty" JUMP instruction where XA cannot read, the XB repls keep writing to the M register and cause the numbers to be read out of order (as a quick reminder, if there are multiple things in the M register at the same time, EXAs will read them in a random order). Because of this, the SUBTRACT loop must be artificially slowed down, and it is impossible to write something every single cycle.

For another proof-of-concept, I am going to ignore the size limitation and change XB to, instead of a loop, read from the M register 100 times in a row (The highest test case is 99, so 100 values written with the 0). I will also remove the NOOPs from SUBTRACT, and only use 5 repls because the base SUBTRACT loop is 5 cycles long, so they can all write without ever overlapping.

This change drops our solution to a speedy 123 cycles in the worst case, with 16 cycles of setup, 100 cycles of writing, and 8 cycles of cleanup once it is finished. However, this takes up 131 lines, far over the size limit, and any solution to cut down lines (read: use a loop) runs into this issue of getting out of sync during the JUMP instruction.

Future Izzy here: after reflecting and thinking on it more, there really is not a solution that will write once every cycle, as no matter what we will be bottlenecked by the 1 EXA writing the the output file, and we cannot write with multiple EXAs at once. However, we can still do better than the 5 every 6 we are writing currently, if only by a little bit.

Essentially, we call back to the idea of unwrapping the loop, and maximize our writes/cycle using the 50 lines available to us. The minimum amount of cycles to write to the M register, decrement X, check if X is >= 0, and conditionally jump is 5 cycles, and we have about 15 cycles worth of space to work with. Therefore, if I adjust the SUBTRACT loop to repeat twice inside itself, and adjust the WRITE loop to write 10 times and then jump, we can write 10 times every 11 cycles, barely edging out the rate of 9 repls working in parallel, with far less overhead. This squeezes in at 49 instructions and looks as follows:

LINK 800
LINK 800
MAKE

MARK WRITE
@REP 10
COPY M F
@END
JUMP WRITE # Nothing major here, 10 repeats because that is the most that can fit under 50 lines and stay synced

LINK 800
GRAB 200
COPY F X

COPY 5 T
@REP 4 # Again, 5 repls because that is the minimum amount needed to write every cycle
REPL WAIT
SUBI X 1 X
SUBI T 1 T
@END

NOOP

MARK WAIT
SUBI T 1 T
TJMP WAIT

MARK SUBTRACT
@REP 2 # Copied twice to fit extra code under the 50 size limit
COPY X M
SUBI X 5 X
TEST X < 0 # This logic needed to change slightly so it exits the loop early if needed
TJMP END
@END
JUMP SUBTRACT # Unconditional jump back up if it reaches the bottom as normal

MARK END
WIPE
LINK 800
KILL

And this works in 132 cycles, an official improvement over my evil solution from last time! I won't lie and say that it feels a little bit upsetting vs what's possible (I could talk more about my dislike of the size limitations, but I get why they exist), but I am much happier with this solution and can't immediately find much "flawed" with it, the next thing I would do to improve it is take up more cycles and increase the @REPs in WRITE and SUBTRACT to approach 1:1 writes:cycles.

I also forgot about this last time, EXAPUNKS lets you record a gif of your solution after the fact, so I will start posting those as well! Only the cycle solutions probably, but if something interesting happens in the others i'll post that too. Here you go!

I know it is basically impossible to tell what's happening, but this is the final program! XB first runs forward, reads from file 200, and clones itself a bunch, while XA runs all the way to the outbox, makes a new file, and waits to read incoming values. After the brief wait to sync them, all of the repls (clones) of XB start writing their values to the M register and they get sucked up by XA and written down.

(Also, i think the KILL command is very funny, at the end you see XB go forward into XA's host and it literally shoots a later up in the air, which comes down and destroys XA)

Anyway, yeah this is the end of this post! I am not gonna continue on past here cause it is already quite long, but next time I will start the real game, I promise (definitely by next sunday, maybe sooner)

Previous post here

Search This Blog

izzzzzzzzzzzzzzzzzzzzy's blog (now with twice as many z's)

revisiting EXAPUNKS: part 2: oh god this is still the tutorial

Comments

Post a Comment

Popular posts from this blog

athens comic !

revisiting EXAPUNKS part 0.5: instruction overview