Skip to content

Conversation

@wnienhaus
Copy link
Collaborator

@wnienhaus wnienhaus commented Jul 12, 2023

This PR adds support for the ULP-FSM (not ULP RISC-V) of the ESP32-S2 and ESP32-S3.

(Note, the ESP32-S2 and S3 have the same ULP-FSM, so we don't distinguish between them in the implementation).

The entire instruction set of the ULP-FSM of the ESP32-S2 is supported, including the new load and store instructions to allow loading from or storing to the upper 16-bits of memory locations.

Use the -c command line argument when assembling or disassembling. Use -c esp32 (or omit the -c option) to select the original ESP32. Use -c esp32s2 to select the ESP32-S2/S3. Refer to the documentation in docs/disassembler.rst for more detail.

Please note: the latest release of MicroPython to date is 1.20, which does not enable the ULP on ESP32-S2/S3 devices. This is fixed already, but we need to wait for the next release to have the fix in an official version. To test this on a real device, use a recent nightly build of Micropython and flash that to your device.

This PR resolves #85 .

(The easiest way to review this PR is commit-by-commit. Each commit is one independent change.)

wnienhaus added 10 commits July 12, 2023 19:00
…s_s2 Currently the logic in these copies is identical to the original. This will makes it easier to see the changes we're making in subsequent commits.
Select cpu with -c when running the assembler (--mcpu as used by Espressif's esp32ulp-elf-as also works). The possible values are 'esp32' and 'esp32s2'. (Note esp32s2 also works for the ESP32-S3, because those two MCUs share the same ULP-FSM binary format). If no cpu is specified the original 'esp32' will be used as before.
In summary, these are the most important changes: * the `sub_opcode` field on all opcodes except ST have been shrunk from 3 bits down to 2 bits * the LD and ST opcodes support more variations (new instructions) * branching instructions have changed. The JUMPR instruction uses different comparisons and the JUMPS instruction now implements all comparisons in hardware, without requiring multiple instructions to emulate any comparison. * There is no more SLEEP instruction to select a sleep timer. Since Espressif chose to simply convert SLEEP instructions into WAIT instructions we're doing the same. Update integration tests to run for both the ESP32 and ESP32-S2.
The disassembler is now mainly the command line tool, which deals with interpreting user input, formatting output, etc. This allows us to add decoding logic for new cpus (the S2) and the disassembler can then dynamically load the correct decoding module.
Currently only the esp32 is implemented (esp32s2 soon to follow). This commit adds the -c command line option to the disassembler, as well as updates the integration tests to supply the cpu parameter, and test fixtures are renamed to include the cpu type in their name, so that we can have separate fixture files for other cpus.
To disassemble ESP32-S2 ULP binaries use the `-c esp32s2` option when running the disassembler. Update documentation to mention support for the ESP32-S2.
…pported by the S2 The ESP32-S2 changes the comparison operators that are natively supported by the CPU for the JUMPR and JUMPS instruction. For the JUMPS instruction all comparison operators are now natively supported in hardware.
For example STL and STH can be used to store to the lower or upper 16-bits of a memory address respectively. (The original ST instruction could only store to the lower 16-bits of a memory address) The following new instructions are now supported: * LDL, LDH * STL, STH, ST32, STI32, STI, STO
The following new instructions are now supported: * LDL, LDH * STL, STH, ST32, STI32, STI, STO Note: The disassembler will return LD instead of LDL and ST instead of STL, because they are each synonyms of the other. We can only pick either and so we picked the keyword that exists across both the ESP32 and the ESP32-S2/S3.
@wnienhaus
Copy link
Collaborator Author

wnienhaus commented Jul 12, 2023

There are some parts of the implementation I am currently not 100% happy with:

  1. Having both disassemble.py (cli tool) and decode{,_s2}.py (not cli tools) under tools.
    • I did consider having those files under esp32_ulp
    • but that would mean they get included in the upip package (soon mip) by default
    • and I didnt want those files on the device by default when one installs the package (maybe that's a weird choice?).
  2. Having two intentional bugs to match what Espressif's assembler (binutils-gdb/esp32ulp-elf-as) does.
    • This makes our comparison tests pass, but is still wrong (based on my understanding).
    • I reported both issues along with fixes to Espressif (here and here), but they haven't responded at all, and I wonder if they will give any attention to the ULP-FSM at all, now that the ULP RISC-V exists.
    • I was considering making our implementation correct and adjusting the tests somehow to account for the difference, but for now it is how it is.
  3. That opcode.py and opcode_s2.py share (duplicate) a lot of non-cpu specific code, such as parsing register numbers or evaluating expressions. It was like that before, so for now it's the same, but it could divided up more nicely.

@ThomasWaldmann @dpgeorge @mattytrentini - if you have time and want to take a look at this PR, i'd appreciate it.

@wnienhaus wnienhaus mentioned this pull request Jul 12, 2023
@wnienhaus
Copy link
Collaborator Author

One thing I noticed is that the S2 and S3 have different peripheral register addresses for similar things, different from the original ESP32 and also different between each other. That means that (most likely) our blink.py example will not work on an ESP32-S2/S3 even if the assembled ULP code has the right binary format.

I will test this on my S2 and S3 and add examples for the specific versions and add a mention of this detail into the documentation.

Specifying 0 as the label is different than not specifying a label at all. This commit corrects the behaviour when label 0 is used. Also run the all_opcodes fixture as integration test to ensure the same result as binutils-gdb/esp32ulp-as (which is how this bug was found).
@ftylitak
Copy link

ftylitak commented Aug 7, 2023

Hello all,

first of all thank you @wnienhaus for this usefull PR. We were in need of a Pulse Count implementation in ULP for ESP32S3 so based on this PR we changed the required addressed and it worked. (we were not sure whether to send a PR on wnienhaus:s2s3_support or on this main repo so we decided to just write this post as a description of the solution.)

The changes can be found in the last two commits of the following repo which has this PR merged and has as addition the specifics for ESP32S3:

In a nutshell, opcodes_s3.py and soc_s3.py were added and assemble.py was updated to identify cpu=esp32s3.

Also here is the exampel code used to test the Pulse Counter on ESP32S3 on GPIO 4.

https://github.com/insighio/micropython-esp32-ulp/blob/master/examples/pulse_counter_esp32s3.py

We are based on micropython 1.19.1 so we made changes in the code of micropython to enable ULP for ESP32S3.
Though the most important note that we can make is that ESP32S3 closes down all RTC Peripherals upon deepsleep. Thus the user needs to instruct the ESP32S3 to keep the RTC IO powered on when in deepsleep and this instruction must executed every time the device wakes up from deep sleep.

For this reason, we added one more function in the ULP module which initializes the GPIO port, defines it as Input, enables Pull Down and instructs to keep RTC peripherals powered on when in deep sleep. This is called every time the device wakes up from deep sleep or else the counting of the edges stops.

The call to this function: https://github.com/insighio/micropython-esp32-ulp/blob/90fe843783ac63acf6224bd3e9e11b3522ce5469/examples/pulse_counter_esp32s3.py#L117-L124

And the function itself in custom micropython branch:
https://github.com/insighio/micropython/blob/33b18e0b67f73f27affff86acc796539a14f8b07/ports/esp32/esp32_ulp.c#L115-L131

In this way, we have a functioning Pulse Counter. Though our tests, by setting the ulp.set_wakeup_period from 20.000 to 100 we are reliably reading 100 pulses per second.

Hope all the above are helpfull. Thanks one again.

Updated as per ESP-IDF v5.0.2. Also added reference URL to those constants in the ESP-IDF.
@wnienhaus
Copy link
Collaborator Author

@ftylitak Thank you so much for that feedback and testing this PR. Great to know that it works for you.

I have been working slowly on improving the PR to add support for the S2 and S3 peripheral register addresses. I just pushed that work to this PR.

I took a different approach than you for supporting the S3. Since the S2 and S3 have the same binary format on the ULP-FSM (in order words opcode_s2.py and opcode_s3.py would be identical in all ways except the peripheral register addresses accepted), I decided to keep having only the esp32s2 cpu for both the S2 and S3 and make this clear in the documentation. This also aligns with how Espressif does this in their binutils-gdb/esp32-ulp-as assembler, which only has cpu options esp32 and esp32s2 and they use the esp32s2 also for the S3.

That choice did lead me to something I am not 100% happy with, but thought it might be overall positive nonetheless, namely that opcode_s2 now accepts peripheral register addresses from any of the ESP32 variants. The reason is that what ends up in the actual processor instruction is a relative offset, relative to the base address of the specific variant. Thus within the instruction, it doesn't matter what the real input address was. And since especially across the S2 and S3 most of the relative offsets are actually the same, this "feature" allows assembly code with register addresses in the S2 range to work unmodified for the S3 (actually the binary would even be the same). You can see in the examples I added that I now only need 1 example for the s2/s3 combined rather than 2 examples.

I am not sure what you think of this. You took the alternative of adding a new cpu esp32s3. Perhaps this is the clearer approach? That way one doesn't need to even know that the S2 and S3 are binary compatible? On the other hand that feels useful knowledge to me, because one can reuse code and binaries.

Feel free to rebase onto what I just pushed. Note that commit 2aff8a1 contains an important fix for ST instructions with labels.


Small comment on your Micropython patch: That function is specific to your need and would not be good in MicroPython in general (e.g. it hard codes the direction to input, etc).

However, it might not even be needed, because so far I have always managed to do all that initialisation inside the ULP code, given that it also has access to the peripheral registers, which are configured by the rtc_gpio_* functions of the ESP-IDF. It requires a bit of digging in the ESP-IDF, to find which registers are set by those higher level function, but so far it worked.

What might not work (I am still busy testing), is enabling power of the RTC peripherals from ULP code (especially once deep_sleep has already been entered). So perhaps that is the "generic" function that would be useful in MicroPython, i.e. this part:

 // instruct to keep RTC peripherals powered on after when in deep sleep esp_sleep_pd_config(ESP_PD_DOMAIN_RTC_PERIPH, ESP_PD_OPTION_ON);
@ftylitak
Copy link

ftylitak commented Aug 8, 2023

@wnienhaus great comments.

Indeed our modification in the Micropython branch is too specific. Those parts of setting the direction of the input etc were added there in the beginning as testing because we were not getting any results (plus we did not have much knowledge on the ULP so it was a bit of testing here and there). In the end we have found the RTC peripheral power on, it all started working though we did not take a step back to use the ULP code for the rest.
Also, the initialization of the GPIO was a direct copy of the setup procedure of the ESP-IDF example that we took as reference for testing.
https://github.com/espressif/esp-idf/tree/master/examples/system/ulp/ulp_fsm/ulp

The interesting part is that the ESP-IDF example does not use esp_sleep_pd_config though it works. So, may be it is a setting of Micropython that affects the power of the RTC peripherals.

Regading to this project (micropython-esp32-ulp) we wil rebase to your changes.

@wnienhaus
Copy link
Collaborator Author

Thanks for that input. It should help me resolve my current problem: readgpio_s2.py doesnt work on either the S2 or S3 - GPIO input is always 0.

Very interesting that the ESP-IDF example does not use the esp_sleep_pd_config - I hadnt noticed yet. Because readgpio_s2.py runs without entering deepsleep, I didnt think esp_sleep_pd_config would make a different (i'd assume the RTC peripherals are on, while the entire device is on), but I was going to try that next.

So I will actually try your approach next (how you patched MicroPython), given that it works for you, and then see if that makes readgpio_s2.py work for me. Assuming that it will, I can then narrow down from there to figure out which of those rtc_gpio_* statements actually makes the difference, and then see whether doing the same from ULP code would also work.

@wnienhaus
Copy link
Collaborator Author

@ftylitak So, it turns out it's actually the "IO MUX clock gate" that needs to be enabled, which the ESP-IDF does as part of rtc_gpio_init for the S2 and S3 (not the original ESP32).

rtc_gpio_init calls rtcio_hal_function_select here, which is mapped to a variant-specific rtcio_ll_function_select implementation here, which for the s3 is this one, the IO MUX clock gate is enabled here, like this:

SENS.sar_peri_clk_gate_conf.iomux_clk_en = 1;

When I create a MicroPython function in esp32_ulp.c that executes that same 1 line (instead of the full rtc_gpio_init as in your (@ftylitak) patch) and I run that before starting my readgpio_s2.py ULP program, everything works (note I select mux RTC, input-enable and pulldown/pullups from within the ULP code, so that all works as intended from the ULP). Without running that 1 line, I cannot read GPIO input.

I tried doing the equivalent with WRITE_RTC_FIELD(SENS_SAR_PERI_CLK_GATE_CONF_REG, SENS_IOMUX_CLK_EN, 1) from within the ULP code (inspired by the ESP-IDF example here), but it doesn't have any effect. Maybe it has no effect in the ESP-IDF example either, because they already run rtc_gpio_init before starting the ULP code. But given that this example code exists, I wonder if we stumbled upon a bug in the chip. (Someone else also ran into this problem: see).

One "simple" fix could be to patch MicroPython to set this flag everytime esp32.ULP.run() is called, but I guess there is a reason this is off-by-default, and someone might want to keep it off in some applications (perhaps it saves some power).

The other, perhaps "more correct" option could be to simply expose the rtc_gpio_init function as e.g. esp32.ULP.rtc_gpio_init(pin_number) - without additionally setting direction, configuring pullups/pulldowns, hold, etc - as those things all can be done from within the ULP code.

Anyway, just reporting my findings so far. Will still think a bit more about this.

(@ftylitak - if you have some time, it would be interesting to know whether this single line is enough to make your use case work, also in deepsleep, because i have not tested with deepsleep yet.)

Peripheral registers of the ESP32-S2 and S3 are at a different location than those of the original ESP32. The location within the address space is mostly the same, however we need to use the correct base address when calculating periph_sel (type of register) used in REG_RD and REG_WR instructions. Note 1: To avoid creating an entirely new CPU (esp32s3) just for handling the different peripheral register addresses of the S3, while their binary format (instruction set) is identical, our esp32s2 CPU support will now accept both ESP32-S2 and ESP32-S3 addresses. This should make a binary for the one seamlessly work on the other (without reassembly), given that the offsets of different peripheral registers between the S2 and S3 are mostly (but not entirely) identical. Note 2: Our esp32s2 cpu support will also accept peripheral register addresses of the original ESP32. This was originally done because Espressif's binutils-gdb/esp32-ulp-as incorrectly validates addresses for the esp32s2 cpu, and to make our compat tests pass, this was needed. However this also has a nice side- effect of allowing some assembly written for the original ESP32 to work unmodified when assembled for an S2/S3, because some of the peripheral registers live at the same offset from the base for all three variants.
We also now use the correct include files from the ESP-IDF when building the defines DB, correct for the cpu type we're testing with. (That also means the defines DB is built once per cpu type). That, together with some ESP32-S2 specific test cases from Espressif's esp32s2 assembler test-suite, make those test cases more interesting to run, compared to only assembling ESP32 examples with the esp32s2 cpu selected. Note: This change no longer runs the ulp_tool examples for the esp32s2 case, because those examples use contants (from the ESP-IDF include files), which no longer exist for the ESP32-S2, such as `RTC_IO_TOUCH_PAD*_HOLD_S`. Since the ulp_tool examples primarily test the preprocessor's ability to resolve constants from include files (via the defines DB), testing those examples only once with the ESP32 cpu should be enough.
@wnienhaus
Copy link
Collaborator Author

So, I finally have made progress. It turns out the issue with failing to write to the SENS_SAR_PERI_CLK_GATE_CONF_REG register was due to a bug in our assembler, which already existed in our original ESP32 implementation. The bug was in how we translated peripheral register addresses (see commit which fixes it: 520a7f7)

This issue also once existed in the ESP-IDF, which is where we got the incorrect translation logic from. After asking Espressif (espressif/esp-idf#12158) about not being able to write to the SENS_SAR_PERI_CLK_GATE_CONF_REG register they pointed me to their fix (espressif/esp-idf#11652), which was exactly what we needed.

I have now added the fix to this PR and added working examples (all tested on a real device) to the examples/ directory. Those examples end in _s2 or _s3 (where no _s3 example exists the s2 example also works on the s3).

So we don't need any changes to MicroPython after all! I am happy.

cc: @ftylitak

@wnienhaus wnienhaus force-pushed the s2s3_support branch 2 times, most recently from 5be3f10 to bc2ee7e Compare August 30, 2023 19:35
The ESP32-S2/S3 support a negative offset in ST/LD instructions. Those offsets are two's-complement encoded into a field that is 11-bits wide. This change corrects the decoding of negative offsets given the field width of just 11-bits, rather than relying on the 32 or 64 bits of a MicroPython `int`. Note 1: Negative offsets used in JUMP instructions are encoded differently (sign bit + positive value), and their decoding is already done correctly. Note 2: The LD/ST instructions in of the ESP32 do not support negative offsets (according to Espressif tests), so their decoding remains as is.
This was already incorrect in the original ESP32 implementation but was discovered while testing the new S2/S3 implementation. This was also wrong within the ESP-IDF, that we based the translation logic on. Espressif fixed the issue in this pull request: espressif/esp-idf#11652 We now also have unit tests and compat (integration) tests, that compare our binary output against that of binutils-gdb/esp32-ulp-as, which already did this translation correctly, but we didnt have a test for the specific cases we handled incorrectly, so we didn't notice this bug. This fix has also been tested on a real device, because S2/S3 devices need the IOMUX clock enabled in order to be able to read GPIO input from the ULP, and enabling that clock required writing to a register in the SENS address range, which didnt work correctly before this fix.
There are now example files for the S2 and S3 (with ``_s2`` or ``_s3`` appended to their filenames). Note: The s2 examples also work unmodified on the ESP32-S3, except the readgpio example which needs different peripheral register addresses on the S3. The ``counter_s2.py`` example is unmodified compared to the original example, except that the assembler is told to generate esp32s2 output. The ``blink_s2.py``, ``readgpio_s2.py`` and ``readgpio_s3.py`` examples have their rtc_io base address updated, as well as the constants referring to the GPIO pins and channels and the peripheral register bits used to read/write the GPIO inputs/outputs. These addresses/bits have changed from the original ESP32. Otherwise the examples are identical to the examples for the original ESP32.
@wnienhaus
Copy link
Collaborator Author

Now that the examples work on real devices (S2 and S3) and because I have tested this quite extensively by now, I will proceed to merge this. Any further improvements can follow in future PRs.

@wnienhaus wnienhaus merged commit 25bf34e into micropython:master Sep 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants