16 Sep, 2021 (8 commits)
    • tools/ci.sh: Use IDF v4.4 as part of esp32 CI and build GENERIC_S3. · da4593f9
      Damien George authored
      IDF v4.4 does not have an official release, so for now use the latest
      master.  Also remove building GENERIC with no options (all the other
      boards are no-option builds) to keep CI time reasonable.
      Signed-off-by: Damien George <damien@micropython.org>
    • esp32/boards: Add new GENERIC_S3 board definition. · 80fe2568
      Damien George authored
      Thanks to Seon Rozenblum aka @UnexpectedMaker for the work.
      Signed-off-by: Damien George <damien@micropython.org>
    • esp32: Add support for ESP32-S3 SoCs. · 54d33b26
      Damien George authored
      Thanks to Seon Rozenblum aka @UnexpectedMaker for the work.
      Signed-off-by: Damien George <damien@micropython.org>
    • all: Remove MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE. · b326edf6
      Jim Mussared authored
      This commit removes all parts of code associated with the existing
      MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
      -mcache-lookup-bc option to mpy-cross.
      
      This feature originally provided a significant performance boost for
      Unix, but couldn't be enabled for MCU targets (due to frozen bytecode),
      and added significant extra complexity to generating and distributing
      .mpy files.
      
      The equivalent performance gain is now provided by the combination of
      MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
      been enabled on the unix port in the previous commit).
      
      It's hard to provide precise performance numbers, but tests have been run
      on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
      xtensa) and they all generally agree on the qualitative improvements seen
      by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
      MICROPY_OPT_MAP_LOOKUP_CACHE.
      
      For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
      change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
      with MAP_LOOKUP_CACHE is:
      
      diff of scores (higher is better)
      N=2000 M=2000       bccache -> attrmapcache      diff      diff% (error%)
      bm_chaos.py        13742.56 ->   13905.67 :   +163.11 =  +1.187% (+/-3.75%)
      bm_fannkuch.py        60.13 ->      61.34 :     +1.21 =  +2.012% (+/-2.11%)
      bm_fft.py         113083.20 ->  114793.68 :  +1710.48 =  +1.513% (+/-1.57%)
      bm_float.py       256552.80 ->  243908.29 : -12644.51 =  -4.929% (+/-1.90%)
      bm_hexiom.py         521.93 ->     625.41 :   +103.48 = +19.826% (+/-0.40%)
      bm_nqueens.py     197544.25 ->  217713.12 : +20168.87 = +10.210% (+/-3.01%)
      bm_pidigits.py      8072.98 ->    8198.75 :   +125.77 =  +1.558% (+/-3.22%)
      misc_aes.py        17283.45 ->   16480.52 :   -802.93 =  -4.646% (+/-0.82%)
      misc_mandel.py     99083.99 ->  128939.84 : +29855.85 = +30.132% (+/-5.88%)
      misc_pystone.py    83860.10 ->   82592.56 :  -1267.54 =  -1.511% (+/-2.27%)
      misc_raytrace.py   21490.40 ->   22227.23 :   +736.83 =  +3.429% (+/-1.88%)
      
      This shows that the new optimisations are at least as good as the existing
      inline-bytecode-caching, and are sometimes much better (because the new
      ones apply caching to a wider variety of map lookups).
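The diff and diff% columns in the tables are plain relative-change arithmetic over the before/after score columns. A minimal sketch of that calculation (`score_diff_percent` is an invented helper name, not MicroPython code):

```c
// Relative change between two benchmark scores, matching the diff% column:
// higher scores are better, so a positive result means an improvement.
static double score_diff_percent(double before, double after) {
    return (after - before) / before * 100.0;
}
```

For example, bm_hexiom.py going from 521.93 to 625.41 is a relative change of a little under +20%, matching the +19.826% shown in the first table.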
      
      The new optimisations can also benefit code generated by the native
      emitter, because they apply to the runtime rather than the generated code.
      The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
      MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):
      
      diff of scores (higher is better)
      N=2000 M=2000        native -> nat-attrmapcache  diff      diff% (error%)
      bm_chaos.py        14130.62 ->   15464.68 :  +1334.06 =  +9.441% (+/-7.11%)
      bm_fannkuch.py        74.96 ->      76.16 :     +1.20 =  +1.601% (+/-1.80%)
      bm_fft.py         166682.99 ->  168221.86 :  +1538.87 =  +0.923% (+/-4.20%)
      bm_float.py       233415.23 ->  265524.90 : +32109.67 = +13.756% (+/-2.57%)
      bm_hexiom.py         628.59 ->     734.17 :   +105.58 = +16.796% (+/-1.39%)
      bm_nqueens.py     225418.44 ->  232926.45 :  +7508.01 =  +3.331% (+/-3.10%)
      bm_pidigits.py      6322.00 ->    6379.52 :    +57.52 =  +0.910% (+/-5.62%)
      misc_aes.py        20670.10 ->   27223.18 :  +6553.08 = +31.703% (+/-1.56%)
      misc_mandel.py    138221.11 ->  152014.01 : +13792.90 =  +9.979% (+/-2.46%)
      misc_pystone.py    85032.14 ->  105681.44 : +20649.30 = +24.284% (+/-2.25%)
      misc_raytrace.py   19800.01 ->   23350.73 :  +3550.72 = +17.933% (+/-2.79%)
      
      In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
      MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
      - are simpler;
      - take less code size;
      - are faster (generally);
      - work with code generated by the native emitter;
      - can be used on embedded targets with a small and constant RAM overhead;
      - allow the same .mpy bytecode to run on all targets.
      
      See #7680 for further discussion.  And see also #7653 for a discussion
      about simplifying mpy-cross options.
      Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
    • unix: Enable LOAD_ATTR fast path, and map lookup caching. · 60c6d559
      Jim Mussared authored
      Enabled for all variants except minimal.
      Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
    • py/map: Add an optional cache of (map+index) to speed up map lookups. · 11ef8f22
      Jim Mussared authored
      The existing inline bytecode caching optimisation, selected by
      MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, reserves an extra byte in the
      bytecode after certain opcodes; at runtime this byte stores the map
      index of the likely location of the field being looked up by qstr.
      This scheme is incompatible with bytecode-in-ROM and doesn't work with
      native generated code.  It also means the bytecode stored in .mpy files
      has a different format to when the feature is disabled, making
      generation of .mpy files more complex.
      
      This commit provides an alternative optimisation via an approach that adds
      a global cache for map offsets, then all mp_map_lookup operations use it.
      It's less precise than bytecode caching, but allows the cache to be
      independent and external to the bytecode that is executing.  It also works
      for the native emitter and adds a similar performance boost on top of the
      gain already provided by the native emitter.
      Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
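The idea of a global, external lookup cache can be sketched roughly as follows. This is a hypothetical, simplified illustration (all names invented, keys compared by pointer identity like interned qstrs), not the actual py/map implementation:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 128

typedef struct { const char *key; int value; } entry_t;
typedef struct { entry_t *table; size_t len; } map_t;

// One small global table, shared by every map: each slot remembers the
// index at which some (map, key) pair was last found.  Its size is fixed,
// so the RAM overhead is small and constant.
static uint8_t lookup_cache[CACHE_SIZE];

static size_t cache_slot(const map_t *map, const char *key) {
    // Mix the map and key pointers into a small cache index.
    uintptr_t h = (uintptr_t)map ^ (uintptr_t)key;
    return (h >> 4) % CACHE_SIZE;
}

static entry_t *map_lookup(map_t *map, const char *key) {
    size_t slot = cache_slot(map, key);
    size_t hint = lookup_cache[slot];
    // Try the cached position first: a hit skips the scan entirely.
    if (hint < map->len && map->table[hint].key == key) {
        return &map->table[hint];
    }
    // Miss (or a collision from another map/key sharing the slot): do the
    // normal scan and remember where the key was found for next time.
    for (size_t i = 0; i < map->len; ++i) {
        if (map->table[i].key == key) {
            if (i < 256) {
                lookup_cache[slot] = (uint8_t)i;
            }
            return &map->table[i];
        }
    }
    return NULL;
}
```

Because the cache lives outside the bytecode, a stale or colliding slot only costs a fallback scan, never a wrong result, which is what makes it compatible with bytecode-in-ROM and the native emitter.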
    • py/vm: Add a fast path for LOAD_ATTR on instance types. · 7b89ad8d
      Jim Mussared authored
      When the LOAD_ATTR opcode is executed there are quite a few different
      cases that have to be handled, but the common case is accessing a
      member on an instance type.  Typically, built-in types provide
      methods, which is why this case is common.

      Fortunately, for this specific case, if the member is found in the
      member map then no further processing is needed.
      
      This optimisation does a relatively cheap check (type is instance) and then
      forwards directly to the member map lookup, falling back to the regular
      path if necessary.
      Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
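The shape of this dispatch, a cheap type check guarding a direct member-map probe with a fallback to the generic path, can be sketched as below. This is a hypothetical illustration with invented names (and a linear probe standing in for the real map lookup), not the actual py/vm code:

```c
#include <stddef.h>

typedef enum { KIND_INSTANCE, KIND_OTHER } kind_t;

typedef struct { const char *name; int value; } member_t;
typedef struct {
    kind_t kind;
    member_t *members;
    size_t n_members;
} obj_t;

// Counts calls to the generic path, so a caller can observe whether the
// fast path handled a lookup on its own.
static int slow_path_calls;

// Stand-in for the full attribute machinery (properties, descriptors,
// method binding, base classes, ...).
static int load_attr_slow(obj_t *obj, const char *name, int *out) {
    ++slow_path_calls;
    for (size_t i = 0; i < obj->n_members; ++i) {
        if (obj->members[i].name == name) {
            *out = obj->members[i].value;
            return 1;
        }
    }
    return 0;
}

static int load_attr(obj_t *obj, const char *name, int *out) {
    // Fast path: a cheap "is this a plain instance?" check, then a direct
    // probe of the member map.  A hit returns immediately.
    if (obj->kind == KIND_INSTANCE) {
        for (size_t i = 0; i < obj->n_members; ++i) {
            if (obj->members[i].name == name) {
                *out = obj->members[i].value;
                return 1;
            }
        }
        // Not in the member map: fall through to the regular path, which
        // may still find the attribute via the type, properties, etc.
    }
    return load_attr_slow(obj, name, out);
}
```

The key property is that the fast path is purely an optimisation: any object or attribute it can't handle falls through to the unchanged generic path, so behaviour is identical either way.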