[SERVE][CPP][Android] add native executable program to benchmark models #2987
Hello,
I have modified and put together some code to run an LLM in adb shell or a Linux shell via MLC-LLM (btw, many thanks to the authors and contributors) as a binary executable.
I'm not an expert in C++, so the code isn't perfect (actually it is tinkered and glued together from the outputs of ChatGPT, Claude and my dog), but I think it's easy to read, understand and run.
How to set up:
0. Set up MLC-LLM and a virtualenv (install dependencies, TVM, etc.).
1. Create the build directory `build-aarch64-opencl`. Run all the following commands from this directory.
2. Configure the build with cmake:

       cmake \
         -DCMAKE_BUILD_TYPE=Release \
         -DCMAKE_TOOLCHAIN_FILE=/home/piotr/android/sdk/ndk/26.1.10909125/build/cmake/android.toolchain.cmake \
         -DCMAKE_INSTALL_PREFIX=. \
         -DCMAKE_CXX_FLAGS="-O3" \
         -DANDROID_ABI=arm64-v8a \
         -DANDROID_NATIVE_API_LEVEL=android-31 \
         -DANDROID_PLATFORM=android-31 \
         -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON \
         -DANDROID_STL=c++_static \
         -DUSE_HEXAGON_SDK=OFF \
         -DMLC_LLM_INSTALL_STATIC_LIB=ON \
         -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON \
         -DUSE_OPENCL=ON \
         -DUSE_OPENCL_ENABLE_HOST_PTR=ON \
         -DUSE_CUSTOM_LOGGING=OFF \
         ..

3. Run `make -j 8`. Now you should have `libmlc_llm_module.so`, `tvm/libtvm.so` and `llm_benchmark`.
4. In `adb shell`, run the following commands:

   `4` means OpenCL (alternatives are described in the source code); the 5th argument is the execution timeout in seconds; the 6th is max tokens; the 7th is the prompt; the 8th is the number of executions (if it is 1, the generated text is printed).
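Before moving on to the device, it can help to confirm that the build really produced the three files named above. The snippet below is only a sketch: `check_artifacts` is a hypothetical helper that is not part of this PR, and it assumes the file names `libmlc_llm_module.so`, `tvm/libtvm.so` and `llm_benchmark` sit relative to the build directory, as in the setup steps.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): check that the build outputs
# named in the setup steps exist under the given build directory.
# Usage: check_artifacts <build-dir>   (defaults to the current directory)
check_artifacts() {
    dir="${1:-.}"
    missing=0
    for f in libmlc_llm_module.so tvm/libtvm.so llm_benchmark; do
        if [ -f "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    # Return 0 only when every expected file was found.
    return "$missing"
}
```

Run from inside the build directory, `check_artifacts . && echo "all artifacts present"` prints one line per file and the final message only when nothing is missing.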