Android offers two approaches to on-device machine learning in your apps:
1) ML Kit
2) Android’s Custom ML stack
The main advantage of on-device ML is greatly reduced latency. Other benefits include data privacy and offline availability.
Using ML Kit
ML Kit provides turnkey ML features for your app, and you can use its APIs even if you have no experience with machine learning.
As announced at Google I/O 2023, the latest iteration of ML Kit brings significant improvements in model performance across various APIs, most notably the Barcode Scanner, Language identification and Text recognition APIs. Text recognition is now available in five scripts, including Devanagari, Japanese, Chinese and Korean, which together cover over 100 languages. Recognition is now also available at the symbol (character) level, which is useful for languages such as Chinese, where a single character can be a word.
The Barcode Scanner APIs now provide a ready-made UI flow that makes development faster and easier, and performance has improved by 17%. Only the information read from the barcode is returned to your app, and because your app does not process the camera image itself, it no longer needs to request camera permission.
All of these ML Kit APIs are available through Google Play services, which considerably reduces the size of your APK.
Many ML Kit APIs are based on TensorFlow Lite.
A document scanning API is under development and will be available later this year. Because the user decides which images to send back to the app, camera permission is not needed, which improves security.
Android Custom ML stack
If you want to train your own ML model and control the inference process, Android provides a custom ML stack built on top of the TensorFlow Lite APIs, with TensorFlow Lite serving as the inference engine. These APIs are also provided through Google Play services, which reduces the size of your APK.
Hardware acceleration is a key factor in running models and can improve inference performance, because ML inference benefits strongly from specialized chips. Using GPUs or TPUs brings significant performance improvements and frees up the CPU for other operations. According to Google, a pose detection model saw a 200% improvement in performance when it ran on the TPU of the Pixel 6.
TensorFlow Lite delegates are the best way to implement hardware acceleration in your app. Check the references below for more information on using delegates in your app.
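To make the idea of a delegate concrete, here is a minimal sketch in plain Java. It does not use the TensorFlow Lite API: `DelegateSketch`, `plan` and the operation names are hypothetical and only model the general behaviour described in the TensorFlow Lite delegates documentation, where a delegate takes over the operations it supports and the remaining ones stay on the CPU.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of delegation; not part of TensorFlow Lite.
final class DelegateSketch {

    /**
     * Decides, for each operation in a model, whether it would run on the
     * accelerator (the "delegate") or stay on the CPU.
     */
    static List<String> plan(List<String> ops, Set<String> delegateSupported) {
        List<String> placement = new ArrayList<>();
        for (String op : ops) {
            // Supported ops are handed to the delegate; the rest fall back to the CPU.
            String target = delegateSupported.contains(op) ? "delegate" : "cpu";
            placement.add(op + " -> " + target);
        }
        return placement;
    }

    public static void main(String[] args) {
        // Assumed example: the accelerator supports two of the three ops.
        List<String> ops = List.of("CONV_2D", "RELU", "CUSTOM_OP");
        Set<String> supported = Set.of("CONV_2D", "RELU");
        for (String line : plan(ops, supported)) {
            System.out.println(line);
        }
    }
}
```

In a real app this decision is made by the TensorFlow Lite runtime once a delegate is attached to the interpreter; the delegates reference listed below shows the actual calls.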
Finding the optimal hardware acceleration configuration for a device can be tricky. To help with this, the Acceleration Service APIs have been introduced, initially for CPUs and GPUs, with additional support in the works. These APIs run a benchmark with your model and recommend the optimal hardware configuration for the device, which you can then apply to your TensorFlow Lite interpreter. The Acceleration Service APIs are built on top of TensorFlow Lite and are offered through Google Play services, so they do not affect the size of your APK.
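The benchmark-and-recommend idea can be sketched in plain Java. This is not the Acceleration Service API: `AccelerationBenchmarkSketch`, `medianLatency` and `recommend` are hypothetical names, and each `LongSupplier` stands in for "run one inference with this configuration and return its latency in microseconds". The sketch assumes that the configuration with the lowest median latency is the one to recommend.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical model of benchmarking; not the Acceleration Service API.
final class AccelerationBenchmarkSketch {

    /** Runs the measurement `runs` times and returns the median (upper median for even counts). */
    static long medianLatency(LongSupplier runOnce, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = runOnce.getAsLong();
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    /** Returns the name of the configuration with the lowest median latency. */
    static String recommend(Map<String, LongSupplier> candidates, int runs) {
        if (candidates.isEmpty() || runs < 1) {
            throw new IllegalArgumentException("need at least one candidate and one run");
        }
        String best = null;
        long bestLatency = Long.MAX_VALUE;
        for (Map.Entry<String, LongSupplier> entry : candidates.entrySet()) {
            long latency = medianLatency(entry.getValue(), runs);
            if (best == null || latency < bestLatency) {
                best = entry.getKey();
                bestLatency = latency;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed, fixed latencies in place of real measurements.
        Map<String, LongSupplier> candidates = new LinkedHashMap<>();
        candidates.put("cpu", () -> 42_000L);
        candidates.put("gpu", () -> 15_000L);
        System.out.println(recommend(candidates, 5)); // prints "gpu"
    }
}
```

The median is used rather than the mean so that a single slow run, for example the first one after start-up, does not dominate the result. The Acceleration Service reference listed below describes how the real service is configured and queried.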
For further information check out d.android.com/ml
References
https://www.tensorflow.org/lite/android/play_services
https://www.tensorflow.org/lite/performance/delegates
https://www.tensorflow.org/lite/android/acceleration_service