When it comes to picking a language for a certain application, many factors may play a role:
- Who will be building/maintaining the application and are they familiar with that language?
- What other systems will the application need to communicate with and is one language easier to do that with than the other?
- Is this project experimental or is it supposed to be a cornerstone for the next few years?
- Are there existing implementations/frameworks of what needs to be done in that language? Does it fit the need or does it need to be altered?
- Etc.
If we look at your list, here is my take:
ETL is fairly language-agnostic; both Java and Python have good bindings to the tools needed for ETL.
Similar to ETL, although simple things may be easier to do in Python.
- Making use of pre-trained models / transfer learning
Python is much more convenient as most existing frameworks and pre-trained models are compatible with Python, and likely not with Java.
The Hadoop stack is very Java-friendly. Note that Python is becoming increasingly easy to use with Hadoop products (but Java is still running underneath).
- Processing large volumes of data (images, time series data, sensor data, etc.)
Although Python has a reputation for being slow, most of the Python libraries you'll use ultimately run C/C++ code anyway, so if you know what you're doing, speed shouldn't be a huge concern.
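To illustrate the point about compiled code, here is a minimal sketch using only the standard library. It compares an explicit Python-level loop with the built-in `sum()`, which in CPython is implemented in C. The function name `python_loop_sum` and the input size are illustrative assumptions; numerical libraries such as NumPy rely on the same idea of pushing the inner loop into compiled code.

```python
import timeit


def python_loop_sum(values):
    """Sum values with an explicit loop executed by the Python interpreter."""
    total = 0
    for v in values:
        total += v
    return total


# Hypothetical workload: a list of integers standing in for real data.
values = list(range(200_000))

# Both approaches compute the same result.
assert python_loop_sum(values) == sum(values)

# Time each approach; the built-in does its iteration in C (in CPython).
loop_time = timeit.timeit(lambda: python_loop_sum(values), number=5)
builtin_time = timeit.timeit(lambda: sum(values), number=5)
print(f"explicit loop: {loop_time:.3f}s, built-in sum: {builtin_time:.3f}s")
```

Exact timings depend on your machine and interpreter, so measure on your own workload before drawing conclusions.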
- Production implementation (batch or real time / online)
Java is statically typed, which offers a lot of advantages for production. It's becoming steadily easier to get similar guarantees with Python as well (e.g. pydantic, mypy).
- Post production training (batch or real time / online)
This is independent of the language; it has more to do with your MLOps and the pipelines you have in place.
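As a rough sketch of what typed Python can look like, the example below uses only the standard library: type hints that a checker such as mypy can verify before deployment, plus a small runtime check in `__post_init__`. The names `SensorReading` and `mean_value` are hypothetical, and the hand-written validation only stands in for what a library like pydantic would do for you.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SensorReading:
    """Hypothetical record type; the annotations are what a type checker reads."""
    sensor_id: str
    value: float

    def __post_init__(self) -> None:
        # Runtime validation, since Python does not enforce annotations itself.
        if not isinstance(self.sensor_id, str):
            raise TypeError("sensor_id must be a str")
        if isinstance(self.value, bool) or not isinstance(self.value, (int, float)):
            raise TypeError("value must be a number")


def mean_value(readings: List[SensorReading]) -> float:
    """Average the values of a non-empty list of readings."""
    return sum(r.value for r in readings) / len(readings)


readings = [SensorReading("a", 1.0), SensorReading("b", 3.0)]
print(mean_value(readings))
```

A call such as `SensorReading("a", "oops")` would be flagged by a static checker and also raises `TypeError` at runtime here, which is the kind of early failure that static typing gives you in Java.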
Java definitely has the edge, although in a world of microservices, Python can prove very helpful.
- IoT edge device computing
When it comes to device computing, Java probably has the edge, although C/C++ are likely your best allies.