Developing a hybrid deep learning architecture for a particular problem is a highly complicated task
I am currently conducting research on the application of deep learning to sensor signal recognition. I spent about a year and a half sifting through the literature and noticed some research patterns. First came the emergence of Convolutional Neural Networks (CNNs): researchers applied CNNs to their problems and reported state-of-the-art outcomes. Then LSTMs were adopted and quickly declared state-of-the-art. Then the trend shifted again, and people began to use hybrid architectures, once more reporting cutting-edge results. The current trend is to apply transformers and plain attention everywhere, combine them into hybrid deep learning architectures, and publish the results as new baselines.
In my experience, proposing a hybrid architecture that outperforms the current state of the art is quite difficult. Specifically, I am stuck on several issues:
1- At this point, I feel compelled to investigate a different domain, such as computer vision, and observe how specific architectures were applied there. It is somewhat related to my domain, and because deep learning was initially developed for natural language and computer vision problems, people in my field started asking how these new architectures could be applied to their own problems. So I decided to look at other domains as well; perhaps I will discover a more effective approach that I can incorporate into my domain's current base architecture to improve the results. My process is to fully understand a research paper and then consider how I might apply approach xyz to my scenario. The issue is that, in 2021 alone, hundreds of architectures (using transformers and attention) have been published, and it is not trivial to comprehend papers from other domains. So the question is how to narrow down these papers and decide which architecture, layer, or module to try.
2- Suppose you arrive at an architecture based on intuition and are about to run the experiments. It is nearly impossible to run the deep learning code once and achieve acceptable results; that would require a lot of luck. At the very least, to validate your architecture and determine whether it can beat the state of the art, you have to run your code anywhere from about 200 up to 3000 or 4000 times, select the best result across all runs, and compare it to the baseline. Otherwise, you have to tweak the code slightly and re-run the experiment. It is a lengthy process, and by my estimate there is a 70% chance that you will still not beat the then-current baseline. (A minimal sketch of what I mean by aggregating many runs appears after this list.)
3- I wrote every line of code from scratch, yet I am not 100% sure that I implemented the exact idea I had in mind. Because we rely on multiple libraries to implement deep learning architectures, when the results disappoint I cannot tell whether I made a mistake in my implementation or whether the problem lies with my proposed method. (See the sanity-check sketch after this list.)
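To make issue 2 concrete, here is a minimal, self-contained sketch of what I mean by running the same experiment over several random seeds and reporting mean ± standard deviation instead of cherry-picking the single best run. The tiny 1D CNN and the synthetic data are hypothetical stand-ins for a real architecture and sensor dataset, and the whole thing assumes a PyTorch setup:

```python
# Minimal sketch (not my actual pipeline): repeat the same experiment over
# several random seeds and report mean ± std instead of the single best run.
# The TinyCNN and the synthetic "sensor" data are hypothetical placeholders.
import random

import numpy as np
import torch
import torch.nn as nn


def make_synthetic_data(n=256, length=128, n_classes=4):
    # Random windows of a 1-channel signal with random labels (placeholder data).
    x = torch.randn(n, 1, length)
    y = torch.randint(0, n_classes, (n,))
    return x, y


class TinyCNN(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(16, n_classes),
        )

    def forward(self, x):
        return self.net(x)


def run_experiment(seed: int, epochs: int = 5) -> float:
    # Fix every source of randomness before building the data and the model.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    x_train, y_train = make_synthetic_data()
    x_test, y_test = make_synthetic_data()

    model = TinyCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x_train), y_train)
        loss.backward()
        opt.step()

    with torch.no_grad():
        acc = (model(x_test).argmax(dim=1) == y_test).float().mean().item()
    return acc


if __name__ == "__main__":
    scores = [run_experiment(seed) for seed in range(10)]  # e.g. 10 seeds
    print(f"accuracy over seeds: {np.mean(scores):.3f} ± {np.std(scores):.3f}")
```

Reporting the spread over seeds, rather than the single best run, also seems to make the comparison against a baseline more honest than picking the maximum of thousands of runs.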
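For issue 3, two cheap sanity checks catch many implementation bugs before any long training run: checking that the model returns the expected output shape, and checking that it can overfit a single tiny batch (the loss should approach zero). The sketch below again assumes PyTorch; HybridModel, the input shape, and the class count are hypothetical placeholders:

```python
# Minimal sketch of two cheap sanity checks for a freshly implemented model:
# (1) the model returns the expected output shape, and
# (2) it can overfit a single tiny batch, so the loss goes near zero.
# HybridModel is a hypothetical stand-in for the architecture under test.
import torch
import torch.nn as nn


class HybridModel(nn.Module):
    # Placeholder CNN + LSTM hybrid; replace with your own architecture.
    def __init__(self, n_classes=6):
        super().__init__()
        self.conv = nn.Conv1d(1, 16, kernel_size=5, padding=2)
        self.lstm = nn.LSTM(16, 32, batch_first=True)
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (batch, 1, time)
        h = torch.relu(self.conv(x))           # (batch, 16, time)
        out, _ = self.lstm(h.transpose(1, 2))  # (batch, time, 32)
        return self.head(out[:, -1])           # (batch, n_classes)


def check_output_shape(model, x, n_classes):
    out = model(x)
    assert out.shape == (x.shape[0], n_classes), f"unexpected shape {tuple(out.shape)}"


def check_overfit_tiny_batch(model, x, y, steps=500):
    # A correct implementation should memorize 8 samples almost perfectly.
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    assert loss.item() < 0.1, f"could not overfit tiny batch, loss={loss.item():.3f}"


if __name__ == "__main__":
    x = torch.randn(8, 1, 128)        # 8 windows of a 1-channel signal
    y = torch.randint(0, 6, (8,))     # 6 hypothetical classes
    model = HybridModel()
    check_output_shape(model, x, n_classes=6)
    check_overfit_tiny_batch(model, x, y)
    print("sanity checks passed")
```

These checks do not prove the implementation matches the idea in my head, but they at least rule out common wiring mistakes before comparing against a baseline.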
So, is there a better way to minimize these problems as much as possible?
Topic: transformer, computer-vision, research, deep-learning, time-series
Category: Data Science