I'm trying to migrate this code, "Omniglot Character Set Classification Using Prototypical Network", to TensorFlow 2.1.0 and Keras 2.3.1. My problem is how to compute the Euclidean distance between the training-data and validation-data embeddings. Look at this code:

    def convolution_block(inputs, out_channels, name='conv'):
        conv = tf.layers.conv2d(inputs, out_channels, kernel_size=3, padding='SAME')
        conv = tf.contrib.layers.batch_norm(conv, updates_collections=None, decay=0.99, scale=True, center=True)
        conv = tf.nn.relu(conv)
        conv = tf.contrib.layers.max_pool2d(conv, 2)
        return conv

    def get_embeddings(support_set, h_dim, z_dim, reuse=False):
        net = convolution_block(support_set, h_dim)
        net = convolution_block(net, h_dim)
        net = convolution_block(net, …
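For reference, a minimal TF 2.x / Keras sketch of the same block plus a pairwise Euclidean distance between two embedding sets. The function names and shapes here are my own assumptions, not taken from the original repository; tf.keras layers replace the removed tf.layers / tf.contrib APIs.

    import tensorflow as tf

    def convolution_block_v2(out_channels):
        # Conv -> BatchNorm -> ReLU -> MaxPool, mirroring the TF 1.x block above
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(out_channels, 3, padding='same'),
            tf.keras.layers.BatchNormalization(momentum=0.99),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(2),
        ])

    def euclidean_distance(queries, prototypes):
        # queries: [n, d], prototypes: [m, d] -> [n, m] squared distances
        q = tf.expand_dims(queries, 1)      # [n, 1, d]
        p = tf.expand_dims(prototypes, 0)   # [1, m, d]
        return tf.reduce_sum(tf.square(q - p), axis=2)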
I have to find the closest match between my image and a bunch of already collected images of different classes in a folder. Which meta-learning approach should I select? I am thinking about a Siamese or a matching network. With a Siamese network, I have to match my image against every existing image in the folder to find the correct match. So do you think I could use a matching network instead and get a better result? What is the parameter based on which …
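For what it's worth, a rough sketch of the inference step either way (the `embed` encoder and variable names are placeholders, not from any specific library): both approaches ultimately score the query against the stored images in embedding space and return the nearest one.

    import numpy as np

    def closest_match(query_img, gallery_imgs, gallery_labels, embed):
        # embed(...) is whatever trained encoder you end up with (placeholder)
        q = embed(query_img)                             # [d]
        g = np.stack([embed(x) for x in gallery_imgs])   # [m, d]
        dists = np.linalg.norm(g - q, axis=1)            # distance to every stored image
        return gallery_labels[int(np.argmin(dists))]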
What about the differences between meta-learning, semi-supervised learning, self-supervised learning, active learning, federated learning, and few-shot learning, both in application and in definition? Pros and cons?
I'm studying ensemble learning methods, focusing on random forests and gradient boosting. I read this article about the topic and this one about meta-learning. Is it possible to say that ensemble learning is a subset of meta-learning?
While reading some works on meta-learning, I had this doubt: can we consider the meta-features of a dataset as its embedding, given that meta-features are a lower-dimensional representation which also tries to retain properties of the dataset? Embeddings are essentially low-dimensional representations of some high-dimensional concept. Is it fair to use "embeddings" instead of "meta-features"? Or can we use "representation" instead of "meta-features"?
I am working on few-shot learning and I want to use EfficientNet as the backbone feature extractor. Most models use ResNet as the feature extractor. For example, I can use the lines below and they extract features for me:

    from model.res50 import ResNet

    self.encoder = ResNet()
    self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, data):
        out = self.encoder(data)
        out = self.fc(out)
        return out

I am using this PyTorch implementation of EfficientNet: EfficientNet-PyTorch. I am not sure how to use …
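One way it might look with the EfficientNet-PyTorch package (a sketch only; `hidden_dim` should match the backbone's final channel count, which is 1280 for efficientnet-b0, and the class and attribute names mirror the question rather than any particular repo):

    import torch.nn as nn
    from efficientnet_pytorch import EfficientNet

    class EfficientNetClassifier(nn.Module):
        def __init__(self, hidden_dim=1280, num_classes=5):
            super().__init__()
            self.encoder = EfficientNet.from_pretrained('efficientnet-b0')
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, data):
            feats = self.encoder.extract_features(data)   # [B, 1280, H', W'] for b0
            feats = feats.mean(dim=[2, 3])                # global average pooling -> [B, 1280]
            return self.fc(feats)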
I have a task and I cannot identify whether it is transfer learning or meta-learning. I want to know this in order to ask for help in solving it, because there are some parts that I have not understood. The task is the following: we want our neural network to learn to sum weights. 1) Do the training on MNIST and on CIFAR10 (as a support dataset). We want the performance (accuracy) on MNIST, before and after the sum, …
When implementing stacking for model building and prediction (for example using sklearn's StackingRegressor), what is the appropriate choice of models for the base models and the final meta-model? Should weak/linear models be used as the base models and an ensemble model as the final meta-model (for example: Lasso, Ridge, and ElasticNet as base models, and XGBoost as the meta-model)? Or should non-linear/ensemble models be used as base models and linear regression as the final meta-model (for …
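For concreteness, the two layouts being compared might look like this in scikit-learn (a sketch with arbitrary defaults; GradientBoostingRegressor stands in for XGBoost so the example needs no extra dependency):

    from sklearn.linear_model import Lasso, Ridge, ElasticNet, LinearRegression
    from sklearn.ensemble import StackingRegressor, GradientBoostingRegressor

    # Option A: linear base models, boosted-tree meta-model
    stack_a = StackingRegressor(
        estimators=[('lasso', Lasso()), ('ridge', Ridge()), ('enet', ElasticNet())],
        final_estimator=GradientBoostingRegressor(),
    )

    # Option B: non-linear base models, linear meta-model
    stack_b = StackingRegressor(
        estimators=[('gbr', GradientBoostingRegressor()), ('ridge', Ridge())],
        final_estimator=LinearRegression(),
    )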
I've just started to learn meta-learning, reading the book Hands-On Meta Learning with Python. I think I know the answer to my question, but I'm a little confused about how to implement the algorithm with Keras. This piece of code is from an example that uses U-NET:

    from sklearn.model_selection import train_test_split

    # Split train and valid
    X_train, X_valid, y_train, y_valid = train_test_split(train_data, test_data,
                                                          test_size=0.1, random_state=42)
    results = model.fit(X_train, y_train, batch_size=32, epochs=50,
                        validation_data=(X_valid, y_valid))

My problem is with the fit …
I was wondering whether somebody could explain how to optimize hyperparameters for the base learners and meta algorithm when stacking? In many tutorials they seem to be plucked out of thin air! Thanks, Jack
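One common pattern (just one option, not the only way) is to treat the whole stack as a single estimator and tune base and meta hyperparameters jointly via scikit-learn's nested parameter names ('<estimator name>__<param>'); the estimator choices below are arbitrary placeholders:

    from sklearn.model_selection import GridSearchCV
    from sklearn.linear_model import Ridge, LinearRegression
    from sklearn.ensemble import StackingRegressor, RandomForestRegressor

    stack = StackingRegressor(
        estimators=[('ridge', Ridge()), ('rf', RandomForestRegressor())],
        final_estimator=LinearRegression(),
    )
    param_grid = {
        'ridge__alpha': [0.1, 1.0, 10.0],   # base learner hyperparameter
        'rf__n_estimators': [100, 300],     # base learner hyperparameter
    }
    search = GridSearchCV(stack, param_grid, cv=5)
    # search.fit(X, y)   # X, y: your training data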
I am using a library called MFE to generate meta-features. However, I am working with several files right now and I noticed that I am using only one core of my machine and it is taking too much time. I have been trying to use some libraries, as I saw in another question:

    library(iterators)
    library(foreach)
    library(doParallel)

This one, but me being dumb, I could not implement it ='(. I just would like to get this snippet running on all my cores so I …
What resources do you use to learn meta-knowledge? By meta-knowledge, I mean generalized information that will help us take more informed decisions when working on a problem later. Examples of meta-knowledge:

- Lots of time-series data? Build a CNN.
- Limited in time and want to get quick insights from a dataset? Try random forests.
- Continuous data and supervised learning? Do a regression.

I thought people would have built quiz-like applications to help determine …
In the paper "Learning to learn by gradient descent by gradient descent" they describe an RNN which learns gradient transformation to learn an optimizer. The optimizer network directly interacts with the environment to take actions, $\theta_{t+1} = \theta_t + g_t(∇f(\theta_t), \phi).$ (Equation 1 from the paper) and hence feels like a reinforcement learning problem in continuous action space. The formulation of optimization equation looks like what one would typical do in a supervised learning problem, $L(\phi) = E_f[f(\theta^*(f,\phi))]$ (Equation 2 …
I'm trying to find an optimal dithering pattern which can be used as a threshold on a greyscale image to generate a 1-bit black-and-white image. Ideally it would be optimal in the sense that a human would judge it perceptually closest to the source image. For instance, dithering with white noise looks pretty bad, while dithering with blue noise looks a lot better despite having the same amount of error (images from https://blog.demofox.org/2017/10/31/animating-noise-for-integration-over-time/). While white noise …
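A minimal sketch of the thresholding step itself (assuming `img` and `pattern` are float arrays in [0, 1]; generating the pattern, white or blue noise, is the separate problem being asked about):

    import numpy as np

    def dither(img, pattern):
        # Tile the dither pattern over the image and compare pixel-wise:
        # output is 1 where the pixel exceeds the local threshold, else 0.
        h, w = img.shape
        ph, pw = pattern.shape
        reps = (int(np.ceil(h / ph)), int(np.ceil(w / pw)))
        threshold = np.tile(pattern, reps)[:h, :w]
        return (img >= threshold).astype(np.uint8)

    # e.g. white-noise dithering for comparison:
    # out = dither(img, np.random.rand(64, 64))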
In this paper by DeepMind on one-shot learning, they published an architecture explaining how the system works with an external memory. I understand the mechanism perfectly. But what I don't understand is how they feed the data into the algorithm. In this part of the paper they explain it as follows. What I actually don't understand from this part is: how do they concatenate the inputs with the labels? And what is label shuffling?
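If it helps, here is a rough sketch of how that episode setup is usually described (variable names and shapes are my own assumptions): each timestep's input is the image concatenated with a one-hot encoding of the previous timestep's label, and the class-to-label assignment is re-shuffled for every episode, which is the "label shuffling".

    import numpy as np

    def make_episode(images, class_ids, num_labels):
        # Re-assign labels randomly for this episode (label shuffling)
        classes = np.unique(class_ids)
        shuffled = np.random.permutation(num_labels)[:len(classes)]
        label_map = {c: shuffled[i] for i, c in enumerate(classes)}
        labels = np.array([label_map[c] for c in class_ids])

        one_hot = np.eye(num_labels)[labels]                    # [T, num_labels]
        prev = np.vstack([np.zeros(num_labels), one_hot[:-1]])  # labels offset by one step
        flat = images.reshape(len(images), -1)                  # [T, H*W]
        inputs = np.concatenate([flat, prev], axis=1)           # image ++ previous label
        return inputs, labels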
I was thinking about this lately. Let's say that we have a very complex space, which makes it hard to learn a classifier that can efficiently split it. But what if this very complex space is actually made up of a bunch of "simple" subspaces? By simple, I mean that it would be easier to learn a classifier for that subspace. In this situation, would clustering my data first, in other words finding these subspaces, help me learn a better …
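A toy version of the idea with scikit-learn (cluster count and model choices are arbitrary; it also assumes every cluster ends up containing at least two classes, otherwise the per-cluster fit fails):

    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    def fit_per_cluster(X, y, n_clusters=5):
        # Partition the space, then fit one simple classifier per partition
        km = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
        models = {}
        for c in range(n_clusters):
            mask = km.labels_ == c
            models[c] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
        return km, models

    def predict_per_cluster(km, models, X_new):
        # Route each new point to its cluster's classifier
        clusters = km.predict(X_new)
        return [models[c].predict(x.reshape(1, -1))[0] for c, x in zip(clusters, X_new)]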