There are a few hundred time series from a large set of different, irregularly distributed locations, with the following properties:

- ordered factor (5 levels)
- between 5 and 25 observations per series
- lots of missing values within each series
- temporal and spatial autocorrelation
- unknown temporal frequency

The objective is to spatially cluster the time series based on their similarity (of observed values per point in time). What would be adequate methods? The analysis will be carried out in R.
I have a dataset consisting of addresses (points), each with several attributes: one that distinguishes the "sort" of address, and one that contains a numerical value. I want to cluster these points based on their distance from each other and the sort of address. However, the summed numerical attribute per cluster cannot exceed a certain threshold value. In other words, the system needs to form clusters but must stop clustering as soon as the sum of the numerical value …
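One way to sketch the capacity constraint is a greedy pass: grow a cluster from a seed point, adding the nearest unassigned point until the summed value would exceed the threshold, then start a new cluster. This is a baseline, not an optimal method, and the coordinates, values, and threshold below are invented for the demo (the "sort" attribute could be handled by running this per sort, or by adding a sort-mismatch penalty to the distance):

```python
import numpy as np

def capacity_clusters(points, values, threshold):
    """Greedy sketch: grow each cluster from a seed, adding the nearest
    unassigned point until the summed value would exceed the threshold."""
    n = len(points)
    unassigned = set(range(n))
    labels = np.full(n, -1)
    cluster = 0
    while unassigned:
        seed = min(unassigned)          # arbitrary seed choice
        labels[seed] = cluster
        total = values[seed]
        unassigned.remove(seed)
        while True:
            # nearest unassigned point to the seed
            cand = [(np.linalg.norm(points[i] - points[seed]), i)
                    for i in unassigned]
            if not cand:
                break
            d, i = min(cand)
            if total + values[i] > threshold:
                break                   # capacity reached, close the cluster
            labels[i] = cluster
            total += values[i]
            unassigned.remove(i)
        cluster += 1
    return labels

# Two tight pairs of points; threshold 7 allows at most two value-3 points
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
vals = np.array([3.0, 3.0, 3.0, 3.0])
labels = capacity_clusters(pts, vals, threshold=7.0)
print(labels)
```

A refinement would pick seeds and merge order more carefully (e.g. constrained k-means or an integer-programming formulation), but the stop-at-threshold mechanic is the same.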
I took the data from here and wanted to play around with multidimensional scaling on it. The data looks like this: In particular, I want to plot the cities in a 2D space and see how closely it matches their real locations on a geographic map, using only the information about how far they are from each other, without any explicit latitude and longitude information. This is my code:

    import pandas as pd
    import numpy as np
    from sklearn …
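A minimal sketch of the approach, with a made-up 3-city distance matrix standing in for the real table: passing `dissimilarity='precomputed'` tells scikit-learn's MDS that the matrix already holds pairwise distances.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical symmetric distance matrix for three cities;
# the distances are invented for the sketch.
D = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])

# dissimilarity='precomputed' makes MDS embed the given distances
# directly instead of computing distances from feature vectors.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)   # one (x, y) position per city
print(coords.shape)
```

Note that the recovered layout matches true geography only up to rotation, reflection, translation, and scale; aligning it to a real map is typically done afterwards with a Procrustes transform (e.g. `scipy.spatial.procrustes`).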
I am trying to create a choropleth map using the Choropleth widget from the Geo add-on in Orange. However, this widget is not appearing. Any ideas? My current version: Orange 3.24.1 for Windows.
I have around one million labelled coordinates (latitude, longitude) from all around the world, with around 10,000 unique labels (location_id). Each point corresponds to exactly one class (location_id). Each class is densely distributed over a 1–10 km radius, with more density around its centroid. How can I create an Earth multi-polygon consisting of 10,000 polygons? Basically, dividing the Earth into 10,000 polygons. The separation would be based on the density of points in each location: the more points clumped in a location, the bigger its polygon's …
I have a question regarding neural networks, considering I am not an expert in NNs. Assume I have a 5-by-5 grid where, depending on my pushing any square (or combination of squares), some of the squares (not necessarily the ones I pushed) will light up. My question is: can we set this up as a NN problem if I have a set of inputs and outputs? Assume an input layer with 25 neurons, where all are zero except the …
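This can be framed as multi-label classification: 25 binary inputs (pushed squares) map to 25 binary outputs (lit squares). A sketch, where the light-up rule is invented purely for the demo (pushing square i lights square i+1); a real dataset of observed push/light pairs would replace the synthetic one:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Invented ground-truth rule for the sketch: square i lights square i+1
def lights(push):
    return np.roll(push, 1)

# Synthetic training set of push patterns and resulting light patterns
X = rng.integers(0, 2, size=(2000, 25))
Y = np.array([lights(x) for x in X])

# 25 inputs -> 25 binary outputs; a 2D binary target makes
# MLPClassifier do multi-label classification
clf = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
clf.fit(X, Y)

push = np.zeros(25, dtype=int)
push[3] = 1                      # push only square 3
pred = clf.predict(push.reshape(1, -1))[0]
print(pred)
```

With enough observed input/output pairs, the network internalises the hidden push-to-light mapping; how many pairs are needed depends on how complicated the real rule is.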
To me it doesn't make sense to normalize it, even if it is a numerical variable like a zip code. An address should be interpreted as a categorical feature, like "neighborhood" …? Suppose I have geolocation data (latitude & longitude); the best thing to do seems to be k-means clustering and then working with the cluster labels, which I "encode". If the answer is "it depends", please tell me on what.
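A minimal sketch of the cluster-then-encode idea, with invented coordinates (the two "cities" below are assumptions for the demo): cluster raw lat/long, then treat the cluster id as a categorical feature to one-hot encode, rather than as a number to normalize.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical lat/long points around two assumed city centres
rng = np.random.default_rng(0)
city_a = rng.normal([48.85, 2.35], 0.05, size=(50, 2))
city_b = rng.normal([45.76, 4.84], 0.05, size=(50, 2))
coords = np.vstack([city_a, city_b])

# The cluster id is a categorical feature: one-hot encode it downstream,
# do not feed it to a model as a magnitude.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
labels = km.labels_
print(labels[:5], labels[-5:])
```

One caveat: Euclidean distance on raw lat/long ignores the Earth's curvature; for points spread over large areas, a haversine distance or projected coordinates would be more appropriate.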
I need to construct an interactive clustering plot. Ideally, as the user zooms in, the clusters would split up into smaller clusters at certain zoom levels. I am planning to have several discrete levels of clustering, and the plot would visualise each depending on the zoom level the user is at. I'm not sure how to approach this. Are there any Python packages that can help? Any advice appreciated.
Is there a trade-off in accuracy/generalisation/performance between providing priors to a general machine learning algorithm and training it with enough data that it could internalise that prior? For example: let's say I'm trying to get an ANN to do some basic classification of whether vehicle 'A' is in class 'Bus' or 'Not Bus'. Vehicles, in this example, have some features that are dependent on each other [Size, Speed], and let's say that I have a history of …
I have spatial data from multiple sources. This data consists of ID, lat, long, and time. My goal is that, given a new lat-long pair, the model returns (preferably with a probability) the data points that match it. This matching should be based on the features (such as lat, long, and timestamp). I could only think of clustering, i.e. cluster the dataset and predict which cluster the new data point belongs to. The drawback is that if …
I'm curious to know whether pre-trained transformers could handle search queries that include numerical data or make references to spatial relationships. Take an example dataset of a list of restaurants, each with a distance relative to the city centre. Would transformers be able to handle: "Which restaurant is closest to the city centre?" "Which restaurant is 2 km from the city centre?" Curious to know if anyone has any opinions on this or has seen any articles/examples covering this aspect of search. …
What's the current methodology for clustering geospatial data by features? Example: I have a demographic dataset, say containing average home price and population density, so an example correlation here would be home price vs. population density. But the trick is how the clustering gets pulled. For example, an affluent area with high population density isn't the same as one with low population density. Applying a basic distance metric wouldn't take this into account, since lows vs. highs could …
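One common starting point, sketched with invented numbers: standardise each feature before clustering so that price (hundreds of thousands) does not drown out density (thousands), and keep density as its own dimension so that affluent high-density and affluent low-density areas stay separable. All data below is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical demographic table: price and density are on very
# different scales, so raw Euclidean distance would be dominated by price.
rng = np.random.default_rng(0)
price = np.concatenate([rng.normal(9e5, 5e4, 50),    # affluent areas
                        rng.normal(2e5, 3e4, 50)])   # modest areas
density = np.concatenate([rng.normal(8000, 500, 50),
                          rng.normal(900, 200, 50)])
X = np.column_stack([price, density])

# Standardise so each feature contributes comparably, then cluster
Xs = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xs)
print(labels[:5], labels[-5:])
```

If specific interactions matter (e.g. price relative to density), adding explicit ratio or interaction features, or using a learned distance metric, would push the clustering in that direction.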
I have a dataset of my company's 156 existing branches, with the longitude and latitude of each branch. Now we want to open 10 more branches. How can we predict the best locations for the new branches using machine learning? As much as I have searched on Google, I have come across using ArcGIS geospatial data or OpenStreetMap, but couldn't find all the steps laid out properly. Kindly guide me, as I am a junior data scientist …
I am trying to train a model to predict the location of a storm at a given time. The dataset includes the longitude and latitude of the storm at given timestamps, but I am not sure that is the best way to represent the location, as it likely doesn't have a linear relationship with the target. Is there a way to combine the longitude and latitude into a feature that can be used for training? I was thinking about creating "grids" …
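The "grids" idea can be sketched as follows: floor-divide latitude and longitude into fixed-size cells and combine the row and column into a single cell id. The 0.5-degree cell size and the id scheme are assumptions for the demo, not a recommendation; alternatives include sin/cos transforms of lon/lat or converting to 3D Cartesian coordinates.

```python
import numpy as np

def grid_cell(lat, lon, cell_deg=0.5):
    """Map a (lat, lon) pair to an integer grid-cell id on a
    hypothetical fixed 0.5-degree grid covering the globe."""
    row = int((lat + 90.0) // cell_deg)     # 0 at the south pole
    col = int((lon + 180.0) // cell_deg)    # 0 at the antimeridian
    n_cols = int(360.0 / cell_deg)
    return row * n_cols + col

# Nearby points fall into the same cell, turning raw coordinates
# into a single discrete location feature
print(grid_cell(29.76, -95.37), grid_cell(29.8, -95.4))
```

A coarse grid loses precision and a fine grid explodes the number of categories, so the cell size is a hyperparameter worth tuning against the storm-track resolution.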
Has anyone had success building models using KMeans for classification? I have images that only have one band, and it keeps failing. My guess is that the issue is with both the size of the image and the single band. For example:

    from osgeo import gdal, gdal_array
    import numpy as np

    src = '/Path/ImgA.TIF'
    img_A = gdal.Open(src)

    # Getting bands (count)
    bands_n = img_A.RasterCount  # returns 1

    band = img_A.GetRasterBand(1)
    # read as array
    band_arr = band.ReadAsArray()
    band_sh = band_arr.shape  # e.g. …
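A single band is not itself a blocker: scikit-learn's KMeans expects a 2D (samples, features) array, so the 2D band has to be flattened into (n_pixels, 1) first. A sketch with a small synthetic array standing in for `band.ReadAsArray()`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for band.ReadAsArray(): a single-band raster
# with two pixel "materials"
rng = np.random.default_rng(0)
band_arr = rng.choice([10.0, 200.0], size=(50, 50))

# KMeans wants samples x features: a one-band image is one feature
# per pixel, so reshape (rows, cols) -> (n_pixels, 1)
pixels = band_arr.reshape(-1, 1)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# Reshape the labels back into the image grid for mapping
classified = km.labels_.reshape(band_arr.shape)
print(classified.shape)
```

For a genuinely large raster, `MiniBatchKMeans` (or fitting on a random subsample of pixels and then calling `predict` on the rest) keeps memory and runtime manageable.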
I have ~10,000 irregularly placed nodes in x, y space. Each node must be covered by a station; a node is defined as covered if it is within 1000 m of the station. Each station can cover at most 4 nodes, regardless of how many nodes fall within 1000 m of it. I want to optimise the locations of the stations so that the total number of stations is minimised, subject to the constraint that every node must be …
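This is a capacitated covering problem; an exact answer would come from an integer program, but a greedy baseline is easy to sketch: repeatedly place a station at the candidate node location that covers the most uncovered nodes (up to the capacity) within the radius. The toy layout below is invented, and the code restricts station sites to node locations, which the real problem may not require.

```python
import numpy as np

def greedy_stations(nodes, radius=1000.0, cap=4):
    """Greedy baseline (not optimal): place stations at node locations,
    each covering up to `cap` uncovered nodes within `radius`."""
    uncovered = set(range(len(nodes)))
    stations = []
    while uncovered:
        best, best_cover = None, []
        for i in range(len(nodes)):
            d = np.linalg.norm(nodes - nodes[i], axis=1)
            # up to `cap` uncovered nodes in range (arbitrary order;
            # a refinement would prefer the nearest ones)
            near = [j for j in uncovered if d[j] <= radius][:cap]
            if len(near) > len(best_cover):
                best, best_cover = i, near
        stations.append(nodes[best])
        uncovered -= set(best_cover)
    return np.array(stations)

# Hypothetical layout: two tight groups of four nodes, 5 km apart,
# so two capacity-4 stations should suffice
group_a = np.array([[0, 0], [100, 0], [0, 100], [100, 100]], float)
nodes = np.vstack([group_a, group_a + 5000.0])
stations = greedy_stations(nodes)
print(len(stations))
```

At ~10,000 nodes, the O(n^2) inner search above gets slow; a KD-tree for the radius queries, or handing the whole thing to a MIP solver, would be the next step.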
Tableau has interactive maps that can be used for custom geocoding. I know companies often divide regions for sales and marketing purposes, but can anyone think of specific use cases or examples where custom geocoding could be used? For example, we could further divide a city into different areas, but what else?
I have a history of a car's movements: a list of GPS coordinates with timestamps (in GPX format). I'm new to ML; I tried to solve this but it doesn't work well. I have several problems: how to correctly normalize the data. The unloading process may take different amounts of time, so the number of points will differ. The nature of the unloading may look different, but the cases are united by one thing: during unloading the machine either stands still or slowly …
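For the variable-number-of-points problem specifically, one common trick is to resample every track onto the same fixed number of points by interpolating along the sequence, so tracks of different durations become equal-length feature vectors. A sketch with an invented 7-point track (the 50-point target length is an arbitrary choice):

```python
import numpy as np

def resample_track(track, n_points=50):
    """Resample a variable-length GPS track (rows of lat, lon) onto a
    fixed number of points by linear interpolation along the sequence."""
    track = np.asarray(track, dtype=float)
    old_t = np.linspace(0.0, 1.0, len(track))
    new_t = np.linspace(0.0, 1.0, n_points)
    lat = np.interp(new_t, old_t, track[:, 0])
    lon = np.interp(new_t, old_t, track[:, 1])
    return np.column_stack([lat, lon])

# Hypothetical 7-point track becomes a fixed 50-point representation
track = np.array([[55.75, 37.61], [55.76, 37.62], [55.76, 37.63],
                  [55.77, 37.63], [55.77, 37.64], [55.78, 37.65],
                  [55.78, 37.66]])
features = resample_track(track)
print(features.shape)
```

Since unloading is characterised by standing still or moving slowly, derived features such as per-segment speed (distance between consecutive points divided by the timestamp difference) are often more informative than raw coordinates; resampling against actual timestamps rather than point index would preserve that.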
A few versions back, Orange Geo would show points as cluster areas (circles) or polygons bounding the cluster points. Does anyone know which Orange version this was? I vaguely remember versions 3.22–3.26, but I might be wrong. The older point-cluster rendering is far better for publishing than the current colour-shaded version. If it was versions 3.22–3.26, I am having a problem loading a Geo add-on that has that option, as I have tried 3.22, 3.24, and 3.26 and see no …