There are some storage systems, like GigaSpaces, that were inspired by the tuple space model. Would you say MongoDB can act like a tuple space or not? What is the difference between MongoDB and a tuple space?
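One way to see the overlap concretely: a tuple space's atomic take (read-and-remove) can be approximated with MongoDB's find-one-and-delete, though MongoDB has no built-in blocking read the way a true tuple space does. A minimal PyMongo sketch (the database and collection names are invented for illustration):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
space = client.demo.tuples  # hypothetical collection standing in for the "space"

# write: put a tuple into the space
space.insert_one({"type": "task", "payload": 42})

# take: atomically read and remove one matching tuple; unlike a real
# tuple space's take(), this returns None immediately instead of
# blocking until a matching tuple appears
doc = space.find_one_and_delete({"type": "task"})
print(doc)
```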
I'm working with MongoDB and want to write a query using the aggregation framework. The task is: each city has several zip codes; find the city in each state with the most zip codes, and rank those cities along with their states using the city populations. The documents are in the following format: { "_id": "10280", "city": "NEW YORK", "state": "NY", "pop": 5574, "loc": [ -74.016323, 40.710537 ] } I was able to count the number of zip codes for each state …
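One common pipeline shape for this kind of per-group maximum, sketched in PyMongo (the database and collection names zips are assumptions): group by state and city to count zip codes, sort so the top city per state comes first, take the first city per state, then rank by population.

```python
from pymongo import MongoClient

db = MongoClient().test  # hypothetical database name

pipeline = [
    # count zip codes and total population per (state, city)
    {"$group": {"_id": {"state": "$state", "city": "$city"},
                "zips": {"$sum": 1},
                "pop": {"$sum": "$pop"}}},
    # within each state, put the city with the most zip codes first
    {"$sort": {"_id.state": 1, "zips": -1}},
    # keep only that first city per state
    {"$group": {"_id": "$_id.state",
                "city": {"$first": "$_id.city"},
                "zips": {"$first": "$zips"},
                "pop": {"$first": "$pop"}}},
    # rank the winning cities by population
    {"$sort": {"pop": -1}},
]
for row in db.zips.aggregate(pipeline):
    print(row)
```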
I have a large dataset with 9M JSON objects at ~300 bytes each. They are posts from a link aggregator: basically links (a URL, title, and author ID) and comments (text and author ID) plus metadata. They could very well be relational records in a table, except that they have one array field with IDs pointing to child records. Which implementation looks more solid? JSON objects in a PostgreSQL database (just one large table with one column, …
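For the array-of-child-IDs part, the document-store route usually keeps the IDs inline and resolves children with a second query. A hedged PyMongo sketch (database, collection, and field names are assumptions):

```python
from pymongo import MongoClient

db = MongoClient().aggregator  # hypothetical database name

# a link post whose "children" array holds the IDs of its comments
post = db.posts.find_one({"_id": "p1"})

# fetch all child records in one round trip
children = list(db.posts.find({"_id": {"$in": post["children"]}}))
```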
I'm writing to get advice about my project. I want to build a recommender system for shops selling products. In essence, I want to recommend that shop A stock item X because shop B sells this item and shops A and B are very similar. The "problem" here is the size of the data: I have around 5 TB of raw data (about 8,000,000,000 lines), so it's very difficult to do anything with data this large …
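For the algorithmic core, independent of scale, "shops A and B are very similar" is usually a cosine similarity over a shop-by-item matrix. A toy sketch under that assumption; at 5 TB the matrix would have to be built and multiplied in chunks (e.g. on a distributed framework), but the math is the same:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# toy shop x item purchase matrix (rows: shops, columns: items)
purchases = csr_matrix(np.array([
    [1, 0, 1, 1],   # shop A
    [1, 1, 1, 0],   # shop B
    [0, 1, 0, 1],   # shop C
]))

# shop-to-shop cosine similarity
sim = cosine_similarity(purchases)

# recommend to shop 0 the items carried by its most similar other shop
nearest = int(np.argsort(sim[0])[-2])            # [-1] is shop 0 itself
candidates = set(purchases[nearest].indices) - set(purchases[0].indices)
print(nearest, candidates)
```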
We are currently developing a system on the MEAN stack with MongoDB as the backend. We have employee names and IDs in our system, and our client wants a pretty good (read: Google-like) search for employee records. He needs the system to suggest employees even if he has misspelled a name, etc. One of the suggestions from our development lead was to use Elasticsearch, but from what I have seen, Elasticsearch …
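For reference, the misspelling tolerance described here maps onto Elasticsearch's fuzzy matching. A hedged sketch with the official Python client (index and field names are invented; the exact call signature varies with client version):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# index an employee record (index name is an assumption)
es.index(index="employees", document={"name": "Jonathan Smith", "emp_id": "E100"})

# a fuzzy match tolerates typos such as "Jonathon Smyth"
resp = es.search(index="employees", query={
    "match": {"name": {"query": "Jonathon Smyth", "fuzziness": "AUTO"}}
})
for hit in resp["hits"]["hits"]:
    print(hit["_source"])
```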
I am currently storing data crawled from multiple websites that have similar but still different structures, so every crawler saves its data to a separate CSV. I am planning to store the data in MongoDB instead of in CSVs. Will this be beneficial in saving space? Overall, will it be advantageous, or will there be drawbacks apart from my having to change the code?
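On the space question: MongoDB stores field names inside every BSON document, so a straight CSV-to-collection import often takes more space than the CSV, not less (WiredTiger's compression offsets some of this). The import itself is short; a sketch with PyMongo, assuming one collection shared by all crawlers:

```python
import csv
from pymongo import MongoClient

db = MongoClient().crawls  # hypothetical database name

with open("site_a.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# crawlers with different columns can share one collection,
# since MongoDB documents need not follow a common schema
if rows:
    db.pages.insert_many(rows)
```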
Problem description: I have a data set of about 10,000 patients in a study. For each patient, I have a list of various measurements. Some of the information is scalar data (e.g. age), some is a time series of measurements, and some can even be a bitmap. An individual record can be quite large (10 kB to 10 MB). The data is processed in practically two steps: preprocessing at the level of individual records (patients), i.e. extracting some features in …
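If MongoDB were the store here, the usual split is: scalars and time series in a normal document, and anything near or above the 16 MB document limit (the bitmaps) in GridFS, referenced by ID. A hedged PyMongo sketch (database, collection, and field names are invented):

```python
import gridfs
from pymongo import MongoClient

db = MongoClient().study  # hypothetical database name
fs = gridfs.GridFS(db)

# large binary data goes into GridFS and is referenced by its file id
with open("scan_0001.png", "rb") as img:
    image_id = fs.put(img, filename="scan_0001.png")

db.patients.insert_one({
    "patient_id": 1,
    "age": 54,                       # scalar measurement
    "heart_rate": [72, 75, 71, 80],  # time-series measurement
    "bitmap": image_id,              # pointer into GridFS
})
```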
I'd like to perform NLP analysis on the WikiLeaks US Diplomatic Cable Leaks documents (https://wikileaks.org/plusd/), preferably as a Python NLTK3 corpus or MongoDB documents. I couldn't find any option to download these in raw text format, so I'm afraid I'll be forced to do some kind of scraping, but I'd be thankful if anyone could point me to a simpler solution, if one exists.
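Whatever the scraping route, once the cables are sitting in a directory as plain-text files, turning them into an NLTK corpus is a one-liner with PlaintextCorpusReader (the ./cables layout below is an assumption):

```python
from nltk.corpus.reader import PlaintextCorpusReader

# assumes each scraped cable was saved as its own .txt file under ./cables
corpus = PlaintextCorpusReader("cables", r".*\.txt")

print(corpus.fileids()[:5])   # first few document names
print(len(corpus.words()))    # corpus-wide token count
```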
I am planning to set up a JSON storage system. It will store tens of millions of JSON records, all in the same format. I'd like to be able to query the data using Apache Drill. It looks like there is Drill support for both MongoDB and Postgres. However, I'm unsure of the pros and cons of each, and of how I'd structure the schema if I chose Postgres.
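For the Postgres side, the usual single-format-JSON schema is one jsonb column plus a GIN index, which keeps containment queries fast. A hedged psycopg2 sketch (table name and connection details are assumptions):

```python
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=store")  # connection string is an assumption
cur = conn.cursor()

# one row per record, the whole JSON object in a jsonb column;
# the GIN index accelerates containment (@>) queries
cur.execute("""
    CREATE TABLE IF NOT EXISTS records (
        id  bigserial PRIMARY KEY,
        doc jsonb NOT NULL
    );
    CREATE INDEX IF NOT EXISTS records_doc_idx ON records USING GIN (doc);
""")

cur.execute("INSERT INTO records (doc) VALUES (%s)",
            (Json({"user": "alice", "score": 7}),))

cur.execute("SELECT doc FROM records WHERE doc @> %s",
            (Json({"user": "alice"}),))
print(cur.fetchall())
conn.commit()
```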
I am working with an organization to analyse the data residing in their MongoDB and to look for trends/patterns in it. I am quite new to the professional field of data analysis but have a good background in statistics and data mining (university coursework). I will be doing a proof of concept on the data to understand whether the data the organization is gathering is suitable for analytics and, if not, what enhancements they should make to their datasets …
I have currently been tasked with designing an application that tracks several different measurements around the office, e.g. the temperature, light, presence of people, etc. Having never really worked on data analysis before, I would like some guidance on how to store this data (i.e. which database design to use). What we're looking at currently is around 50 sensors that only send data when an event of interest occurs: if the temperature changes by 0.5 degrees or if the light turns …
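A simple starting schema for event-driven sensors is one document per event, indexed on (sensor, timestamp), which matches the typical "one sensor over a time range" query. A hedged PyMongo sketch (database, collection, and field names are invented):

```python
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING

db = MongoClient().office  # hypothetical database name

# one document per event keeps sparse, event-driven data simple
db.events.insert_one({
    "sensor_id": "temp-12",
    "kind": "temperature",
    "value": 21.5,
    "ts": datetime.now(timezone.utc),
})

# index for the typical query: one sensor over a time range
db.events.create_index([("sensor_id", ASCENDING), ("ts", ASCENDING)])

recent = db.events.find({
    "sensor_id": "temp-12",
    "ts": {"$gte": datetime(2024, 1, 1, tzinfo=timezone.utc)},
})
```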
I have a 7 GB confidential dataset which I want to use for a machine learning application. I tried every package recommended for efficient dataset management in R, such as data.table, ff, and sqldf, with no success. data.table needs to load all the data into memory from what I read, so it obviously won't work since my computer has only 4 GB of RAM. ff leads to a memory error too. So I decided to turn to …
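If switching tools is on the table, out-of-core processing in Python is one route: pandas can stream a large CSV in chunks so only a slice of the 7 GB is in the 4 GB of RAM at any time. A sketch assuming the data is a CSV with a hypothetical "label" column:

```python
import pandas as pd

# stream the file in 100k-row chunks and combine partial aggregates
totals = None
for chunk in pd.read_csv("data.csv", chunksize=100_000):
    part = chunk.groupby("label").size()
    totals = part if totals is None else totals.add(part, fill_value=0)

print(totals)
```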
In our company, we have a MongoDB database containing a lot of unstructured data, on which we need to run map-reduce algorithms to generate reports and other analyses. We have two approaches to choose from for implementing the required analyses: one is to extract the data from MongoDB to a Hadoop cluster and do the analysis entirely on the Hadoop platform. However, this requires considerable investment in preparing the platform (software and hardware) and training the team to work with …
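For scoping the in-database option: many map-reduce-style reports can be expressed directly as a MongoDB aggregation pipeline, with no data movement at all. A hedged PyMongo sketch (database, collection, and fields are invented):

```python
from pymongo import MongoClient

db = MongoClient().reports  # hypothetical database name

# "map" each matching record to its category, "reduce" by summing
pipeline = [
    {"$match": {"status": "complete"}},   # filter before grouping
    {"$group": {"_id": "$category",
                "total": {"$sum": "$amount"},
                "count": {"$sum": 1}}},
    {"$sort": {"total": -1}},
]
for row in db.events.aggregate(pipeline):
    print(row)
```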