I hope it's allowed to ask here, but I am looking for a dataset (the format is not that important) that is similar to SQuAD but also contains false answers to the questions. I want to use it to fine-tune GPT-3, and all I can find is either multiple-choice questions based on a text but with no distractors, or classic quizzes that have no context before each question. I have code that generates distractors, and I can just plug …
I'm fine-tuning pre-trained GPT-2 for text summarization. The dataset contains a 'text' and a 'reference summary' for each example, so my question is how to add special tokens to get the right input format. Currently I'm thinking of doing it like this: example 1: <BOS> text <SEP> reference summary <EOS>, example 2: <BOS> text <SEP> reference summary <EOS>, and so on. Is this correct? If so, a follow-up question would be whether the max token length (i.e. 1024 for GPT-2) also applies to the concatenated length of the text and the reference summary. Any comment …
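For concreteness, here is a minimal sketch of the formatting I have in mind, assuming the HuggingFace transformers tokenizer (the token names <BOS>/<SEP>/<EOS> are placeholders I chose myself):

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the pre-trained tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Register my placeholder special tokens (names are my own choice)
special_tokens = {"bos_token": "<BOS>", "eos_token": "<EOS>", "sep_token": "<SEP>"}
tokenizer.add_special_tokens(special_tokens)
model.resize_token_embeddings(len(tokenizer))  # account for the newly added tokens

def format_example(text, summary):
    # One training example: <BOS> text <SEP> reference summary <EOS>
    return f"{tokenizer.bos_token} {text} {tokenizer.sep_token} {summary} {tokenizer.eos_token}"

example = format_example("some long article ...", "a short reference summary")
ids = tokenizer(example, truncation=True, max_length=1024)["input_ids"]
print(len(ids))  # my assumption: the 1024 limit applies to this concatenated sequence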
I'm currently working on a project with the goal of producing AI content in the space of content generation, like blog writing, Instagram caption generation, etc. I found the in-context few-shot learning capabilities of GPT-3 quite useful, but I'm unable to generate creative content consistently. It becomes boring and repetitive after a few iterations. I came across the concept of knowledge probing of language models and have come to the understanding that writing better prompts can actually …
I've been experimenting with Huggingface models and I've set up a chatbot with DialoGPT. It works pretty well, but after a while it stops answering and just returns empty strings. Before this it starts to give shorter and shorter answers. Any idea what can cause such behavior? I'm using the medium-sized model with a max_length of 2000 and added repetition_penalty=1.3, but other than that I didn't change any other parameters. I also add the previous message back …
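For reference, my setup follows the standard DialoGPT example pretty closely; roughly this (simplified, variable names are my own):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

chat_history_ids = None
for _ in range(10):  # a short interactive session
    user_input = input(">> ")
    new_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")

    # append the new user message to the running conversation history
    bot_input_ids = new_ids if chat_history_ids is None else torch.cat([chat_history_ids, new_ids], dim=-1)

    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=2000,
        repetition_penalty=1.3,
        pad_token_id=tokenizer.eos_token_id,
    )
    reply = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(reply)  # after a while this comes back as an empty string
```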
I want to extract data from documents (native PDFs in English) using GPT-J, but without using its API. I have searched all the documentation regarding GPT-J but haven't come across anything related to this. This article mentions that searching data is possible using GPT-J, but that's all it mentions. Basically, I want to extract text from documents using GPT-J without using the API. Any help/links/articles/videos would be appreciated! Thanks for your time and help!
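To show the direction I have in mind: pull the raw text out of the PDF with an ordinary PDF library, then run GPT-J locally through HuggingFace transformers so no API is involved. This is only a sketch of my assumption, not something I have working (pdfplumber, the EleutherAI/gpt-j-6B checkpoint, and the "invoice date" prompt are all just my choices for illustration; the 6B model also needs a lot of RAM/VRAM):

```python
import pdfplumber
from transformers import AutoTokenizer, AutoModelForCausalLM

# 1) Get plain text out of the native PDF
with pdfplumber.open("document.pdf") as pdf:
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)

# 2) Load GPT-J locally (no API involved) -- this downloads the full model weights
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# 3) Ask the model to pull out a field I care about (prompt wording is just a guess)
prompt = text[:2000] + "\n\nQuestion: What is the invoice date?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```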
I'm building an application for the API, but I would like to be able to count the number of tokens my prompt will use before I submit an API call. Currently I often submit prompts that yield a 'too many tokens' error. The closest I got to an answer was this post, which still doesn't say which tokenizer the API uses. If I knew which tokenizer the API used, I could count how many tokens are in my prompt before I submit …
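My current workaround is to assume the API uses the same byte-pair encoding as GPT-2 and count with the HuggingFace GPT-2 tokenizer; whether that assumption is exactly right is part of what I'm asking:

```python
from transformers import GPT2TokenizerFast

# Assumption: the API's tokenizer matches GPT-2's BPE (this is what I want to confirm)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def count_tokens(prompt: str) -> int:
    return len(tokenizer.encode(prompt))

prompt = "Translate the following English text to French: 'Hello, world!'"
print(count_tokens(prompt))  # compare against the model's context limit before calling the API
```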
When exploring text generation with various large language models, I frequently come across generated text that presents facts that are just plain wrong. I am not talking about fake news or bias; rather, I am talking about dated pieces of information that were once correct but are no longer correct. When looking around for pros and cons of language models, I don't really see complaints about this as one of the greatest cons. When we finetune models, and with the …
In the BERT paper, I learnt that BERT is an encoder-only model, that is, it involves only transformer encoder blocks. In the GPT paper, I learnt that GPT is a decoder-only model, that is, it involves only transformer decoder blocks. I was wondering what the difference is. I know the following difference between encoder and decoder blocks: the GPT decoder looks only at previously generated tokens and learns from them, not from tokens to its right, whereas the BERT encoder attends to tokens on both sides. …
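To make sure I'm picturing the masking difference correctly, here is a tiny PyTorch illustration of my understanding (my own sketch, not code from either paper):

```python
import torch

seq_len = 5

# Decoder-style (GPT): a causal mask, so position i can only attend to positions <= i
causal_mask = torch.tril(torch.ones(seq_len, seq_len))

# Encoder-style (BERT): no causal mask, every position attends to every other position
bidirectional_mask = torch.ones(seq_len, seq_len)

print(causal_mask)
# tensor([[1., 0., 0., 0., 0.],
#         [1., 1., 0., 0., 0.],
#         [1., 1., 1., 0., 0.],
#         [1., 1., 1., 1., 0.],
#         [1., 1., 1., 1., 1.]])
```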
It seems like a lot of noteworthy AI tools are being trained on datasets generated by web crawlers rather than on human-edited, human-compiled corpora (Facebook Translate, GPT-3). In general, it seems more practical to have an automatic and universal way of generating a dataset. Is there any widely used web crawler which does basically the same thing as Common Crawl but has a parameter for "language sought"? In other words, one that can generate a web-crawled dataset in language X? (Background: I'd like to create …
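If no such crawler exists, my fallback would be to filter Common Crawl output by language myself; a rough sketch of that idea using the langdetect package ("de" is just a placeholder language code):

```python
from langdetect import detect

def keep_language(documents, lang_code="de"):
    """Filter crawled documents down to a single language (e.g. 'de' for German)."""
    kept = []
    for doc in documents:
        try:
            if detect(doc) == lang_code:
                kept.append(doc)
        except Exception:
            pass  # too short / undetectable, skip it
    return kept

docs = ["This is English text.", "Dies ist ein deutscher Satz."]
print(keep_language(docs, "de"))
```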
Is there any pre-written library or function which can receive a few examples of data values being classified and then extend that classification to new data values it receives?
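To illustrate the behaviour I'm after, something along the lines of this scikit-learn sketch would already qualify, if nothing more purpose-built exists (the example values and labels are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# A few labelled example values (made-up data)
examples = ["12/05/2021", "2021-06-03", "john.doe@example.com", "jane@foo.org"]
labels = ["date", "date", "email", "email"]

clf = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
                    KNeighborsClassifier(n_neighbors=1))
clf.fit(examples, labels)

# Extend the classification to new, unseen values
print(clf.predict(["05/11/2020", "bob@bar.com"]))  # should come out as a date and an email
```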
I would like to make use of a software function that can provide the definition of a word or phrase. These words and phrases are in the realm of common knowledge: objects like "DVD player", or specific places like the "Canary Islands". I am pretty sure GPT-3 can do this because it's trained on the internet in general and on Wikipedia, and it produces generally fluent language. However, I was curious whether someone has already written this function and provided it …
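If nothing ready-made exists, I assume it would boil down to a thin wrapper around a completion call, something like this sketch (the prompt wording and engine choice are just my guesses):

```python
import openai

openai.api_key = "YOUR_API_KEY"

def define(term: str) -> str:
    """Ask GPT-3 for a short, common-knowledge definition of a word or phrase."""
    response = openai.Completion.create(
        engine="davinci",
        prompt=f"Define the following term in one sentence.\n\nTerm: {term}\nDefinition:",
        max_tokens=60,
        temperature=0.2,
        stop=["\n"],
    )
    return response.choices[0].text.strip()

print(define("DVD player"))
print(define("Canary Islands"))
```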
Can someone share the derivation of the Evidence Lower Bound in this paper? Zero-Shot Text-to-Image Generation: "The overall procedure can be viewed as maximizing the evidence lower bound (ELB) (Kingma & Welling, 2013; Rezende et al., 2014) on the joint likelihood of the model distribution over images x, captions y, and the tokens z for the encoded RGB image. We model this distribution using the factorization $p_{\theta,\psi}(x, y, z) = p_\theta(x \mid y, z)\, p_\psi(y, z)$, which yields the lower bound: …"
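For context, this is the generic ELBO derivation (Jensen's inequality plus the factorization above) that I am trying to map onto the paper's bound; $q_\phi(z \mid x)$ is my own notation for the variational distribution over the image tokens, which I assume is the role played by the dVAE encoder:

$$
\begin{aligned}
\ln p_{\theta,\psi}(x, y)
  &= \ln \sum_{z} p_{\theta,\psi}(x, y, z)
   = \ln \mathbb{E}_{z \sim q_\phi(z \mid x)}\!\left[ \frac{p_{\theta,\psi}(x, y, z)}{q_\phi(z \mid x)} \right] \\
  &\ge \mathbb{E}_{z \sim q_\phi(z \mid x)}\!\left[ \ln p_{\theta,\psi}(x, y, z) - \ln q_\phi(z \mid x) \right]
   \quad \text{(Jensen's inequality)} \\
  &= \mathbb{E}_{z \sim q_\phi(z \mid x)}\!\left[ \ln p_\theta(x \mid y, z) + \ln p_\psi(y, z) - \ln q_\phi(z \mid x) \right]
   \quad \text{(plugging in the factorization)}
\end{aligned}
$$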
On page 34 of OpenAI's GPT-3 paper, there is a sentence describing a limitation of the objective function: "Our current objective weights every token equally and lacks a notion of what is most important to predict and what is less important." I am not sure I understand this correctly. In my understanding, the objective is to maximize the log-likelihood of the token to predict given the current context, i.e., $\max L \sim \sum_{i} \log P(x_{i} \mid x_{<i})$. Although we aim …
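To spell out where the "equal weighting" shows up in my reading: writing the averaged objective explicitly, every position $i$ contributes with the same coefficient $1/N$, no matter how informative the token is:

$$ L = \frac{1}{N} \sum_{i=1}^{N} \log P(x_{i} \mid x_{<i}) $$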
I am working with a dataset that contains questions on various events conducted by a college and the corresponding answers to those queries. I am using this dataset to train a GPT-2 355M model to create a chatbot where users can get their queries answered. But I am not getting good results, and I feel that's because the questions in the dataset are prefixed with the event name, in an "event - query" format. For example, Ques: "Cicada3302 - Do I need to have any …
I've got a use case where I need to generate sentences based on a set of user-supplied keywords. Here is an example of what I need:

User input:
End-User: Data Scientists
Region: Middle East
Country: UAE
Solution: BigPanda
Application: machine learning
Benefits: lower costs and runtime

Output (curly brackets are just there to highlight): Learn how {data scientists} in the {Middle East} such as in the {UAE} are using {BigPanda} to streamline their {machine learning} processes to {lower costs and …
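What I've tried so far is turning those fields into a templated prompt and letting GPT-3 complete it; a rough sketch of that approach (field values copied from my example above, prompt wording is just a first attempt):

```python
import openai

openai.api_key = "YOUR_API_KEY"

fields = {
    "End-User": "Data Scientists",
    "Region": "Middle East",
    "Country": "UAE",
    "Solution": "BigPanda",
    "Application": "machine learning",
    "Benefits": "lower costs and runtime",
}

def build_prompt(fields):
    # List the keywords, then ask for a single sentence that uses all of them
    lines = [f"{key}: {value}" for key, value in fields.items()]
    return ("Write one sentence that uses all of the following keywords.\n\n"
            + "\n".join(lines) + "\n\nSentence:")

response = openai.Completion.create(
    engine="davinci",
    prompt=build_prompt(fields),
    max_tokens=60,
    temperature=0.7,
)
print(response.choices[0].text.strip())
```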
Given a list of sentences like this:

4 to 5 hours over a period of 16 weeks
1st session: 2.0-2.5 hours & 2nd session: 1.5-2.0 hours
Approximately 5-6 visits over the course of 5 months. Visit 1, 3, 5: about 1.5 hours. Visit 2, 4: short
15 visits over a period of approximately 74 weeks.
You will come to the organization about 12 times, over a period of a little more than three years. Each visit will take from 3-6 …
I am interested in accessing the NLP models mentioned in scientific papers, to replicate some results and experiment. But I only see waiting lists https://openai.com/blog/openai-api/ and licenses granted in large commercial deals https://www.theverge.com/2020/9/22/21451283/microsoft-openai-gpt-3-exclusive-license-ai-language-research . How can a researcher not affiliated with a university or a (large) tech company obtain access in order to replicate the experiments of scientific papers? Which alternatives would you suggest for leveraging pre-trained models?
I have read a couple of documents that explain in detail the edge that GPT-3 (Generative Pre-trained Transformer 3) has over BERT (Bidirectional Encoder Representations from Transformers). So I am curious to know whether BERT scores better than GPT-3 in any particular area of NLP. It's quite interesting to note that OpenAI's GPT-3 is not open-sourced, whereas tech behemoth Google's BERT is open-sourced. I feel OpenAI's stance and the hefty price tag for the GPT-3 API are in stark contrast to its mission …