A Beginner’s Guide to GPT Neo (With Python Codes)


Ever thought of writing code that can code for you? Or generating contextualised text on the topic you want? The answer to all of these use cases was given by OpenAI, a large organisation considered by many to be leading the world in Artificial Intelligence, when it released the original GPT (Generative Pre-trained Transformer) paper, ‘Improving Language Understanding by Generative Pre-Training’, in June 2018.


In the following years, OpenAI released GPT-2 and GPT-3 as well.

Generative Pre-trained Transformer, or GPT for short, is a transformer-based model architecture: a stack of transformer decoder blocks placed one after the other. The GPT models have been pre-trained on large text corpora such as Wikipedia (yes, seriously, everything on Wikipedia!) as well as Common Crawl (fun fact: it holds over 12 petabytes of data, which is about 12 years of data uploaded to the internet) so that they perform extremely well on language-based use cases. Generative, as the word suggests, means the model generates text, be it poems, articles, essays or even code!
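
To make the word "generative" concrete, here is a toy sketch of the autoregressive loop such models run: predict the next token, append it, and repeat. The lookup table below is a hand-written stand-in for a trained network, and every name in it is hypothetical.

```python
# Toy illustration only: a hand-written next-word table stands in for a trained model.
NEXT_WORD = {
    "the": "model",
    "model": "generates",
    "generates": "text",
    "text": "<eos>",
}


def generate(prompt_word, max_new_tokens=10):
    """Greedy autoregressive loop: feed the last token back in to get the next one."""
    tokens = [prompt_word]
    for _ in range(max_new_tokens):
        next_token = NEXT_WORD.get(tokens[-1], "<eos>")
        if next_token == "<eos>":  # stop when the "model" signals end of sequence
            break
        tokens.append(next_token)
    return " ".join(tokens)


print(generate("the"))  # the model generates text
```

A real GPT replaces the table with a neural network that outputs a probability for every token in its vocabulary, but the surrounding loop has the same shape.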

According to VentureBeat, “a private corpus of 500 billion tokens was used for training the model and a computational cost of a staggering 50 million USD”.

The latest, GPT-3, has 175 BILLION parameters! As said by Hugo Cen on Entrepreneur.com, and I’m quoting, “This is the Most Powerful Artificial Intelligence Tool in the World”, and I’m sure most of us believe that too! However, there is one problem: GPT-3 is only accessible through a beta API, which is currently on hold, and to get access you have to submit an application to OpenAI. Crazy, right?

What if you want to leverage the power of GPT-3 but don’t want the hassle of going through the application process? Introducing GPT-Neo, an open-source Transformer model with only 2.7 billion parameters that resembles GPT-3 in terms of both design and performance. Note that the largest GPT-Neo is nearly equal to the smallest GPT-3.

When comparing GPT-Neo with GPT-3 Ada (the smaller version of GPT-3), the former did better than the latter on HellaSwag and PIQA. HellaSwag is a multiple-choice sentence completion benchmark in which each item has a context paragraph and four endings. PIQA measures common-sense reasoning: the machine has to pick the one out of two sentences that makes the most sense. However, GPT-3 Ada isn’t the largest GPT-3, as mentioned earlier; its big brother GPT-3 Davinci, with about 65 times as many parameters as GPT-Neo, beat Neo comfortably. Nothing much unexpected there.
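
One common way to score multiple-choice benchmarks like these is to let the model assign a log-likelihood to each candidate and pick the highest. The sketch below assumes that scoring scheme (the article does not describe how the numbers were obtained), and the scores in it are made up for illustration.

```python
def pick_ending(log_likelihoods):
    """Return the index of the candidate the model finds most likely."""
    return max(range(len(log_likelihoods)), key=lambda i: log_likelihoods[i])


def accuracy(examples):
    """examples: list of (per-candidate scores, index of the correct candidate)."""
    correct = sum(pick_ending(scores) == gold for scores, gold in examples)
    return correct / len(examples)


# Hypothetical scores, invented for this sketch (not real benchmark output).
examples = [
    ([-12.3, -9.1, -15.0, -11.7], 1),  # HellaSwag-style item: four endings
    ([-8.4, -7.9], 0),                 # PIQA-style item: two candidates
]
print(accuracy(examples))  # 0.5
```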

You can train this model from scratch using Mesh TensorFlow, an excellent library for easy and efficient data and model parallelism in distributed training. These models have tons of data to train on and lots of parameters; hence parallelism is vital here. This means you’ll be running different segments of your training concurrently rather than one after another, and this is completely independent of how the data is split into batches. Google Research has provided a simple template as well as an implementation in this notebook. Make sure to go through the readme file for instructions on how to proceed; the code for this notebook is provided below with steps.
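
To give a feel for what "running segments concurrently" means, here is a plain-Python sketch of data parallelism on a one-parameter model. It is not the Mesh TensorFlow API; the function names are hypothetical and the shards are processed in a simple loop where real hardware would run them at the same time.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the one-parameter model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w, xs, ys, n_workers):
    """Split the batch into equal shards, compute each shard's gradient
    (concurrently on real hardware), then average the results."""
    shard = len(xs) // n_workers  # assumes the batch divides evenly
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(n_workers)
    ]
    return sum(grads) / n_workers


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
full = grad_mse(0.5, xs, ys)
sharded = data_parallel_grad(0.5, xs, ys, n_workers=4)
print(abs(full - sharded) < 1e-9)  # True: same gradient, computed in pieces
```

Model parallelism applies the same idea to the weights instead of the batch: each device holds a slice of the parameters, which is what makes models too large for one device trainable at all.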

1. Clone the GPT-Neo GitHub repository using the Setup cell. Make sure you have a TPU runtime; if not, go to Runtime -> Change runtime type -> TPU.
2. Set up Google Cloud, as TPUs cannot read from local file systems; hence the cell below requires your authentication credentials. If you don’t have a Google Cloud Platform account, no worries! You can make an account for free and get credits worth 300 USD free for 90 days. Otherwise, just follow the notebook as it goes!

The command below will take you through the configuration of gcloud.

 from google.colab import auth
 # Authenticate this Colab session with your Google account
 auth.authenticate_user()
 !gcloud init

Set up a new configuration with any name you like and proceed with the Google Account with which you have logged in to GCP. Create a project name and make sure you follow the naming guidelines; otherwise it will cause errors, and you’ll have to run the whole cell again.

3. You are good to go once you get confirmation that the Google Cloud SDK is configured and ready to use.

4. Now we have to set up the datasets (the list is present in the notebook), tokenize them, and copy them to the bucket (storage for a particular project), which will be created in your GCP.

 # Tokenize Data
 !python data/create_tfrecords.py --input_dir /content/GPTNeo/$dataset_path --name $dataset_name --files_per 1000 --output_dir $out_name --write_dataset_config --processes 1
 # copy the data to your bucket
 if not path_to_cloud_bucket.endswith('/'):
     path_to_cloud_bucket += '/'
 copy_loc = path_to_cloud_bucket + "datasets/" + dataset
 !gsutil -m cp -r /content/GPTNeo/$out_name $copy_loc
 !gsutil ls $path_to_cloud_bucket
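
The path handling in the cell above is easy to get subtly wrong, so here it is as a small standalone function you can try outside the notebook. The function name, the bucket `gs://my-bucket` and the dataset `my_dataset` are placeholders, not names from the notebook.

```python
def dataset_destination(bucket_path, dataset_name):
    """Build the gs:// destination for the copy, ensuring exactly one trailing slash."""
    if not bucket_path.endswith("/"):
        bucket_path += "/"
    return bucket_path + "datasets/" + dataset_name


# Both spellings of the bucket path lead to the same destination.
print(dataset_destination("gs://my-bucket", "my_dataset"))
print(dataset_destination("gs://my-bucket/", "my_dataset"))
```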

5. Before starting the training, the dataset and model configurations need to be edited to point to the bucket you created in GCP. For this, you have to change the ‘path’ field and change the given dataset’s name to your chosen dataset.

 %%writefile configs/dataset_configs/Sampling_Only.json
 {
   "path": "gs://eleutherai/datasets/Sampling_Only/Sampling_Only*.tfrecords",
   "eval_path": "",
   "n_vocab": 50256,
   "tokenizer_is_pretrained": true,
   "tokenizer_path": "gpt2",
   "eos_id": 50256,
   "padding_id": 50257
 }

6. Set up the model configuration. For a detailed breakdown, make sure to follow the link here; this is a GitHub README file provided by EleutherAI, which made GPT-Neo and open-sourced it.

 %%writefile configs/GPT3_XL.json
 {
   "n_head": 16,
   "n_vocab": 50257,
   "embed_dropout": 0,
   "lr": 0.0002,
   "lr_decay": "cosine",
   "warmup_steps": 3000,
   "beta1": 0.9,
   "beta2": 0.95,
   "epsilon": 1e-8,
   "opt_name": "adam",
   "weight_decay": 0,
   "train_batch_size": 256,
   "attn_dropout": 0,
   "train_steps": 600000,
   "eval_steps": 0,
   "predict_steps": 1,
   "res_dropout": 0,
   "eval_batch_size": 4,
   "predict_batch_size": 1,
   "iterations": 100,
   "n_embd": 2048,
   "datasets": [["pile", null, null, null]],
   "model": "GPT",
   "model_path": "gs://eleutherai/GPT3_XL",
   "n_ctx": 2048,
   "n_layer": 24,
   "scale_by_depth": true,
   "scale_by_in": false,
   "attention_types": [[["global", "local"], 12]],
   "mesh_shape": "x:4,y:2",
   "layout": "intermediate_expanded:x,heads:x,vocab:n_vocab,memory_length:y,embd:y",
   "activation_function": "gelu",
   "recompute_grad": true,
   "gradient_clipping": 1.0,
   "tokens_per_mb_per_replica": 2048,
   "precision": "bfloat16"
 }

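Several values in the model configuration above have to agree with each other, and a few lines of Python can check that before you spend TPU time. This is a rough sketch: the helper names are hypothetical, and the reading of `attention_types` (the pattern is repeated the given number of times) and of `mesh_shape` (the product of its dimensions is the device count) reflects my understanding of the GPT-Neo README rather than code taken from the repository.

```python
from math import prod

# A subset of the configuration above, copied by hand.
config = {
    "n_head": 16, "n_embd": 2048, "n_layer": 24, "n_ctx": 2048,
    "train_batch_size": 256, "train_steps": 600000,
    "attention_types": [[["global", "local"], 12]],
    "mesh_shape": "x:4,y:2",
}


def expand_attention_types(spec):
    """[[["global", "local"], 12]] -> one attention type per layer (24 entries)."""
    layers = []
    for pattern, repeats in spec:
        layers.extend(pattern * repeats)
    return layers


def mesh_size(mesh_shape):
    """'x:4,y:2' -> 8, the number of devices the mesh spans."""
    return prod(int(dim.split(":")[1]) for dim in mesh_shape.split(","))


# Each attention head gets an equal slice of the embedding.
assert config["n_embd"] % config["n_head"] == 0
# The attention pattern must cover every layer exactly once.
assert len(expand_attention_types(config["attention_types"])) == config["n_layer"]

tokens_per_step = config["train_batch_size"] * config["n_ctx"]
print("head dimension:", config["n_embd"] // config["n_head"])      # 128
print("devices in mesh:", mesh_size(config["mesh_shape"]))          # 8
print("tokens per step:", tokens_per_step)                          # 524288
print("tokens in full run:", tokens_per_step * config["train_steps"])
```

A mesh of 8 devices matches the 8 cores of a Colab TPU, and the last line shows why a full run is a serious commitment: 600,000 steps at this batch size is over 300 billion tokens.
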
7. Finally, we can train the model from scratch using the following command.

!python3 main.py --model colab_XL --steps_per_checkpoint 500 --tpu colab

8. Upload the model to your bucket as shown below.

 # upload to your bucket
 bucket_base = "gs://" + path_to_cloud_bucket.replace('gs://', '').split('/')[0]
 !gsutil -m cp -r $path_to_local_weights $bucket_base
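
If the `bucket_base` expression above looks cryptic, this small sketch shows what it evaluates to: it drops any sub-path and keeps only the bucket itself. The function name and the bucket `gs://my-bucket` are placeholders.

```python
def bucket_root(path_to_cloud_bucket):
    """Strip the gs:// scheme and any sub-path, keeping only the bucket itself."""
    return "gs://" + path_to_cloud_bucket.replace("gs://", "").split("/")[0]


print(bucket_root("gs://my-bucket/datasets/"))  # gs://my-bucket
```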

9. If everything worked out, you should be able to see your model listed below.

!gsutil ls $bucket_base

10. For evaluation, the notebook uses the WikiText dataset; to use it, download and unzip the archive as shown below.

 wikitext103_src = "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip"
 !wget $wikitext103_src
 !unzip wikitext-103-raw-v1.zip 

11. This step makes a directory, tokenizes the text as required, and copies it to the bucket.

 !mkdir wikitext
 !mv /content/GPTNeo/wikitext-103-raw/wiki.test.raw wikitext/wikitext_test.txt
 # Tokenize Data
 !python data/create_tfrecords.py --input_dir wikitext --name wikitext --files_per 1000 --output_dir wikitext_tokenized --write_dataset_config --processes 1 --wikitext-detokenize
 # copy the data to your bucket
 if not path_to_cloud_bucket.endswith('/'):
     path_to_cloud_bucket += '/'
 copy_loc = path_to_cloud_bucket
 !gsutil -m cp -r wikitext_tokenized $copy_loc
 !gsutil ls $path_to_cloud_bucket

12. Repeat the dataset configuration step, this time for the evaluation dataset.

 %%writefile configs/dataset_configs/wikitext.json
 {
   "path": "",
   "eval_path": "gs://test-bucket-neo/wikitext_tokenized/*.tfrecords",
   "n_vocab": 50256,
   "tokenizer_is_pretrained": true,
   "tokenizer_path": "gpt2",
   "eos_id": 50256,
   "padding_id": 50257
 }

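The two dataset configuration files in this walkthrough are plain JSON, so you can also generate them from Python instead of editing them by hand. The sketch below assumes the same bucket layout as the Sampling_Only example; the function name, `gs://my-bucket` and `my_dataset` are placeholders, and the demo writes into a temporary directory so that nothing in your repository is touched.

```python
import json
import tempfile
from pathlib import Path


def write_dataset_config(name, bucket, out_dir="configs/dataset_configs"):
    """Write a dataset config whose 'path' points at your own bucket."""
    config = {
        "path": f"{bucket}/datasets/{name}/{name}*.tfrecords",  # assumed layout
        "eval_path": "",
        "n_vocab": 50256,
        "tokenizer_is_pretrained": True,
        "tokenizer_path": "gpt2",
        "eos_id": 50256,
        "padding_id": 50257,
    }
    out_path = Path(out_dir) / f"{name}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(config, indent=2))
    return out_path


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_dataset_config("my_dataset", "gs://my-bucket", out_dir=tmp)
    print(json.loads(path.read_text())["path"])
```

Using `json.dumps` also guarantees the file is valid JSON, braces included.
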
13. Run the model for evaluation over the tokenized text.

!python3 main.py --eval --tpu colab --model $pretrained_model

This was a complete breakdown of all the steps required to train the GPT-Neo model from scratch, which you need to follow in order. It needs high computational power (thanks to TPUs, it doesn’t take forever!) and time to run, but it is an amazing run-through of GPT-Neo.
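
Language-model evaluation on WikiText is commonly summarised as perplexity, the exponential of the average negative log-likelihood per token. Exactly what the notebook prints is not shown here, so treat the following as a generic sketch with made-up numbers rather than the notebook’s own metric code.

```python
import math


def perplexity(token_log_probs):
    """exp of the average negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up example: the model gives each of 4 tokens a probability of 0.25,
# so it is as uncertain as a uniform choice among 4 options.
print(perplexity([math.log(0.25)] * 4))  # approximately 4
```

Lower is better: a perplexity of 4 means the model is, on average, as unsure as if it were choosing uniformly among four tokens.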

GPT Neo is the name of the codebase for transformer-based language models loosely styled around the GPT architecture. Two variants of GPT Neo are provided: one with 1.3B parameters and one with 2.7B parameters. In this post, we’ll discuss how to make use of the HuggingFace-provided GPT Neo with 2.7B parameters using a few lines of code.

Let’s dig into the code!

Code Implementation of GPT-Neo

Importing the Dependencies 

Installing PyTorch: the easiest way to do this is to head over to PyTorch.org, select your system requirements, and copy-paste the command it generates. I’m using a Windows machine with a Google Colab notebook. Select the stable build, which is 1.8.1 at this point. Then select your operating system. I prefer using the pip package in Google Colab, but you may prefer conda in Jupyter. It helps a lot if you have a GPU: select the CUDA version that matches your setup (the command below uses the CUDA 11.1 build); without a GPU, choose the CPU option.

You’ll see the command, and it’s ready to use!

Make sure you have the latest version of PyTorch. This may take a while if you are installing it for the first time, as it may need to uninstall older versions first and then install the newer ones. It depends heavily on your internet connectivity.

!pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
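
After the install finishes, it is worth confirming which versions actually ended up in the environment. This sketch uses only the standard library (`importlib.metadata`, Python 3.8+), so it runs even if an install failed; the helper name is my own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("torch", "torchvision", "torchaudio"):
    print(name, installed_version(name))
```

A `None` next to a package means the install did not succeed and the pip command should be run again.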

Installing transformers: we will leverage HuggingFace, and the great thing about it is that you get a wide variety of pipelines for different tasks. Amazing, right?!

I highly recommend exploring more of the transformers available on HuggingFace.

!pip install transformers

Import pipeline from transformers, as we’re going to use the text generation pipeline.


from transformers import pipeline

Setting up the Generator

Download the GPT Neo model, which has 2.7 billion parameters and is quite big. Again, this will take time, as the download is around 10 gigabytes, so make sure you have a good internet connection. You can also download the smaller GPT Neo version with only 1.3 billion parameters, which is comparatively small.
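
Where does "around 10 gigabytes" come from? A back-of-the-envelope estimate is the number of parameters times the bytes per parameter. The sketch below assumes the weights are stored as 32-bit floats (4 bytes each), which is my assumption rather than something stated above.

```python
def weights_size_gb(n_params, bytes_per_param=4):
    """Approximate checkpoint size: parameters x bytes per parameter (4 for float32)."""
    return n_params * bytes_per_param / 1e9


print(weights_size_gb(2.7e9))  # roughly 10.8 GB for the 2.7B model
print(weights_size_gb(1.3e9))  # roughly 5.2 GB for the 1.3B model
```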

Instantiate the model and assign it to a variable; text-generation is the name of our pipeline, as mentioned earlier.

generator = pipeline('text-generation', model="EleutherAI/gpt-neo-2.7B")

Generating Text using a Prompt

We have to provide a prompt or topic on which we want the text to be generated.

prompt = "The current stock market"

Output Text

Save the output to a variable named ‘res’. The arguments given to the generator created earlier are as follows: the prompt, the maximum length of the generated text, whether to use sampling, and the temperature, a value that scales the probabilities of the next tokens.

 res = generator(prompt, max_length=50, do_sample=True, temperature=0.9)
 # Print the output; the generated text is stored under the key 'generated_text'
 print(res[0]['generated_text'])
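
To see what the temperature argument does, here is a minimal sketch of temperature-scaled softmax in plain Python. It is not the HuggingFace implementation; the function name and the three logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.5, 0.9, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

With a temperature below 1 the top-scoring token takes more of the probability mass, so sampled text becomes more predictable; above 1 the distribution flattens and the output becomes more varied.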

The output will look something like this.

Let’s try a different prompt, say something like this.

prompt = "import pandas as pd"

Running this will give us something like this.

As you can see, it has already imported the basic libraries commonly used alongside pandas; you can imagine what level of contextual understanding this model has reached. Amazing, right?!

Saving to a File

Open a new text file named gpttext.txt and save our output to it using the write method.

 with open('gpttext.txt', 'w') as f:
     f.write(res[0]['generated_text'])
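
If you generate several sequences at once, a small helper keeps the saving logic in one place. In this sketch `res` is a hand-made stand-in with the same shape as the pipeline output (a list with one dict per sequence), and the file goes into the system’s temporary directory; the helper name is hypothetical.

```python
import os
import tempfile

# Stand-in for the pipeline output, with placeholder text.
res = [{"generated_text": "The current stock market (placeholder continuation)"}]


def save_generated(res, path):
    """Write every generated sequence to a UTF-8 text file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for item in res:
            f.write(item["generated_text"] + "\n")


out_path = os.path.join(tempfile.gettempdir(), "gpttext.txt")
save_generated(res, out_path)

# Read the file back to confirm what was written.
with open(out_path, encoding="utf-8") as f:
    print(f.read())
```

Passing `encoding="utf-8"` explicitly avoids surprises on Windows, where the default encoding can choke on characters the model produces.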

So this was all about how you can try one of the best text models out there and leverage it for different tasks. Try this notebook with different prompts and different arguments. Links are provided here as well as in the notebook.

NOTE: Make sure you have enough RAM in Google Colab; otherwise, the runtime will crash after downloading the model. If that happens, you can try the smaller version of GPT Neo.
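
One way to avoid the crash is to choose the checkpoint from the RAM you actually have. The sketch below is a rough heuristic of my own: it assumes float32 weights and a safety factor of 1.5 for overhead during loading, and both numbers are assumptions you should tune for your runtime.

```python
# Checkpoint names on the HuggingFace Hub with approximate parameter counts.
CHECKPOINTS = [
    ("EleutherAI/gpt-neo-2.7B", 2.7e9),
    ("EleutherAI/gpt-neo-1.3B", 1.3e9),
]


def pick_checkpoint(available_ram_gb, bytes_per_param=4, headroom=1.5):
    """Return the largest checkpoint whose float32 weights, times a safety
    factor, fit in the given RAM; None if neither fits."""
    for name, n_params in CHECKPOINTS:
        needed_gb = n_params * bytes_per_param / 1e9 * headroom
        if needed_gb <= available_ram_gb:
            return name
    return None


print(pick_checkpoint(12))  # EleutherAI/gpt-neo-1.3B
print(pick_checkpoint(25))  # EleutherAI/gpt-neo-2.7B
```

The returned name can be passed straight to the `model` argument of `pipeline`.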

The notebook is provided here with all the code you need for reference.

