Ideas Forum
Learning Resources for AI, DeFi, web3 & Data Visualization.
Machine Learning
An overview and comparison of essential Machine Learning techniques
Transformers & NLP
Welcome to Lucidate's explainer videos on Transformers and Natural Language Processing, the technology that powers GPT-3.
Ensemble Techniques
Greater diversity. Better decisions. Enhanced outcomes.
Ethereum & web3
Ethereum promises to reduce the costs of financial products, bringing the benefits of Capital Markets to a broader audience.
Artificial Intelligence
Primers on Artificial Intelligence and its applications in Capital Markets
Decentralised Finance
The impact of Decentralised Finance on Capital Markets
Capital Markets
Welcome! See the table below for introductory explainer videos on the Derivatives markets.
Data Pipelines for AI
Effectively cleanse sparse & patchy data to develop Best-in-Class AI systems
Data Visualization
AI & DeFi create and consume vast amounts of data and information. How to make sense of the firehose.
Capital Markets in 60 Seconds
The stuff that matters in a minute
- Richard Walker · Mar 06, 2023 · Transformers & NLP

In this tutorial, we will learn how to use APIs and audio to build prompts and completions to fine-tune Transformers. First, we will show how to use a News API with Python to extract news articles and create a dataset that can be used to train and fine-tune models like GPT-3. Then we will discuss how audio streams can be used to build data pipelines for fine-tuning.

We will start by signing up for a News API account, which is free for non-commercial use, and importing the necessary libraries to automate the retrieval of the latest news articles. We will create a Pandas dataframe to store the article information in a structured format, and then populate the Completion column of the dataframe by retrieving the article content with the requests library and parsing the HTML with BeautifulSoup. Finally, we will save the dataframe to an Excel file for future use.

We will also explore how to transcribe speech to text using OpenAI's Whisper and YouTube DL: accessing YouTube videos, stripping out the audio, and sending the audio to Whisper for transcription. We will use Python classes and modules to automate the workflow, allowing us to build specialised language models in any field of our choice.

To begin with, we'll need to set up our environment: Python installed on our machine, along with the Whisper and YouTube DL modules, both of which can be installed with pip, the Python package manager. Once everything is set up, we can start building our pipeline. The first step is to choose a video from which to generate our prompts and completions. We'll use YouTube DL to download the audio track from the video, since Whisper works on audio. YouTube DL is a Python module that allows us to access and download YouTube videos, and it's easy to use. Once we have the audio track downloaded, we can use Whisper to transcribe the speech to text.
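The article-to-dataframe step described above can be sketched as follows. This is a minimal illustration, not Lucidate's actual code: the `build_dataset` helper and the shape of the article dicts are assumptions; in the real pipeline the dicts would come from the News API via requests, with article bodies parsed out of the HTML by BeautifulSoup.

```python
import pandas as pd

def build_dataset(articles):
    """Turn parsed news articles into a prompt/completion dataframe.

    Each article dict is assumed to carry a 'title' (used as the Prompt)
    and a 'content' field (used as the Completion).
    """
    rows = [{"Prompt": a["title"], "Completion": a["content"]} for a in articles]
    return pd.DataFrame(rows, columns=["Prompt", "Completion"])

# Illustrative stand-ins for articles retrieved from the News API.
sample = [
    {"title": "Markets rally", "content": "Stocks rose sharply today as..."},
    {"title": "Rates on hold", "content": "The central bank kept rates..."},
]
df = build_dataset(sample)
# The video saves the result with df.to_excel("articles.xlsx", index=False)
# (requires the openpyxl package).
```

With the dataframe in this two-column shape, each headline/body pair is ready to be exported for fine-tuning.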
Whisper is a state-of-the-art deep learning model developed by OpenAI, designed to transcribe spoken words into text with high accuracy. It can handle a wide range of audio inputs, including noisy or low-quality audio, and can transcribe speech in multiple languages. Whisper is also designed to be scalable, so it can process large volumes of audio data quickly. We'll need to load the Whisper model we want to use, which comes in five different levels of sophistication. Unless our audio track is very noisy, the 'base' level works for most English-language tracks; the 'large' model can handle multiple languages with varying degrees of success.

Once we have our transcribed text, we can use it to build our prompts and completions. We can chunk the text into sentences and use every nth sentence as a prompt, with the intervening sentences as completions. Lucidate has found that n between 5 and 10 works well here, but your results may vary. We can use the LucidateTextSplitter class that we built in a previous video to split the text into prompts and completions. The class takes a text string and an integer 'n', and splits the string into sentences to create a list of prompts. We can then save our prompts and completions to an Excel file for future use, and also return the list of sentences from the transcribed text so we have a record of what was said in the video.

To automate the workflow, we can build a LucidateTranscriber class. The constructor takes the name of the Whisper model we want to use and the URL of the YouTube video we want to transcribe. The class then uses YouTube DL to download the audio track from the video and save it to our local machine, uses Whisper to transcribe the audio, generates our prompts and completions, saves them to an Excel file, and returns the list of sentences from the transcribed text.
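The nth-sentence splitting idea can be sketched like this. It is an illustrative reconstruction, not the actual LucidateTextSplitter source; the function name and the naive regex-based sentence split are assumptions:

```python
import re

def split_prompts_completions(text, n=5):
    """Split text into sentences; every n-th sentence becomes a prompt,
    and the sentences between prompts are joined into its completion."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    pairs = []
    for i in range(0, len(sentences), n):
        chunk = sentences[i:i + n]
        if len(chunk) > 1:  # need at least one completion sentence
            pairs.append({"Prompt": chunk[0], "Completion": " ".join(chunk[1:])})
    return pairs
```

For example, with n=3 a six-sentence transcript yields two pairs, each with one prompt sentence and two completion sentences.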
With this class built, we can easily transcribe multiple YouTube videos and generate prompts and completions for each one. We can use these prompts and completions to fine-tune our NLP models on the language and content of the videos. It's important to remember that if we want to use content from other sources to build our prompts and completions, we need to make sure we have the legal right to use that content. Laws vary from jurisdiction to jurisdiction, so it's important to get the right legal advice if we're looking to build a commercial product.

In summary, using Whisper and YouTube DL to generate prompts and completions for fine-tuning NLP models is a scalable and effective way to build specialised language models in any field. With all of the text, audio and video available, there is no shortage of training material for fine-tuning specialised AI of our own. We can build a scalable pipeline to gather information from video and audio by chunking the transcribed text into sentences and using every nth sentence as a prompt and the intervening sentences as completions. By training our model with prompts and completions from a specific domain, we can teach it new vocabulary and update the model's attention heads to make it more attuned and useful in a specialised area.
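As a final step before fine-tuning, the prompt/completion pairs are typically serialised to JSONL, the format OpenAI's legacy GPT-3 fine-tuning tooling expects. A minimal sketch, assuming the two-column dataframe layout described above (the helper name is illustrative):

```python
import json
import pandas as pd

def dataframe_to_jsonl(df):
    """Serialise a Prompt/Completion dataframe to JSONL: one
    {"prompt": ..., "completion": ...} object per line."""
    lines = [
        json.dumps({"prompt": row["Prompt"], "completion": row["Completion"]})
        for _, row in df.iterrows()
    ]
    return "\n".join(lines)

# Example: two pairs produced by the splitting step.
df = pd.DataFrame([
    {"Prompt": "What is a swap?", "Completion": "A swap is a derivative..."},
    {"Prompt": "What is DeFi?", "Completion": "Decentralised Finance is..."},
])
jsonl = dataframe_to_jsonl(df)  # write this string to a .jsonl training file
```

Each line of the resulting file is an independent JSON object, so the training set can be appended to as more videos are transcribed.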
- Richard Walker · Feb 28, 2023 · Transformers & NLP

In this video, Lucidate demonstrates how to use APIs and audio to build prompts and completions for fine-tuning Transformers. Transformers are a type of machine learning model that has revolutionized the field of natural language processing. They use a deep learning architecture to generate human-like text and have been used to develop powerful language AI models such as ChatGPT. Click '> Play' above to find out how to use APIs and audio to create specialized, bespoke AI NLP models.

The video begins with a discussion of prompt and completion datasets and their usefulness in training and fine-tuning language models to generate human-like text. We then demonstrate how to use the News API with Python to create a dataset of news articles that can be used to train and fine-tune models like GPT-3: importing the necessary libraries, defining the News API endpoint and API key as variables, and making a request to the News API to retrieve the latest news articles. The article information is then parsed and stored in a Pandas dataframe in a structured format, with the headline as the prompt and the article content as the completion.

The video then moves on to audio streams and how they can be used to build data pipelines for fine-tuning language models. Walker discusses OpenAI's Whisper, which is designed to transcribe spoken words into text with high accuracy. The video shows how to extract the text from a YouTube video using Whisper and YouTube DL, and how to automate the workflow using Python classes and modules to build specialized language models in any field of choice. After watching this video you will understand, end to end, how to use APIs and audio to fine-tune Transformers to create specialized AI language models.
- Richard Walker · Feb 21, 2023 · Transformers & NLP

In this video, we discuss fine-tuning transformer neural networks. We explain why fine-tuning is important and how it can be used to create bespoke models. We identify two specific problems that need to be solved: grabbing relevant text from the Internet, and breaking it up into prompts and completions. Click '> Play' above to discover how to build simple prompts and completions for fine-tuning GPT-3.

To solve the second problem, we build a 'splitter' class in Python. We demonstrate how to split text into individual sentences and use every nth sentence as a prompt, with the intervening sentences as completions. We also emphasize that manually populating an Excel spreadsheet with prompts and completions is not scalable and should be considered only as a last resort.

To solve the other problem (retrieving text from the Internet so that, rather than displaying it in a browser, we can use it in one of our programs), we use Beautiful Soup, a popular Python library that lets users extract the HTML components of a web page. We show how to use Beautiful Soup to extract text from a website, such as the "Quotes to Scrape" website or the Wikipedia page for Beautiful Soup.

Furthermore, we provide a Python class that breaks a string of text down into its component sentences and groups them into prompts and completions. We use Pandas, another useful Python library for manipulating tables of data in rows and columns called 'dataframes'. The dataframe in our Python class contains two columns, one for prompts and one for completions, making it easy to feed the data into a language model for fine-tuning or other applications.

We conclude by emphasizing the importance of fine-tuning AI models to achieve efficiency multipliers and scalability. In the next episode we will look at more sophisticated ways of building prompts and completions using newsfeed APIs, audio, and video.
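The Beautiful Soup extraction step can be illustrated on an inline HTML snippet. The markup below mirrors the structure of the "Quotes to Scrape" site but is hard-coded here so no network request is needed; the `extract_quotes` helper is illustrative, not the video's exact code:

```python
from bs4 import BeautifulSoup

# A hard-coded fragment shaped like the Quotes to Scrape page.
HTML = """
<div class="quote">
  <span class="text">The world as we have created it is a process of our thinking.</span>
</div>
<div class="quote">
  <span class="text">It is our choices that show what we truly are.</span>
</div>
"""

def extract_quotes(html):
    """Pull the text of each quote out of the page markup,
    rather than rendering it in a browser."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text() for span in soup.find_all("span", class_="text")]

quotes = extract_quotes(HTML)  # a list of plain quote strings
```

The extracted strings can then be fed straight into the splitter class to produce prompt/completion pairs.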