Sentence Similarity with Embeddings - Exercises

In this set of exercises, you will work with a sentence similarity app and learn how to build retrieval and generation features on top of embeddings. You will be guided through making the app more efficient, experimenting with different datasets, and incorporating advanced techniques such as Retrieval-Augmented Generation (RAG).

Q1: Pre-process all sentences

The current algorithm recalculates the embeddings for all sentences every time the button is clicked. Make the app more efficient by computing the embeddings for all sentences once, up front, and reusing them for every query.
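A minimal sketch of the idea in JavaScript. It assumes the app's embedding call can be wrapped as an async function that takes an array of strings and returns one vector per string. The names `toyEmbed`, `buildIndex` and `findMostSimilar` are hypothetical. `toyEmbed` (a 26-letter character histogram) is only a stand-in for the real model so that the sketch runs on its own. The key point is that the corpus is embedded once, and each click embeds only the user's input.

```javascript
// HYPOTHETICAL stand-in for the real embedding model: a 26-dimensional
// letter-count vector per string. Replace with the app's actual embed call.
function toyEmbed(texts) {
  return texts.map((text) => {
    const vec = new Array(26).fill(0);
    for (const ch of text.toLowerCase()) {
      const i = ch.charCodeAt(0) - 97; // 'a' -> 0 ... 'z' -> 25
      if (i >= 0 && i < 26) vec[i] += 1;
    }
    return vec;
  });
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Run ONCE (e.g. on page load): embed every sentence and keep the vectors in memory.
async function buildIndex(sentences, embed) {
  const vectors = await embed(sentences);
  return { sentences, vectors };
}

// Run on each click: embed only the query, then compare it with the cached vectors.
async function findMostSimilar(query, index, embed) {
  const [q] = await embed([query]);
  let best = -1, bestScore = -Infinity;
  index.vectors.forEach((v, i) => {
    const score = cosine(q, v);
    if (score > bestScore) { bestScore = score; best = i; }
  });
  return { sentence: index.sentences[best], score: bestScore };
}

// Hypothetical wiring in the page:
// const indexPromise = buildIndex(SENTENCES, embed);
// button.onclick = async () => show(await findMostSimilar(input.value, await indexPromise, embed));
```

With N sentences and Q clicks, this makes N + Q embedding calls instead of roughly N × Q.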

Q2: Use a different dataset

Create a new file called q2.html

The file prompts.csv contains a list of prompts suitable for feeding into LLMs such as ChatGPT. Turn your app into a "prompt generator" by outputting the prompt from prompts.csv that is most similar to the user's input sentence.
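Prompts usually contain commas, quotes and sometimes line breaks, so splitting each line on "," is not enough. Below is a sketch of a small CSV parser in the style of RFC 4180. It assumes that prompts.csv has a header row and a column named `prompt`. Check the actual header and adjust the `column` argument if it differs. `parseCsv` and `loadPrompts` are hypothetical names.

```javascript
// Minimal RFC 4180-style parser: quoted fields, commas/newlines inside quotes,
// and "" as an escaped double quote. Returns an array of rows (arrays of strings).
function parseCsv(text) {
  const rows = [];
  let row = [], field = "", inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;                          // closing quote
      } else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { row.push(field); field = ""; }
    else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++;     // treat \r\n as one break
      row.push(field); rows.push(row); row = []; field = "";
    } else field += ch;
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); } // last line without newline
  return rows;
}

// ASSUMPTION: first row is a header containing a "prompt" column.
function loadPrompts(csvText, column = "prompt") {
  const [header, ...rows] = parseCsv(csvText.replace(/^\uFEFF/, "")); // strip a UTF-8 BOM if present
  const col = header.indexOf(column);
  if (col === -1) throw new Error(`No "${column}" column in header: ${header.join(",")}`);
  return rows.map((r) => r[col]).filter((p) => p && p.trim() !== "");
}
```

In the page, something like `loadPrompts(await (await fetch("prompts.csv")).text())` would supply the sentence list. Note that `fetch` generally fails for pages opened via `file://`, so serve the folder over HTTP.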

Q3: Retrieval-Based Q&A (Only for the motivated, i.e., those aiming for an A and above)

Create a new file called q3.html

In this task, you will implement a retrieval-based question-answering system using embeddings. Instead of generating answers from scratch, you will use embeddings to retrieve the most relevant question in your dataset and output the associated answer.

First, choose an appropriate dataset from Hugging Face that contains questions and answers.
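One way to pull rows into the browser is the Hugging Face dataset viewer's `/rows` endpoint. It takes `dataset`, `config`, `split`, `offset` and `length` query parameters and returns JSON with a `rows` array whose items carry the record under `row`. The sketch assumes that behaviour and the documented cap of 100 rows per request, so check the current docs. Field names differ between datasets, so the caller passes accessor functions. `rowsUrl`, `toQaPairs` and `fetchQaPairs` are hypothetical names, and no particular dataset is implied.

```javascript
// ASSUMPTION: Hugging Face dataset viewer "rows" endpoint, max 100 rows per call.
const ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows";

function rowsUrl(dataset, config, split, offset, length) {
  const params = new URLSearchParams({
    dataset, config, split, offset: String(offset), length: String(length),
  });
  return `${ROWS_ENDPOINT}?${params}`;
}

// Map the API's { rows: [{ row: {...} }] } shape to { question, answer } pairs,
// dropping rows where either field is missing or the answer is empty.
function toQaPairs(json, getQuestion, getAnswer) {
  return json.rows
    .map(({ row }) => ({ question: getQuestion(row), answer: getAnswer(row) }))
    .filter((p) => typeof p.question === "string" && typeof p.answer === "string" && p.answer !== "");
}

// Fetch up to `total` rows in pages of at most 100.
async function fetchQaPairs(dataset, config, split, getQuestion, getAnswer, total = 300) {
  const pairs = [];
  for (let offset = 0; offset < total; offset += 100) {
    const res = await fetch(rowsUrl(dataset, config, split, offset, Math.min(100, total - offset)));
    if (!res.ok) throw new Error(`HTTP ${res.status} while reading ${dataset}`);
    const json = await res.json();
    pairs.push(...toQaPairs(json, getQuestion, getAnswer));
    if (json.rows.length === 0) break; // ran past the end of the split
  }
  return pairs;
}
```

The accessors keep the loader generic. For example, a dataset whose answers are stored as a list would use `(row) => row.answers.text[0]` or similar, depending on its actual schema.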

Retrieve the Most Similar Question: After embedding the user’s input question, compare it to the embeddings of questions in your dataset to find the most similar question.
Output the Answer: Once the most similar question is retrieved, simply output the corresponding answer from your dataset.
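The two steps above can be sketched as follows. The sketch assumes that the app has already embedded every dataset question once (as in Q1) and embeds the user's question with the same model. Vectors are normalised when the index is built, so cosine similarity reduces to a dot product at query time. `buildQaIndex` and `answerQuestion` are hypothetical names. The optional `minScore` cut-off is an addition that is not part of the exercise: by default it is disabled, so the best match is always returned.

```javascript
// Scale a vector to unit length (a zero vector is returned unchanged).
function normalize(v) {
  let sq = 0;
  for (const x of v) sq += x * x;
  const n = Math.sqrt(sq);
  return n === 0 ? v.slice() : v.map((x) => x / n);
}

// pairs[i] = { question, answer }; questionVectors[i] = embedding of pairs[i].question.
function buildQaIndex(pairs, questionVectors) {
  if (pairs.length !== questionVectors.length) throw new Error("need one vector per question");
  return { pairs, unitVectors: questionVectors.map(normalize) };
}

// Return the stored answer whose question is most similar to the query.
// With minScore > -Infinity, a weak best match yields { answer: null } instead.
function answerQuestion(queryVector, index, minScore = -Infinity) {
  const q = normalize(queryVector);
  let best = -1, bestScore = -Infinity;
  index.unitVectors.forEach((v, i) => {
    let dot = 0;
    for (let k = 0; k < v.length; k++) dot += q[k] * v[k]; // cosine, since both are unit length
    if (dot > bestScore) { bestScore = dot; best = i; }
  });
  if (best === -1 || bestScore < minScore) return { answer: null, score: bestScore };
  return { ...index.pairs[best], score: bestScore };
}
```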

Goal: Learn how embedding-based retrieval can be used for efficient question-answering without the need for a generative model.

Q4: Complete with Retrieval-Augmented Generation (RAG) (Only if you are extremely motivated, i.e., aiming for an A+)

Create a file called q4.html

In this task, you will implement a Retrieval-Augmented Generation (RAG) system using embeddings. Instead of generating answers from scratch, you will use a combination of retrieval and generation.

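Retrieve the Most Similar Question: After embedding the user’s input question, compare it to the embeddings of the questions in your dataset (e.g., Natural Questions or Quora Question Pairs) to find the most similar question. Note that Quora Question Pairs contains pairs of questions but no answers, so it would need a separate source of answer text.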
Generate the Answer: Once the most similar question is retrieved, use a generative model (like GPT) to elaborate on or refine the answer based on the retrieved context. The generative model should help to provide a more detailed or relevant answer, beyond simply copying the stored answer.
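The "retrieved context" is usually passed to the model by placing it in the prompt. Below is a sketch that assumes an OpenAI-style chat format (an array of `{ role, content }` messages). It keeps the k best-scoring Q&A pairs; with k = 1 this is exactly the single most similar question described above. `topK` and `buildRagMessages` are hypothetical names, and the system-prompt wording is only an example.

```javascript
// scored: [{ question, answer, score }] -> the k highest-scoring items (input left unmodified).
function topK(scored, k) {
  return [...scored].sort((a, b) => b.score - a.score).slice(0, k);
}

// Build chat messages: retrieved Q&A pairs go into the system prompt as numbered
// context, and the user's original question is sent as the user message.
function buildRagMessages(userQuestion, retrieved) {
  const context = retrieved
    .map((r, i) => `[${i + 1}] Q: ${r.question}\nA: ${r.answer}`)
    .join("\n\n");
  return [
    {
      role: "system",
      content:
        "Answer the user's question using the context below. Expand on it where helpful, " +
        "and say so if the context does not cover the question.\n\nContext:\n" + context,
    },
    { role: "user", content: userQuestion },
  ];
}
```

Retrieving more than one pair gives the model extra material to refine the answer, at the cost of a longer prompt.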

NOTE: You must not embed your OpenAI API key (or any other secret key) in your JavaScript, because anyone who loads the page can read it. You will therefore need either a server-side component that holds the key, or a local model running in the browser, which is very slow.

Goal: Learn how to integrate retrieval-based methods with generative models to produce more nuanced, context-aware responses.
