Generation

generate function
Mon, 22 May 2023

Connect to and initialize a Pinecone vector database, then read text from a plain-text (.txt) document, split it into 2000-token chunks with a 200-token overlap, embed each chunk with the OpenAI text-embedding-ada-002 model, and store the embeddings in Pinecone.

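import os

import tiktoken
from openai import OpenAI
from pinecone import Pinecone

CHUNK_TOKENS = 2000    # tokens per chunk
OVERLAP_TOKENS = 200   # tokens shared by consecutive chunks
INDEX_NAME = 'on-writing-well'  # assumed name of an existing 1536-dimension index

# Connect to the Pinecone vector database
print('Initializing pinecone vector database')
pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])
index = pc.Index(INDEX_NAME)
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Load the file
with open('on-writing-well.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Tokenize so that chunk sizes are measured in tokens, not characters
encoding = tiktoken.encoding_for_model('text-embedding-ada-002')
tokens = encoding.encode(text)

# Advance by chunk - overlap so each chunk repeats the previous chunk's last 200 tokens
step = CHUNK_TOKENS - OVERLAP_TOKENS
for n, start in enumerate(range(0, len(tokens), step)):
    # Get the text for this chunk
    chunk_text = encoding.decode(tokens[start:start + CHUNK_TOKENS])
    # Transform the text to a vector
    response = openai_client.embeddings.create(model='text-embedding-ada-002', input=chunk_text)
    vector = response.data[0].embedding
    # Add the vector to the database
    index.upsert(vectors=[{'id': f'chunk-{n}', 'values': vector}])
    if start + CHUNK_TOKENS >= len(tokens):
        break  # this window already reached the end of the document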
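The chunking requirement in the prompt (2000-token chunks with a 200-token overlap) can be checked on its own, without calling Pinecone or OpenAI. Below is a minimal sketch of the window arithmetic. The helper name chunk_windows and its parameters are hypothetical, introduced only for illustration. It works on token counts, so it makes no assumption about which tokenizer produced them.

```python
def chunk_windows(n_tokens, size=2000, overlap=200):
    """Return (start, end) token offsets for overlapping chunks.

    Hypothetical helper: each window is at most `size` tokens long and
    starts `size - overlap` tokens after the previous one.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    windows = []
    for start in range(0, n_tokens, step):
        end = min(start + size, n_tokens)
        windows.append((start, end))
        if end == n_tokens:
            break  # the tail is covered; avoid a chunk that is pure overlap
    return windows


print(chunk_windows(5000))
# prints [(0, 2000), (1800, 3800), (3600, 5000)]
```

Each window begins 1800 tokens after the previous one, so consecutive chunks share 200 tokens, and the final window is shortened so the end of the document is kept.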
