Commit 3782eebb authored by Timm Lehmberg
removed API keys

%% Cell type:markdown id: tags:
# fRAGate
%% Cell type:code id: tags:
``` python
import os
#!export GRAPHRAG_API_KEY=xxx
#!export OPENAI_API_KEY=xxx
os.environ['GRAPHRAG_API_KEY'] = 'xxx'
os.environ['OPENAI_API_KEY'] = 'xxx'
!python -m graphrag.query --root . --method local "nenne alle acht Akademien"
```
%% Output
INFO: Vector Store Args: {}
creating llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_chat", 'model': 'gpt-3.5-turbo', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_embedding", 'model': 'text-embedding-3-small', 'max_tokens': 4000, 'temperature': 0, 'top_p': 1, 'n': 1, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
SUCCESS: Local Search Response:
The provided data lists a total of eight academies:
1. Bayerische Akademie der Wissenschaften [Data: Entities (77)]
2. Nordrhein-Westfälische Akademie der Wissenschaften und der Künste [Data: Entities (78)]
3. Akademie der Wissenschaften und der Literatur Mainz [Data: Entities (3)]
4. Akademie der Wissenschaften in Hamburg [Data: Entities (30)]
5. Berlin-Brandenburgische Akademie der Wissenschaften [Data: Entities (2)]
6. Sächsische Akademie der Wissenschaften zu Leipzig [Data: Entities (93)]
7. Heidelberger Akademie der Wissenschaften [Data: Entities (65)]
8. Akademievorhaben [Data: Entities (86)]
These academies are active in various projects and initiatives and contribute to research and to the preservation of cultural heritage. Each academy has its own specific focus areas and contributions to the scholarly community.
%% Cell type:markdown id: tags:
......
%% Cell type:code id: tags:
``` python
#!pip install bash_kernel
#!python -m bash_kernel.install
#!pip install graphrag
```
%% Cell type:code id: tags:
``` python
import os
#!export GRAPHRAG_API_KEY=xxx
#!export OPENAI_API_KEY=xxx
os.environ['GRAPHRAG_API_KEY'] = 'xxx'
os.environ['OPENAI_API_KEY'] = 'xxx'
```
%% Cell type:code id: tags:
``` python
!python -m graphrag.index --init --root .
```
%% Cell type:markdown id: tags:
* change settings.yaml
* change the model to ```model: gpt-3.5-turbo``` to avoid a cost explosion
* probably adjust \
```file_pattern: ".*\\.txt$" ``` to a pattern matching your input files\
```tokens_per_minute: ... ``` and \
```requests_per_minute: ... ``` to values corresponding to your tier as described under https://platform.openai.com/docs/guides/rate-limits
* run the indexer (note: you only need to do this once; the indexed data will persist)
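The settings.yaml adjustments above can also be scripted. A minimal text-level sketch — the key names match the file that `graphrag.index --init` generates, but the exact layout may differ between GraphRAG versions, and the numbers passed in below are placeholders for a low OpenAI tier (look up your own limits):

``` python
import re

def set_rate_limits(text: str, tokens_per_minute: int, requests_per_minute: int) -> str:
    """Rewrite the rate-limit keys in settings.yaml text.

    Also uncomments the key if the generated file has it commented out.
    """
    text = re.sub(r"(?m)^(\s*)#?\s*tokens_per_minute:.*$",
                  rf"\g<1>tokens_per_minute: {tokens_per_minute}", text)
    text = re.sub(r"(?m)^(\s*)#?\s*requests_per_minute:.*$",
                  rf"\g<1>requests_per_minute: {requests_per_minute}", text)
    return text

# quick demonstration on a fragment shaped like the generated file:
sample = "llm:\n  # tokens_per_minute: 0\n  requests_per_minute: 0\n"
print(set_rate_limits(sample, 60_000, 500))
```

To apply it, read settings.yaml, pass the text through `set_rate_limits`, and write it back.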
%% Cell type:code id: tags:
``` python
!python -m graphrag.index --root .
```
%% Cell type:markdown id: tags:
run a query
%% Cell type:code id: tags:
``` python
!python -m graphrag.query --root . --method local "how many persons are mentioned in the provided context?"
```
......
%% Cell type:markdown id: tags:
# Levels Of Text Splitting
https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/modules/
https://www.youtube.com/watch?v=8OJC21T2SL4
%% Cell type:code id: tags:
``` python
import os
from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.core.embeddings import resolve_embed_model
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.llms.groq import Groq
os.environ['GROQ_API_KEY'] = 'xxx'
os.environ['OPENAI_API_KEY'] = 'xxx'
```
%% Cell type:markdown id: tags:
SimpleFileNodeParser
%% Cell type:code id: tags:
``` python
from llama_index.core.node_parser import SimpleFileNodeParser
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("../myfirstrag/input/").load_data()
parser = SimpleFileNodeParser()
nodes = parser.get_nodes_from_documents(documents)
for i in nodes:
    print(i)
```
%% Cell type:markdown id: tags:
TokenTextSplitter
%% Cell type:code id: tags:
``` python
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("../myfirstrag/input/").load_data()
splitter = TokenTextSplitter(
    chunk_size=64,
    chunk_overlap=16,
    separator=" ",
)
nodes = splitter.get_nodes_from_documents(documents)
for i in nodes:
    print(i)
```
%% Cell type:markdown id: tags:
SentenceSplitter
%% Cell type:code id: tags:
``` python
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("../myfirstrag/input/").load_data()
splitter = SentenceSplitter(
    chunk_size=128,
    chunk_overlap=64,
)
nodes = splitter.get_nodes_from_documents(documents)
for i in nodes:
    print(i)
```
%% Cell type:markdown id: tags:
SentenceWindowNodeParser
%% Cell type:code id: tags:
``` python
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("../myfirstrag/input/").load_data()
node_parser = SentenceWindowNodeParser.from_defaults(
    # how many sentences on either side to capture
    window_size=3,
    # the metadata key that holds the window of surrounding sentences
    window_metadata_key="window",
    # the metadata key that holds the original sentence
    original_text_metadata_key="original_sentence",
)
nodes = node_parser.get_nodes_from_documents(documents)
for i in nodes:
    print(i)
```
%% Cell type:markdown id: tags:
HierarchicalNodeParser
%% Cell type:code id: tags:
``` python
from llama_index.core.node_parser import HierarchicalNodeParser
from llama_index.core import SimpleDirectoryReader
node_parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]
)
nodes = node_parser.get_nodes_from_documents(documents)
for i in nodes:
    print(i)
```
%% Cell type:markdown id: tags:
SemanticSplitterNodeParser
%% Cell type:code id: tags:
``` python
import os
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.embeddings import resolve_embed_model
from llama_index.core.node_parser import SentenceSplitter
embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")
splitter = SemanticSplitterNodeParser(
    buffer_size=1,
    breakpoint_percentile_threshold=95,
    embed_model=embed_model,
    sentence_splitter=lambda x: x.split('. '),
)
nodes = splitter.get_nodes_from_documents(documents)
print(len(nodes))
for i in nodes:
    print(i)
```
%% Cell type:code id: tags:
``` python
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine()
response = query_engine.query("how many individuals are mentioned in the context and where do they go?")
print(response)
```
......
%% Cell type:markdown id: tags:
# Connection to remote llm via API
%% Cell type:code id: tags:
``` python
#!pip install llama_index
from llama_index.llms.groq import Groq
from llama_index.llms.openai import OpenAI
```
%% Cell type:markdown id: tags:
Set environment variables for your API Key(s)
%% Cell type:code id: tags:
``` python
import os
os.environ['GROQ_API_KEY'] = 'xxx'
os.environ['OPENAI_API_KEY'] = 'xxx'
```
%% Cell type:markdown id: tags:
Choose a model and temperature and pass them to the constructor of the llm object
%% Cell type:code id: tags:
``` python
llm = Groq(model="llama3-70b-8192", temperature=0.0)
#llm = OpenAI(model="gpt-3.5-turbo", temperature=0.0)
# complete the prompt
response = llm.complete("Warum ist die Banane krumm?")
print(response)
```
%% Cell type:markdown id: tags:
_in case you get bored:_
* change queries (ok, that's too trivial)
* create multiple llm objects, pass queries/responses to another object
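The second exercise — passing one model's response on to another — can be sketched with a small helper that works on any object exposing a `.complete()` method, which both `Groq` and `OpenAI` llm objects above do. The commented usage lines repeat the setup from the cells above and require valid API keys:

``` python
def relay(llms, prompt):
    """Feed the prompt to the first llm, then each response to the next."""
    text = prompt
    for llm in llms:
        text = str(llm.complete(text))
    return text

# assumed setup, as in the cells above (requires valid API keys):
# from llama_index.llms.groq import Groq
# from llama_index.llms.openai import OpenAI
# print(relay([Groq(model="llama3-70b-8192", temperature=0.0),
#              OpenAI(model="gpt-3.5-turbo", temperature=0.0)],
#             "Warum ist die Banane krumm?"))
```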
......
%% Cell type:markdown id: tags:
# Connecting to LLMs via API
%% Cell type:markdown id: tags:
check API connections and available models
%% Cell type:code id: tags:
``` python
# List the models available
# get OpenAI API keys from https://platform.openai.com/api-keys
!curl https://api.openai.com/v1/models -H "Authorization: Bearer xxx"
```
%% Cell type:code id: tags:
``` python
# List the models available (Groq)
# get Groq API key from https://console.groq.com/keys
!curl https://api.groq.com/openai/v1/models -H "Authorization: Bearer xxx"
```
%% Cell type:markdown id: tags:
Send a request via curl to the OpenAI API to generate embeddings for a given text input using the specified model.\
_(Note: Groq does not provide embedding models)_
%% Cell type:code id: tags:
``` python
!curl https://api.openai.com/v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer xxx" \
-d '{"model": "text-embedding-3-small", "input": "Ottos Mops kotzt" }'
```
%% Cell type:markdown id: tags:
Unsure which embedding models are available?
%% Cell type:code id: tags:
``` python
!curl https://api.openai.com/v1/models -H "Authorization: Bearer xxx" | grep embed
```
%% Cell type:markdown id: tags:
Chat completion via API
%% Cell type:code id: tags:
``` python
!curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer xxx" \
-d '{ "model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Warum ist die Banane krumm?"}],"max_tokens": 100 }'
```
%% Output
{
"id": "chatcmpl-AB6JNooN7W9GLwE4ZsfeYT4816lKK",
"object": "chat.completion",
"created": 1727209233,
"model": "gpt-3.5-turbo-0125",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Die Banane ist krumm, weil sie sich während des Wachstums dem Licht zuwendet. Da Bananen an Bäumen wachsen und nach oben wachsen, müssen sie sich der Schwerkraft beugen, was dazu führt, dass sie eine krumme Form annehmen. Dieses Phänomen wird als \"Negativ-Geotropismus\" bezeichnet.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 17,
"completion_tokens": 84,
"total_tokens": 101,
"completion_tokens_details": {
"reasoning_tokens": 0
}
},
"system_fingerprint": null
}
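The same chat completion can be issued from Python instead of curl. A minimal sketch using only the standard library (no `openai` package), assuming `OPENAI_API_KEY` is set in the environment as above; `build_request` assembles the same JSON payload the curl call sends:

``` python
import json
import os
import urllib.request

def build_request(prompt, model="gpt-3.5-turbo", max_tokens=100):
    """Assemble the JSON payload for the chat completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt):
    """POST the payload and return the assistant message text."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# chat("Warum ist die Banane krumm?")  # requires a valid OPENAI_API_KEY
```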
......