Documents and Embedding Config
Configuration example
cache_folder: /path/to/cache/folder ## specify a cache folder for embedding models (Hugging Face and Sentence Transformers)
embeddings:
  # ** Attention ** - `embeddings_path` should be unique per configuration file.
  embeddings_path: /path/to/embedding/folder ## specify a folder where embeddings will be saved.
  embedding_model: # Optional embedding model specification; the default is hkunlp/instructor-large. Swap to a smaller model if out of CUDA memory.
    # Supported types: "sentence_transformer", "huggingface", "instruct", "openai"
    type: sentence_transformer
    model_name: 'Qwen/Qwen3-Embedding-0.6B'
  splade_config: # Optional batch size for sparse (SPLADE) embeddings. Reduce if getting out-of-memory errors on CUDA.
    n_batch: 5
  chunk_sizes: # Specify one or more chunk sizes to split documents into (querying multi-chunk results will be slower)
    - 1024
  document_settings:
    # Can specify multiple document collections and filter by label
    - doc_path: /path/to/documents ## specify the docs folder
      exclude_paths: # Optional paths to exclude
        - /path/to/documents/subfolder1
        - /path/to/documents/subfolder2
      scan_extensions: # specifies file extensions to scan recursively in `doc_path`.
        - pdf
        - md
      additional_parser_settings: # Optional section
        md:
          skip_first: True # Skip the first section, which often contains metadata
          merge_sections: False # Merge # headings if possible; turn on or off depending on document structure
          remove_images: True # Remove image links
      # Optional setting
      # For azuredoc support - pip install "pyllmsearch[azureparser]"
      pdf_table_parser: gmft # azuredoc
      # Optional setting
      pdf_image_parser:
        image_parser: gemini-1.5-pro # gemini-1.5-flash
        system_instruction: |
          You are a research assistant. You analyze the image to extract detailed information. Response must be a Markdown string in the following format:
          - First line is a heading with image caption, starting with '# '
          - Second line is empty
          - From the third line on - detailed data points and related metadata, extracted from the image, in Markdown format. Don't use Markdown tables.
      passage_prefix: "passage: " # Some embedding models (e.g. e5) require a specific prefix on the source text to work properly
      label: "document-collection-1" # Add a label to the current collection
    - doc_path: /another/path/to/documents ## specify the docs folder
      scan_extensions: # specifies file extensions to scan recursively in `doc_path`.
        - md
      passage_prefix: "passage: " # Some embedding models (e.g. e5) require a specific prefix on the source text to work properly
      label: "document-collection-2" # Add a label to the current collection
semantic_search:
  search_type: similarity # Currently, only similarity is supported
  replace_output_path: # Can specify a list of search/replace settings
    - substring_search: "/storage/llm/docs/" ## Specifies the substring to replace in the output path of the document
      substring_replace: "obsidian://open?vault=knowledge-base&file=" ## Replaces it with this string
  append_suffix: # Specifies an additional template to append to the output path, useful for deep linking
    append_template: "#page={page}" # For example, appends the page number from the document parser's metadata
  # Ensures that the context provided to the LLM is shorter than max_char_size. Useful for locally hosted models and limited hardware.
  max_char_size: 16384 # Reduce if out of CUDA memory or if necessary for locally hosted LLMs
  # Maximum number of text chunks to retrieve for dense and for sparse embeddings
  # When both are used, the total number of chunks is max_k * 2
  max_k: 25
  query_prefix: "query: " # Some embedding models, such as e5, require queries to be prefixed
  score_cutoff: -3.0 # Optional reranker score cutoff. Documents below this score will be excluded from the returned document list
  hyde:
    enabled: False
  multiquery:
    enabled: False
  reranker:
    enabled: True
    model: "bge" # "bge" for BAAI/bge-reranker-base or "marco" for cross-encoder/ms-marco-MiniLM-L-6-v2
  # Optionally enable conversation history settings (default False)
  conversation_history_settings:
    enabled: True
    max_history_length: 3
    rewrite_query: True
persist_response_db_path: "/path/to/responses.db" # Optional SQLite database filename. Saves responses offline to SQLite for future analysis.
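The `md` options under `additional_parser_settings` are described only by their comments. Assuming a "section" is the text between Markdown headings and an image link uses the `![alt](url)` syntax, `skip_first` and `remove_images` can be approximated as below. `split_sections` and `preprocess_markdown` are hypothetical helpers, not llmsearch's Markdown parser, and `merge_sections` is not modeled.

```python
import re

# A heading line starts with 1-6 '#' characters followed by whitespace
HEADING = re.compile(r"^#{1,6}\s")
# Markdown image links such as ![diagram](img/arch.png)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def split_sections(text: str) -> list[str]:
    """Split Markdown into sections; each heading line starts a new section."""
    sections: list[list[str]] = [[]]
    for line in text.splitlines():
        if HEADING.match(line) and sections[-1]:
            sections.append([])
        sections[-1].append(line)
    return ["\n".join(lines) for lines in sections if lines]


def preprocess_markdown(text: str, skip_first: bool = True, remove_images: bool = True) -> list[str]:
    """Hypothetical approximation of the `md` parser options (not llmsearch's parser)."""
    sections = split_sections(text)
    if skip_first:
        sections = sections[1:]  # drop the first section, e.g. front matter with metadata
    if remove_images:
        sections = [IMAGE_LINK.sub("", section) for section in sections]
    return sections
```

Note that this naive heading check would also treat `#` comment lines inside fenced code as headings; a real parser has to track code fences.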
Document Config Reference
- class llmsearch.config.Config(*, cache_folder: Path, embeddings: EmbeddingsConfig, semantic_search: SemanticSearchConfig, llm: LLMConfig | None = None, persist_response_db_path: str | None = None)
- cache_folder: Path
Configures path to cache LLM and embedding models.
- check_embeddings_exist() bool
Checks whether embeddings exist in the specified folder.
- embeddings: EmbeddingsConfig
Configures document paths and embedding settings.
- llm: LLMConfig | None
Don’t use directly.
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'cache_folder': FieldInfo(annotation=Path, required=True), 'embeddings': FieldInfo(annotation=EmbeddingsConfig, required=True), 'llm': FieldInfo(annotation=Union[LLMConfig, NoneType], required=False), 'persist_response_db_path': FieldInfo(annotation=Union[str, NoneType], required=False), 'semantic_search': FieldInfo(annotation=SemanticSearchConfig, required=True)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- persist_response_db_path: str | None
Optional path for SQLite database for results storage.
- semantic_search: SemanticSearchConfig
Configures semantic search settings.
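`persist_response_db_path` only states that responses are stored in SQLite. The table layout llmsearch actually uses is not documented here, so the `responses` table below is an assumed schema holding a few `ResponseModel` fields (`id`, `question`, `response`, `average_score`). It illustrates the idea with Python's standard `sqlite3` module; `save_response` is a hypothetical helper.

```python
import sqlite3
import uuid


def save_response(db_path: str, question: str, response: str, average_score: float) -> str:
    """Append one response to a hypothetical `responses` table and return its id."""
    response_id = str(uuid.uuid4())
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS responses ("
                "id TEXT PRIMARY KEY, question TEXT, response TEXT, average_score REAL)"
            )
            conn.execute(
                "INSERT INTO responses (id, question, response, average_score) VALUES (?, ?, ?, ?)",
                (response_id, question, response, average_score),
            )
    finally:
        conn.close()
    return response_id
```

With the example config, `db_path` would be `/path/to/responses.db`; `sqlite3.connect` creates the file if its parent folder exists.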
- class llmsearch.config.ConversationHistoryQAPair(*, question: str, answer: str)
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'answer': FieldInfo(annotation=str, required=True), 'question': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- class llmsearch.config.ConversrationHistorySettings(*, enabled: bool = False, max_history_length: int, rewrite_query: bool, history: List[ConversationHistoryQAPair] = None, template_instruction: str = 'When answering questions, take into consideration the history of the chat converastion, which is listed below under Chat History. The chat history is in reverse chronological order, so the most recent exhange is at the top.', template_contextualize: str = "\n Given a chat history and the latest user question which might reference to context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, return only reformulated question. Do NOT mention it is 'reformulated question', return only body of the question and nothing else.\n\n {chat_history}\n\n User question: {user_question}\n ", template_header: str = '\nChat History:\n=============\n', template_qa_pairs: str = 'User: {question}\nAssistant: {answer}\n\n')
- history: List[ConversationHistoryQAPair]
Keeps the history of conversation pairs, up to max_history_length.
- max_history_length: int
Maximum number of conversation history pairs to remember (a single pair = query + response).
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'enabled': FieldInfo(annotation=bool, required=False, default=False), 'history': FieldInfo(annotation=List[ConversationHistoryQAPair], required=False, default_factory=list), 'max_history_length': FieldInfo(annotation=int, required=True), 'rewrite_query': FieldInfo(annotation=bool, required=True), 'template_contextualize': FieldInfo(annotation=str, required=False, default="\n Given a chat history and the latest user question which might reference to context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, return only reformulated question. Do NOT mention it is 'reformulated question', return only body of the question and nothing else.\n\n {chat_history}\n\n User question: {user_question}\n "), 'template_header': FieldInfo(annotation=str, required=False, default='\nChat History:\n=============\n'), 'template_instruction': FieldInfo(annotation=str, required=False, default='When answering questions, take into consideration the history of the chat converastion, which is listed below under Chat History. The chat history is in reverse chronological order, so the most recent exhange is at the top.'), 'template_qa_pairs': FieldInfo(annotation=str, required=False, default='User: {question}\nAssistant: {answer}\n\n')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- rewrite_query: bool
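If enabled, rewrites the latest user query into a standalone question using the chat history (see template_contextualize), for better context understanding.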
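The default templates in the signature above fit together roughly as follows. This sketch is not llmsearch's code: `QAPair` is a stand-in for `ConversationHistoryQAPair`, `render_history` is hypothetical, and it assumes `history` is kept oldest-first, so the newest `max_history_length` pairs are reversed to give the reverse chronological order that `template_instruction` describes.

```python
from dataclasses import dataclass

# Default templates copied from the ConversrationHistorySettings signature
TEMPLATE_HEADER = "\nChat History:\n=============\n"
TEMPLATE_QA_PAIRS = "User: {question}\nAssistant: {answer}\n\n"


@dataclass
class QAPair:
    """Stand-in for ConversationHistoryQAPair."""
    question: str
    answer: str


def render_history(history: list[QAPair], max_history_length: int) -> str:
    """Render the most recent pairs, newest first (hypothetical helper)."""
    # history[-0:] would return the whole list, so handle 0 explicitly
    recent = history[-max_history_length:] if max_history_length > 0 else []
    body = "".join(
        TEMPLATE_QA_PAIRS.format(question=pair.question, answer=pair.answer)
        for pair in reversed(recent)
    )
    return TEMPLATE_HEADER + body
```

When `rewrite_query` is enabled, text like this would presumably be substituted for `{chat_history}` in `template_contextualize`, together with the new question as `{user_question}`.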
- class llmsearch.config.Document(*, page_content: str, metadata: dict = None)
Interface for interacting with a document.
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'metadata': FieldInfo(annotation=dict, required=False, default_factory=dict), 'page_content': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- class llmsearch.config.DocumentPathSettings(*, doc_path: Annotated[Path, PathType(path_type=dir)] | str, exclude_paths: List[Annotated[Path, PathType(path_type=dir)] | str] = None, scan_extensions: List[str], pdf_table_parser: PDFTableParser | None = None, pdf_image_parser: PDFImageParseSettings | None = None, additional_parser_settings: Dict[str, Any] = None, passage_prefix: str = '', label: str = '')
- additional_parser_settings: Dict[str, Any]
Optional parser settings (parser dependent)
- doc_path: Annotated[Path, PathType(path_type=dir)] | str
Defines document folder for a given document set.
- exclude_paths: List[Annotated[Path, PathType(path_type=dir)] | str]
List of folders to exclude from scanning.
- label: str
Optional label for the document set, will be included in the metadata.
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'additional_parser_settings': FieldInfo(annotation=Dict[str, Any], required=False, default_factory=dict), 'doc_path': FieldInfo(annotation=Union[Annotated[Path, PathType], str], required=True), 'exclude_paths': FieldInfo(annotation=List[Union[Annotated[Path, PathType], str]], required=False, default_factory=list), 'label': FieldInfo(annotation=str, required=False, default=''), 'passage_prefix': FieldInfo(annotation=str, required=False, default=''), 'pdf_image_parser': FieldInfo(annotation=Union[PDFImageParseSettings, NoneType], required=False), 'pdf_table_parser': FieldInfo(annotation=Union[PDFTableParser, NoneType], required=False), 'scan_extensions': FieldInfo(annotation=List[str], required=True)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- pdf_image_parser: PDFImageParseSettings | None
If set, parses images in PDF files using the specified parser.
- pdf_table_parser: PDFTableParser | None
If set, parses tables in PDF files using the specified parser.
- scan_extensions: List[str]
List of extensions to scan.
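`doc_path`, `exclude_paths` and `scan_extensions` together determine which files of a document set are parsed. A minimal sketch of that selection with `pathlib`, assuming extensions are matched case-insensitively without the leading dot and that an excluded folder removes everything beneath it; `find_documents` is a hypothetical name, not part of llmsearch.

```python
from pathlib import Path
from typing import Sequence


def find_documents(doc_path: str, scan_extensions: Sequence[str],
                   exclude_paths: Sequence[str] = ()) -> list[Path]:
    """Recursively collect files under doc_path with a wanted extension (hypothetical helper)."""
    root = Path(doc_path).resolve()
    excluded = [Path(p).resolve() for p in exclude_paths]
    wanted = {ext.lower().lstrip(".") for ext in scan_extensions}  # "pdf" and ".PDF" -> "pdf"
    found = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower().lstrip(".") not in wanted:
            continue
        # Skip anything located inside an excluded folder (Path.is_relative_to needs Python 3.9+)
        if any(path.is_relative_to(folder) for folder in excluded):
            continue
        found.append(path)
    return found
```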
- class llmsearch.config.EmbedddingsSpladeConfig(*, n_batch: int = 3)
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'n_batch': FieldInfo(annotation=int, required=False, default=3)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- class llmsearch.config.EmbeddingModel(*, type: EmbeddingModelType, model_name: str, additional_kwargs: dict = None)
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'additional_kwargs': FieldInfo(annotation=dict, required=False, default_factory=dict), 'model_name': FieldInfo(annotation=str, required=True), 'type': FieldInfo(annotation=EmbeddingModelType, required=True)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- class llmsearch.config.EmbeddingModelType(value)
- class llmsearch.config.EmbeddingsConfig(*, embedding_model: ~llmsearch.config.EmbeddingModel = EmbeddingModel(type=<EmbeddingModelType.instruct: 'instruct'>, model_name='hkunlp/instructor-large', additional_kwargs={}), embeddings_path: ~pathlib.Annotated[~pathlib.Path, ~pydantic.types.PathType(path_type=dir)] | str, document_settings: ~typing.List[~llmsearch.config.DocumentPathSettings], chunk_sizes: ~typing.List[int] = [1024], splade_config: ~llmsearch.config.EmbedddingsSpladeConfig = EmbedddingsSpladeConfig(n_batch=5))
- chunk_sizes: List[int]
List of chunk sizes for text chunking; supports multiple sizes.
- document_settings: List[DocumentPathSettings]
Defines settings for one or more document sets.
- embedding_model: EmbeddingModel
Specifies embedding model to use for dense embeddings.
- embeddings_path: Annotated[Path, PathType(path_type=dir)] | str
Specifies output folder for embeddings.
- property labels: List[str]
Returns list of labels in document settings
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'chunk_sizes': FieldInfo(annotation=List[int], required=False, default=[1024]), 'document_settings': FieldInfo(annotation=List[DocumentPathSettings], required=True), 'embedding_model': FieldInfo(annotation=EmbeddingModel, required=False, default=EmbeddingModel(type=<EmbeddingModelType.instruct: 'instruct'>, model_name='hkunlp/instructor-large', additional_kwargs={})), 'embeddings_path': FieldInfo(annotation=Union[Annotated[Path, PathType], str], required=True), 'splade_config': FieldInfo(annotation=EmbedddingsSpladeConfig, required=False, default=EmbedddingsSpladeConfig(n_batch=5))}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- splade_config: EmbedddingsSpladeConfig
Specifies settings for sparse embeddings (SPLADE).
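With more than one entry in `chunk_sizes`, each document is presumably chunked once per size, which would be consistent with the note in the example config that querying multi-size results is slower. The splitter below is a deliberately naive stand-in that cuts on character counts; llmsearch's real splitting logic is not described in this reference, and `split_text` and `chunk_for_all_sizes` are hypothetical helpers.

```python
def split_text(text: str, chunk_size: int) -> list[str]:
    """Naive character-based splitter (a stand-in, not llmsearch's splitter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_for_all_sizes(text: str, chunk_sizes: list[int]) -> dict[int, list[str]]:
    """One list of chunks per configured size, e.g. chunk_sizes: [512, 1024]."""
    return {size: split_text(text, size) for size in chunk_sizes}
```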
- class llmsearch.config.HydeSettings(*, enabled: bool = False, hyde_prompt: str = 'Write a short passage to answer the question: {question}')
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'enabled': FieldInfo(annotation=bool, required=False, default=False), 'hyde_prompt': FieldInfo(annotation=str, required=False, default='Write a short passage to answer the question: {question}')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
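HyDE (Hypothetical Document Embeddings) asks the LLM to write a passage that answers the question and, in its usual form, searches with that passage instead of the raw question, the idea being that an answer-like text lies closer to relevant chunks in embedding space. A sketch using the default `hyde_prompt`; `generate` stands for whatever LLM call is configured and is an assumption, as is searching with the passage verbatim.

```python
from typing import Callable

# Default prompt from the HydeSettings signature
HYDE_PROMPT = "Write a short passage to answer the question: {question}"


def hyde_query(question: str, generate: Callable[[str], str], enabled: bool = True) -> str:
    """Return the text to embed for retrieval: the hypothetical passage if HyDE is enabled."""
    if not enabled:
        return question
    return generate(HYDE_PROMPT.format(question=question))
```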
- class llmsearch.config.MultiQuerySettings(*, enabled: bool = False, multiquery_prompt: str = "You are a helpful assistant that generates multiple questions based on the source question.\n Generate {n_versions} additional related questions related to: ```{question}```.\n \n Suggest only short questions without compound sentences. Suggest a variety of questions that cover different aspects of the topic.\n Make sure they are complete questions, and that they are related to the original question.\n\n Generated questions should be separated by newlines, but shouldn't be enumerated.\n ", n_versions: int = 5)
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'enabled': FieldInfo(annotation=bool, required=False, default=False), 'multiquery_prompt': FieldInfo(annotation=str, required=False, default="You are a helpful assistant that generates multiple questions based on the source question.\n Generate {n_versions} additional related questions related to: ```{question}```.\n \n Suggest only short questions without compound sentences. Suggest a variety of questions that cover different aspects of the topic.\n Make sure they are complete questions, and that they are related to the original question.\n\n Generated questions should be separated by newlines, but shouldn't be enumerated.\n "), 'n_versions': FieldInfo(annotation=int, required=False, default=5)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
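The default `multiquery_prompt` asks for `{n_versions}` related questions separated by newlines and not enumerated. A sketch of filling the prompt and parsing the reply; the prompt string is an abbreviated stand-in for the longer default, the helper names are hypothetical, and stripping list markers is a defensive assumption in case the model enumerates anyway. The resulting questions would presumably serve as extra retrieval queries alongside the original one.

```python
import re

# Abbreviated stand-in for the default multiquery_prompt, which is longer
# and wraps the question in triple backticks
MULTIQUERY_PROMPT = "Generate {n_versions} additional related questions related to: {question}"
# Leading enumeration or bullet markers such as "1." "2)" "-" "*"
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*")


def build_multiquery_prompt(question: str, n_versions: int = 5) -> str:
    return MULTIQUERY_PROMPT.format(n_versions=n_versions, question=question)


def parse_questions(llm_output: str, n_versions: int = 5) -> list[str]:
    """Split the newline-separated reply into questions, keeping at most n_versions."""
    questions = []
    for line in llm_output.splitlines():
        # The prompt asks for no enumeration, but strip markers in case the model adds them
        question = LIST_MARKER.sub("", line).strip()
        if question:
            questions.append(question)
    return questions[:n_versions]
```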
- class llmsearch.config.ObsidianAdvancedURI(*, append_heading_template: str)
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'append_heading_template': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- class llmsearch.config.PDFImageParseSettings(*, image_parser: PDFImageParser, system_instruction: str = "You are an research assistant. You analyze the image to extract detailed information. Response must be a Markdown string in the follwing format:\n- First line is a heading with image caption, starting with '# '\n- Second line is empty\n- From the third line on - detailed data points and related metadata, extracted from the image, in Markdown format. Don't use Markdown tables. \n", user_instruction: str = 'From the image, extract detailed quantitative and qualitative data points.')
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'image_parser': FieldInfo(annotation=PDFImageParser, required=True), 'system_instruction': FieldInfo(annotation=str, required=False, default="You are an research assistant. You analyze the image to extract detailed information. Response must be a Markdown string in the follwing format:\n- First line is a heading with image caption, starting with '# '\n- Second line is empty\n- From the third line on - detailed data points and related metadata, extracted from the image, in Markdown format. Don't use Markdown tables. \n"), 'user_instruction': FieldInfo(annotation=str, required=False, default='From the image, extract detailed quantitative and qualitative data points.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- class llmsearch.config.PDFImageParser(value)
- class llmsearch.config.PDFTableParser(value)
- class llmsearch.config.ReplaceOutputPath(*, substring_search: str, substring_replace: str)
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'substring_replace': FieldInfo(annotation=str, required=True), 'substring_search': FieldInfo(annotation=str, required=True)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
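The example config pairs `replace_output_path` with `append_suffix` to turn a local file path into an Obsidian deep link. A sketch of that transformation, assuming each search/replace rule is applied in order with plain `str.replace` and that `append_template` is filled from the chunk metadata with `str.format` (the `page` key mirrors the `#page={page}` example). `OutputPathRule` and `build_link` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutputPathRule:
    """Stand-in for ReplaceOutputPath."""
    substring_search: str
    substring_replace: str


def build_link(path: str, rules: list[OutputPathRule],
               append_template: Optional[str] = None, metadata: Optional[dict] = None) -> str:
    """Apply search/replace rules in order, then append the formatted suffix."""
    for rule in rules:
        path = path.replace(rule.substring_search, rule.substring_replace)
    if append_template:
        # e.g. "#page={page}" filled from the chunk's metadata
        path += append_template.format(**(metadata or {}))
    return path
```

URL-encoding of the file name, which Obsidian URIs may need for spaces or special characters, is not handled here.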
- class llmsearch.config.RerankerModel(value)
- class llmsearch.config.RerankerSettings(*, enabled: bool = True, model: RerankerModel = RerankerModel.BGE_RERANKER)
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'enabled': FieldInfo(annotation=bool, required=False, default=True), 'model': FieldInfo(annotation=RerankerModel, required=False, default=<RerankerModel.BGE_RERANKER: 'bge'>)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
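When the reranker is enabled, retrieved chunks are rescored against the query and, if `semantic_search.score_cutoff` is set, chunks scoring below it are dropped. Cross-encoder rerankers such as the BGE and MS MARCO models typically output unbounded relevance logits, which is why the example config can use a negative cutoff like `-3.0`. The model call is passed in as a hypothetical `score` callable, since this reference does not show how llmsearch invokes it.

```python
from typing import Callable, Optional


def rerank(query: str, chunks: list[str], score: Callable[[str, str], float],
           score_cutoff: Optional[float] = None) -> list[tuple[str, float]]:
    """Score (query, chunk) pairs, drop those below score_cutoff, sort best first."""
    scored = [(chunk, score(query, chunk)) for chunk in chunks]
    if score_cutoff is not None:
        scored = [(chunk, s) for chunk, s in scored if s >= score_cutoff]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```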
- class llmsearch.config.ResponseModel(*, id: UUID = None, question: str, response: str, average_score: float, semantic_search: List[SemanticSearchOutput] = None, hyde_response: str = '')
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'average_score': FieldInfo(annotation=float, required=True), 'hyde_response': FieldInfo(annotation=str, required=False, default=''), 'id': FieldInfo(annotation=UUID, required=False, default_factory=create_uuid), 'question': FieldInfo(annotation=str, required=True), 'response': FieldInfo(annotation=str, required=True), 'semantic_search': FieldInfo(annotation=List[SemanticSearchOutput], required=False, default_factory=list)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- class llmsearch.config.SemanticSearchConfig(*, search_type: ~typing.Literal['mmr', 'similarity'], replace_output_path: ~typing.List[~llmsearch.config.ReplaceOutputPath] = None, obsidian_advanced_uri: ~llmsearch.config.ObsidianAdvancedURI | None = None, append_suffix: ~llmsearch.config.SuffixAppend | None = None, reranker: ~llmsearch.config.RerankerSettings = RerankerSettings(enabled=True, model=<RerankerModel.BGE_RERANKER: 'bge'>), max_k: int = 15, score_cutoff: float | None = None, max_char_size: int = 16384, query_prefix: str = '', hyde: ~llmsearch.config.HydeSettings = HydeSettings(enabled=False, hyde_prompt='Write a short passage to answer the question: {question}'), multiquery: ~llmsearch.config.MultiQuerySettings = MultiQuerySettings(enabled=False, multiquery_prompt="You are a helpful assistant that generates multiple questions based on the source question.\n Generate {n_versions} additional related questions related to: ```{question}```.\n \n Suggest only short questions without compound sentences. Suggest a variety of questions that cover different aspects of the topic.\n Make sure they are complete questions, and that they are related to the original question.\n\n Generated questions should be separated by newlines, but shouldn't be enumerated.\n ", n_versions=5), conversation_history_settings: ~llmsearch.config.ConversrationHistorySettings = ConversrationHistorySettings(enabled=False, max_history_length=2, rewrite_query=True, history=[], template_instruction='When answering questions, take into consideration the history of the chat converastion, which is listed below under Chat History. The chat history is in reverse chronological order, so the most recent exhange is at the top.', template_contextualize="\n Given a chat history and the latest user question which might reference to context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, return only reformulated question. Do NOT mention it is 'reformulated question', return only body of the question and nothing else.\n\n {chat_history}\n\n User question: {user_question}\n ", template_header='\nChat History:\n=============\n', template_qa_pairs='User: {question}\nAssistant: {answer}\n\n'))
- append_suffix: SuffixAppend | None
Appends a suffix to the document URL. Useful for deep linking, e.g. opening the document in an external application such as Obsidian.
- conversation_history_settings: ConversrationHistorySettings
Conversation history settings.
- hyde: HydeSettings
Optional configuration for HyDE.
- max_char_size: int
Maximum character size for query + documents to fit into context window of LLM.
- max_k: int
Maximum number of documents to retrieve for dense OR sparse embeddings (if using both, the number of documents will be max_k*2).
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {'arbitrary_types_allowed': True, 'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'append_suffix': FieldInfo(annotation=Union[SuffixAppend, NoneType], required=False), 'conversation_history_settings': FieldInfo(annotation=ConversrationHistorySettings, required=False, default=ConversrationHistorySettings(enabled=False, max_history_length=2, rewrite_query=True, history=[], template_instruction='When answering questions, take into consideration the history of the chat converastion, which is listed below under Chat History. The chat history is in reverse chronological order, so the most recent exhange is at the top.', template_contextualize="\n Given a chat history and the latest user question which might reference to context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, return only reformulated question. Do NOT mention it is 'reformulated question', return only body of the question and nothing else.\n\n {chat_history}\n\n User question: {user_question}\n ", template_header='\nChat History:\n=============\n', template_qa_pairs='User: {question}\nAssistant: {answer}\n\n')), 'hyde': FieldInfo(annotation=HydeSettings, required=False, default=HydeSettings(enabled=False, hyde_prompt='Write a short passage to answer the question: {question}')), 'max_char_size': FieldInfo(annotation=int, required=False, default=16384), 'max_k': FieldInfo(annotation=int, required=False, default=15), 'multiquery': FieldInfo(annotation=MultiQuerySettings, required=False, default=MultiQuerySettings(enabled=False, multiquery_prompt="You are a helpful assistant that generates multiple questions based on the source question.\n Generate {n_versions} additional related questions related to: ```{question}```.\n \n Suggest only short questions without compound sentences. Suggest a variety of questions that cover different aspects of the topic.\n Make sure they are complete questions, and that they are related to the original question.\n\n Generated questions should be separated by newlines, but shouldn't be enumerated.\n ", n_versions=5)), 'obsidian_advanced_uri': FieldInfo(annotation=Union[ObsidianAdvancedURI, NoneType], required=False), 'query_prefix': FieldInfo(annotation=str, required=False, default=''), 'replace_output_path': FieldInfo(annotation=List[ReplaceOutputPath], required=False, default_factory=list), 'reranker': FieldInfo(annotation=RerankerSettings, required=False, default=RerankerSettings(enabled=True, model=<RerankerModel.BGE_RERANKER: 'bge'>)), 'score_cutoff': FieldInfo(annotation=Union[float, NoneType], required=False), 'search_type': FieldInfo(annotation=Literal['mmr', 'similarity'], required=True)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- multiquery: MultiQuerySettings
Optional configuration for multi-query
- query_prefix: str
String prepended to the query before retrieval with the embedding model (e.g. "query: " for e5 models).
- reranker: RerankerSettings
Configures re-ranker settings.
- score_cutoff: float | None
Documents with a score lower than this value are excluded from the relevant documents.
- search_type: Literal['mmr', 'similarity']
Configure search type, currently only similarity can be used.
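`max_k`, `query_prefix` and `max_char_size` interact as follows: the query retrieves up to `max_k` chunks from the dense index and up to `max_k` from the sparse (SPLADE) index, so at most `2 * max_k` candidates come back, and the final context is trimmed to fit `max_char_size`. The sketch assumes that duplicates found by both indexes are merged, that the prefix is only applied for the dense model, and that ranked chunks are added greedily until the budget (query included) would be exceeded. `dense_search` and `sparse_search` are hypothetical callables, and reranking would normally happen between the two steps.

```python
from typing import Callable

# Hypothetical search callables: (query, k) -> up to k chunk texts, best first
Search = Callable[[str, int], list[str]]


def gather_candidates(question: str, dense_search: Search, sparse_search: Search,
                      max_k: int = 15, query_prefix: str = "") -> list[str]:
    """Retrieve up to max_k chunks from each index; at most 2 * max_k unique chunks."""
    # Assumption: the prefix (e.g. "query: " for e5) is only needed by the dense model
    dense_hits = dense_search(query_prefix + question, max_k)
    sparse_hits = sparse_search(question, max_k)
    # dict.fromkeys drops chunks found by both indexes while keeping first-seen order
    return list(dict.fromkeys(dense_hits + sparse_hits))


def pack_context(question: str, ranked_chunks: list[str], max_char_size: int = 16384) -> list[str]:
    """Keep top-ranked chunks while len(question) + total chunk length <= max_char_size."""
    budget = max_char_size - len(question)
    selected = []
    for chunk in ranked_chunks:
        if len(chunk) > budget:
            break  # stop at the first chunk that no longer fits, preserving rank order
        selected.append(chunk)
        budget -= len(chunk)
    return selected
```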
- class llmsearch.config.SemanticSearchOutput(*, chunk_link: str, chunk_text: str, metadata: dict)
- model_computed_fields = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields = {'chunk_link': FieldInfo(annotation=str, required=True), 'chunk_text': FieldInfo(annotation=str, required=True), 'metadata': FieldInfo(annotation=dict, required=True)}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- llmsearch.config.load_yaml_file(config) dict
Loads YAML file or string and returns a dictionary