feat(summary): improve prompt to get more insights (#2446)

This pull request improves the prompts used to extract insights from a
document. It rewrites the map and reduce templates to give clearer
instructions for analyzing each section of the document and for
generating a consolidated summary. It also raises the text splitter's
chunk overlap from 0 to 100.



----

| 🚀 This description was created by [Ellipsis](https://www.ellipsis.dev) for commit adf5541dab |
|--------|

### Summary:
This PR enhances the document analysis and summary generation
instructions in the `SummaryAssistant` class and improves text splitting
by adjusting the `chunk_overlap` parameter.

**Key points**:
- Updated `map_template` and `reduce_template` in the `process_assistant`
method of the `SummaryAssistant` class in
`/backend/modules/assistant/ito/summary.py`.
- Changed the `chunk_overlap` parameter of the `CharacterTextSplitter`
instance from 0 to 100.
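
To illustrate why a non-zero `chunk_overlap` can help, here is a minimal sketch of fixed-size windowing with overlap. It is not the `CharacterTextSplitter` implementation, which splits on a separator and then merges pieces; the helper `split_with_overlap` and the integer "tokens" are hypothetical and used only for illustration.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of up to chunk_size tokens.

    Consecutive windows share chunk_overlap tokens, so content that
    straddles a boundary appears in both neighbouring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Hypothetical input: 2500 integer "tokens" standing in for encoded text.
tokens = list(range(2500))
no_overlap = split_with_overlap(tokens, chunk_size=1000, chunk_overlap=0)
with_overlap = split_with_overlap(tokens, chunk_size=1000, chunk_overlap=100)
```

With an overlap of 0, each chunk starts exactly where the previous one ended, so a sentence cut at the boundary is seen in halves. With an overlap of 100, the tail of one chunk is repeated at the head of the next, which gives the map step some shared context across boundaries.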


Stan Girard 2024-04-18 14:04:06 -07:00 committed by GitHub
parent 3e53ff0dce
commit c3e0c2f2ee

@@ -92,22 +92,32 @@ class SummaryAssistant(ITO):
llm = ChatLiteLLM(model="gpt-3.5-turbo", max_tokens=2000)
map_template = """The following is one document to summarize that has been split into multiple sections:
map_template = """The following is a document that has been divided into multiple sections:
{docs}
Based on the section, please identify the main themes, key points, and important information in each section.
Helpful Knowledge in language of the document:"""
Please carefully analyze each section and identify the following:
1. Main Themes: What are the overarching ideas or topics in this section?
2. Key Points: What are the most important facts, arguments, or ideas presented in this section?
3. Important Information: Are there any crucial details that stand out? This could include data, quotes, specific events, entity, or other relevant information.
4. People: Who are the key individuals mentioned in this section? What roles do they play?
5. Reasoning: What logic or arguments are used to support the key points?
6. Chapters: If the document is divided into chapters, what is the main focus of each chapter?
Remember to consider the language and context of the document. This will help in understanding the nuances and subtleties of the text."""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)
# Reduce
reduce_template = """The following is set of summaries for each section of the document:
reduce_template = """The following is a set of summaries for parts of the document:
{docs}
Take these and distill it into a final, consolidated summary of the document. Make sure to include the main themes, key points, and important information.
Take these and distill it into a final, consolidated summary of the document. Make sure to include the main themes, key points, and important information such as data, quotes,people and specific events.
Use markdown such as bold, italics, underlined. For example, **bold**, *italics*, and _underlined_ to highlight key points.
Please provide the final summary with sections using bold headers.
Sections should be: a short summary of the document called summary, and a list of key points called key points.
Sections should always be Summary and Key Points, but feel free to add more sections as needed.
Always use bold text for the sections headers.
Keep the same language as the documents.
Summary:"""
Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)
# Run chain
@@ -141,7 +151,7 @@ class SummaryAssistant(ITO):
)
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
chunk_size=1000, chunk_overlap=0
chunk_size=1000, chunk_overlap=100
)
split_docs = text_splitter.split_documents(data)
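
The map and reduce templates touched by this diff follow the usual map-reduce summarization pattern: one LLM call per chunk, then one call over the collected partial summaries. The sketch below shows that flow in plain Python, under stated assumptions. The templates are abbreviated, and `map_reduce_summary` and `fake_llm` are hypothetical names; the actual code wires this up through `LLMChain` and `ChatLiteLLM`.

```python
from typing import Callable, List

# Abbreviated versions of the templates in the diff (assumption: shortened
# for illustration; the full instruction lists are omitted).
MAP_TEMPLATE = (
    "The following is a document that has been divided into multiple sections:\n"
    "{docs}\n"
    "Please carefully analyze each section."
)
REDUCE_TEMPLATE = (
    "The following is a set of summaries for parts of the document:\n"
    "{docs}\n"
    "Answer:"
)


def map_reduce_summary(chunks: List[str], llm: Callable[[str], str]) -> str:
    """Summarize each chunk independently, then merge the partial results."""
    # Map step: one prompt per chunk.
    partial = [llm(MAP_TEMPLATE.format(docs=chunk)) for chunk in chunks]
    # Reduce step: a single prompt over all partial summaries.
    return llm(REDUCE_TEMPLATE.format(docs="\n\n".join(partial)))


# Hypothetical stand-in for a real model: records prompts, returns a label.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary #{len(calls)}"


result = map_reduce_summary(["section one", "section two"], fake_llm)
```

Two chunks produce three calls in total: two map calls and one reduce call whose prompt contains both partial summaries. Anything the map prompt fails to extract is unavailable to the reduce step, which may be why the map template now asks for people, reasoning, and chapters explicitly.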