这篇论文《从稀疏到密集:在 GPT-4 中使用密度链提示生成摘要 | From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting》,提出了一个改进LLM生成摘要的Prompt。
原理很有意思:它通过对文章几轮摘要,每一轮都提炼出新的关键词,并根据新的关键词融合、压缩旧版本摘要,一步步提升摘要中信息密度,直到你觉得摘要中信息密度足够为止。
详细解释一下Prompt的工作原理:
首先要说明一个名词:“missing entity 缺失的实体”
“informative entity 信息实体” 指的是能代表文章核心观点的关键词。
“missing entity 缺失的实体” 指的是:
- 具体并且简洁的关键词
- 与文章相关
- 新的(之前的摘要中没出现过)
- 忠于原文(不能自己编造)
- 任意位置(可以在文章的任意位置)
然后就是具体在每一轮摘要中的步骤:
步骤1:从文章中找出在上一轮摘要中没有出现的1-3条“missing entity”
步骤2:基于前面生成的摘要和新找出来的1-3条“missing entity”,重写摘要,加上新的信息实体的细节。
最后就是每一轮摘要的指导原则:
1. 每一轮摘要的长度保持一致
2. 第一份摘要密度最低,由于只有最初的1-3个信息实体,需要加上很多填充词(如 "本文讨论了")
3. 每一轮基于前一轮重写,加上新的信息实体内容,通过融合、压缩和删除 "本文讨论了"等信息量不大的填充词来腾出空间。
4. 旧的信息实体的优先级更高,如果空间不足,则减少新实体的数量。
总结
这个Prompt被命名为CoD(Chain of Density 密度链),还比较形象生动,跟打铁似得,捶打一番再加点新料,继续捶打,直到成型。
完整的提示词如下,有兴趣的话可以自己试试:
Article: {{ ARTICLE }}
You will generate increasingly concise, entity-dense summaries of the above article.
Repeat the following 2 steps 5 times.
Step 1. Identify 1-3 informative entities (";" delimited) from the article which are missing from the previously generated summary.
Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the missing entities.
A missing entity is:
- relevant to the main story,
- specific yet concise (5 words or fewer),
- novel (not in the previous summary),
- faithful (present in the article),
- anywhere (can be located anywhere in the article).
Guidelines:
- The first summary should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., "this article discusses") to reach ~80 words.
- Make every word count: rewrite the previous summary to improve flow and make space for additional entities.
- Make space with fusion, compression, and removal of uninformative phrases like "the article discusses".
- The summaries should become highly dense and concise yet self-contained, i.e., easily understood without the article.
- Missing entities can appear anywhere in the new summary.
- Never drop entities from the previous summary. If space cannot be made, add fewer new entities.
Remember, use the exact same number of words for each summary.
Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys are "Missing_Entities" and "Denser_Summary".
https://t.co/auzXNzcVgI
点击图片查看原图
点击图片查看原图