Fix insecure pickle deserialization in redis_cache
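The redis_cache.py module in the MindSpider/DeepSentimentCrawling/MediaCrawler/cache submodule originally used Python's standard-library pickle module to serialize and deserialize the data it stores in the Redis cache.
Specifically, the get() method called pickle.loads(value) and the set() method called pickle.dumps(value).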
Problem
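Using pickle in a caching system is considered highly insecure. The pickle module is not designed to be secure against erroneous or maliciously constructed data. If the application deserializes data from an untrusted or unauthenticated source, such as a misconfigured Redis instance, an attacker may be able to execute arbitrary code on the host.
If an attacker gains write access to the Redis database (a common situation in misconfigured deployments or through SSRF attacks), they can inject a malicious serialized payload. When the application reads that cache entry, pickle.loads() executes the embedded code, leading to full compromise of the host system: a remote code execution (RCE) vulnerability.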
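To make the risk concrete, here is a minimal, harmless sketch of why pickle.loads() is dangerous. Any object can define __reduce__ to tell pickle "call this function with these arguments" at load time. The class name MaliciousPayload is hypothetical, and the payload only evaluates an expression that returns the current process ID; a real attacker would substitute something like os.system.

```python
import os
import pickle


class MaliciousPayload:
    """Hypothetical attacker-crafted object; the name is illustrative only."""

    def __reduce__(self):
        # pickle records "call eval(expr, {})" instead of this object's state.
        # This expression is harmless; a real attack would run shell commands.
        return (eval, ("__import__('os').getpid()", {}))


# Bytes an attacker with Redis write access could place under a cache key.
payload = pickle.dumps(MaliciousPayload())

# Equivalent to what the old get() did with the raw value read from Redis.
result = pickle.loads(payload)

# eval ran during deserialization: the "cached value" is attacker-chosen code output.
print(result == os.getpid())  # True
```

Note that the loading side never needs MaliciousPayload to exist: the pickle stream only references builtins.eval, so any process that calls pickle.loads() on these bytes runs the call.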
Fix
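To resolve this insecure deserialization vulnerability, this change replaces pickle with json as the serialization mechanism throughout the module:
Replaced import pickle with import json.
In the get() method, changed pickle.loads(value) to json.loads(value).
In the set() method, changed pickle.dumps(value) to json.dumps(value).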
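The three changes above amount to the pattern sketched below. This is not the module's actual code: the class names JsonRedisCache and InMemoryRedisStub and the set(key, value, expire_time) signature are assumptions, and the stub stands in for a redis-py client (whose get() returns bytes by default, which json.loads accepts) so the example runs without a Redis server.

```python
import json
from typing import Any, Dict, Optional


class InMemoryRedisStub:
    """Hypothetical stand-in for a redis-py client, exposing only get/set."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, name: str, value: Any, ex: Optional[int] = None) -> bool:
        # Store str values as UTF-8 bytes, as redis-py does; expiry is ignored here.
        self._data[name] = value.encode("utf-8") if isinstance(value, str) else value
        return True

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)


class JsonRedisCache:
    """Sketch of a json-based cache (class name and signatures are hypothetical)."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def get(self, key: str) -> Optional[Any]:
        value = self._client.get(key)
        if value is None:
            return None
        # json.loads only builds dicts, lists, strings, numbers, booleans and None.
        return json.loads(value)

    def set(self, key: str, value: Any, expire_time: int) -> None:
        # json.dumps raises TypeError for values JSON cannot represent.
        self._client.set(key, json.dumps(value), ex=expire_time)


cache = JsonRedisCache(InMemoryRedisStub())
cache.set("note:1", {"title": "demo", "likes": 3}, expire_time=60)
print(cache.get("note:1"))  # {'title': 'demo', 'likes': 3}
print(cache.get("missing"))  # None
```

With a real server, the stub would be replaced by a redis.Redis instance; the json calls stay the same.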
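The json module only serializes basic data types (strings, lists, dicts, numbers, booleans and None), and deserialization never executes arbitrary code. This change therefore eliminates the RCE risk while preserving the crawler component's caching functionality for JSON-compatible values.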
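A small sketch of the safety property: bytes written by pickle are simply not valid JSON, so json.loads raises an error instead of running anything. The helper name json_rejects is illustrative, not part of the module.

```python
import json
import pickle


def json_rejects(raw: bytes) -> bool:
    """Return True if json.loads refuses the raw bytes (illustrative helper)."""
    try:
        json.loads(raw)
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        return True
    return False


# A pickle stream (protocol 2+) starts with the PROTO opcode byte 0x80,
# which is not valid UTF-8, so json.loads fails before parsing anything.
pickled = pickle.dumps({"role": "admin"})
print(json_rejects(pickled))  # True

# JSON text such as the patched set() writes still loads normally.
print(json_rejects(b'{"role": "admin"}'))  # False
```

A side effect worth noting: entries written by the old pickle-based set() will fail to load after the upgrade, so existing cache keys may need to be flushed or treated as cache misses.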
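Tests run in a local simulated environment confirmed that json correctly handles the data types the application commonly caches, such as strings and dictionaries.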
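A round-trip check along these lines could cover the same ground; it is a sketch, not the tests used for this PR, and the sample values are made up. It also shows where json differs from pickle: tuples come back as lists, non-string dict keys become strings, and sets are rejected outright.

```python
import json

# Hypothetical samples resembling typical cached crawler data.
samples = [
    "plain string",
    {"user": "alice", "tags": ["a", "b"], "score": 9.5, "active": True, "extra": None},
]
for sample in samples:
    # Strings and dicts of basic types survive a dumps/loads round trip unchanged.
    assert json.loads(json.dumps(sample)) == sample

# Types that do not survive unchanged:
tuple_roundtrip = json.loads(json.dumps({"ids": (1, 2)}))
print(tuple_roundtrip)  # {'ids': [1, 2]}

int_key_roundtrip = json.loads(json.dumps({1: "a"}))
print(int_key_roundtrip)  # {'1': 'a'}

set_failed = False
try:
    json.dumps({"seen": {1, 2}})
except TypeError:
    # Sets have no JSON representation, so json.dumps refuses them.
    set_failed = True
print(set_failed)  # True
```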