Analysis of Python's new string format vulnerability

Preface

This article conducts an in-depth analysis of the security vulnerabilities of a new syntax for formatting strings introduced by Python, and provides corresponding security solutions.

Using str.format on untrusted user input is a security risk. I have known about this problem for a long time, but I did not appreciate how serious it is until today: attackers can use it to bypass the Jinja2 sandbox, which can cause serious information leaks. At the end of this article I provide a safer replacement for str.format.

As a reminder, this is a pretty serious security risk, and the reason it's being written about here is that most people probably don't know how easily it can be exploited.

Core Issues

Python 2.6 introduced a new string formatting syntax inspired by .NET. Rust and some other programming languages also support this syntax. Through the .format() method it is available on both byte and unicode strings (in Python 3, only on unicode strings), and it also maps onto the more customizable string.Formatter API.

A feature of this syntax is that a format string can refer to its positional and keyword arguments explicitly and reorder them freely. It can even access attributes and items of the objects passed in, and this is the root cause of the security issue discussed here.

For example, a format string can read an attribute of its argument:

    >>> 'class of {0} is {0.__class__}'.format(42)
    "class of 42 is <class 'int'>"

Essentially, anyone who can control the format string can potentially access various internal properties of the object.
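As a small illustration of the field syntax described above, the following sketch shows reordering, keyword arguments, attribute access, and item access. The names Point and settings are made up for this example and do not come from any real application.

```python
from collections import namedtuple

# Hypothetical sample data, used only to exercise the field syntax.
Point = namedtuple("Point", ["x", "y"])
point = Point(x=3, y=4)
settings = {"unit": "cm"}

# Positional arguments can be referenced explicitly and in any order.
reordered = "{1} before {0}".format("first", "second")

# Keyword arguments are referenced by name.
by_keyword = "{greeting}, {name}!".format(name="Ada", greeting="Hello")

# A field can walk into an attribute of the argument...
attribute = "x is {p.x}".format(p=point)

# ...or into an item. Note that the key is written without quotes.
item = "unit is {s[unit]}".format(s=settings)
```

The last two forms are what turn a format string into a small object-traversal language.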

What's the problem?

The first question is how to control the format string. You can start from the following places:

1. Untrusted translators in string files. This is a promising avenue because many applications that are translated into multiple languages use the new Python string formatting, but not everyone thoroughly reviews all incoming strings.

2. User-exposed configuration. Some systems let users configure certain behaviors, and those settings may be exposed as format strings. I have seen web applications that let users configure notification emails, log message formats, or other basic templates this way.
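One way to support reviewing such strings is to list the replacement fields that use attribute or item access before a string is accepted. The sketch below relies on string.Formatter.parse from the standard library. The helper name risky_fields is an assumption made for this example, and flagging a field is only a review aid, not a defense on its own.

```python
from string import Formatter


def risky_fields(format_string):
    """Return the field names that use attribute or item access."""
    found = []
    for _literal, field_name, format_spec, _conversion in Formatter().parse(format_string):
        if field_name is None:
            # Trailing literal text with no replacement field.
            continue
        if "." in field_name or "[" in field_name:
            found.append(field_name)
        if format_spec:
            # Format specs may contain nested replacement fields.
            found.extend(risky_fields(format_spec))
    return found


harmless = risky_fields("Hello {name}, you have {count:d} new messages")
suspicious = risky_fields("{event.__init__.__globals__[CONFIG][SECRET_KEY]}")
```

Here harmless is an empty list, while suspicious contains the whole traversal expression, so a reviewer or an import script can reject the string.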

Hazard Level

If you only pass C interpreter objects (built-in types implemented in C) to the format string, the danger is limited, because the most you will expose is something like the integer class.
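A quick way to see the difference is to check which traversals str.format can actually resolve. This is a minimal sketch: can_reach and python_function are hypothetical names introduced for the example.

```python
def can_reach(value, field):
    """Report whether str.format can resolve `field` starting from `value`."""
    try:
        ("{0." + field + "}").format(value)
    except (AttributeError, KeyError, IndexError, TypeError):
        return False
    return True


def python_function():
    pass


# An integer leads to its class, but not to any module globals.
int_to_class = can_reach(42, "__class__")
int_to_globals = can_reach(42, "__init__.__globals__")

# A function written in Python carries a reference to its module globals.
function_to_globals = can_reach(python_function, "__globals__")
```

With an int, the traversal stops at type-level information. With a Python function, the format string can reach everything defined in the function's module.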

However, once Python objects are passed to this format string, things get tricky. This is because the amount of things that can be exposed from a Python function is quite staggering. Here is a scenario for a hypothetical web application that could leak a key:

    CONFIG = {
        'SECRET_KEY': 'super secret key'
    }

    class Event(object):
        def __init__(self, id, level, message):
            self.id = id
            self.level = level
            self.message = message

    def format_event(format_string, event):
        # The format string is applied to a Python object, so it can
        # traverse that object's attributes.
        return format_string.format(event=event)

If a user could inject format_string here, they would be able to discover a secret string like this:

    {event.__init__.__globals__[CONFIG][SECRET_KEY]}
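To make the leak concrete, here is the scenario above as a runnable sketch. The sample event values and the "intended" format string are assumptions added for illustration. The sketch assumes CONFIG lives in the same module as the Event class, which is what makes it reachable through __globals__.

```python
CONFIG = {"SECRET_KEY": "super secret key"}


class Event(object):
    def __init__(self, id, level, message):
        self.id = id
        self.level = level
        self.message = message


def format_event(format_string, event):
    return format_string.format(event=event)


# Hypothetical sample event.
event = Event(1, "warning", "disk almost full")

# What the application author expects users to configure.
intended = format_event("[{event.level}] {event.message}", event)

# What an attacker can supply instead: walk from the bound method to the
# module globals of the function behind it, then index into CONFIG.
leaked = format_event("{event.__init__.__globals__[CONFIG][SECRET_KEY]}", event)
```

The attacker never needs to execute code. Attribute and item lookups alone are enough to reach the secret.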

Sandboxing Formatting

So what should you do if you need to let someone else provide the format string? You can use some undocumented internals to change the string formatting behavior:

    from string import Formatter

    try:
        # Python 3.3+; the alias in collections was removed in 3.10.
        from collections.abc import Mapping
    except ImportError:
        from collections import Mapping

    class MagicFormatMapping(Mapping):
        """This class implements a dummy wrapper to fix a bug in the Python
        standard library for string formatting.

        See http://bugs.python.org/issue13598 for information about why
        this is necessary.
        """

        def __init__(self, args, kwargs):
            self._args = args
            self._kwargs = kwargs
            self._last_index = 0

        def __getitem__(self, key):
            if key == '':
                # Auto-numbered field ('{}'): use the next positional argument.
                idx = self._last_index
                self._last_index += 1
                try:
                    return self._args[idx]
                except LookupError:
                    pass
                key = str(idx)
            return self._kwargs[key]

        def __iter__(self):
            return iter(self._kwargs)

        def __len__(self):
            return len(self._kwargs)

    # This is a necessary API but it's undocumented and moved around
    # between Python releases
    try:
        from _string import formatter_field_name_split
    except ImportError:
        formatter_field_name_split = lambda \
            x: x._formatter_field_name_split()

    class SafeFormatter(Formatter):

        def get_field(self, field_name, args, kwargs):
            first, rest = formatter_field_name_split(field_name)
            obj = self.get_value(first, args, kwargs)
            for is_attr, i in rest:
                if is_attr:
                    obj = safe_getattr(obj, i)
                else:
                    obj = obj[i]
            return obj, first

    def safe_getattr(obj, attr):
        # Expand the logic here. For instance on 2.x you will also need
        # to disallow func_globals, on 3.x you will also need to hide
        # things like cr_frame and others. So ideally have a list of
        # objects that are entirely unsafe to access.
        if attr[:1] == '_':
            raise AttributeError(attr)
        return getattr(obj, attr)

    def safe_format(_string, *args, **kwargs):
        formatter = SafeFormatter()
        kwargs = MagicFormatMapping(args, kwargs)
        return formatter.vformat(_string, args, kwargs)

Now we can use the safe_format function instead of str.format:

    >>> '{0.__class__}'.format(42)
    "<type 'int'>"
    >>> safe_format('{0.__class__}', 42)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: __class__
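The comment in safe_getattr asks for the check to be expanded. One possible direction, shown here only as a sketch, is to combine the underscore rule with a deny list of attribute names that expose frames, code objects, or function globals. The names UNSAFE_ATTRIBUTES and strict_getattr are assumptions for this example, and the list is a starting point rather than a complete inventory.

```python
# Hypothetical deny list; extend it for the objects your application exposes.
UNSAFE_ATTRIBUTES = frozenset([
    "func_globals", "func_code", "func_closure",  # Python 2 function internals
    "gi_frame", "gi_code",                        # generator internals
    "cr_frame", "cr_code",                        # coroutine internals
    "f_globals", "f_locals", "f_builtins",        # frame internals
    "tb_frame",                                   # traceback internals
])


def strict_getattr(obj, attr):
    """Reject underscore-prefixed names and known-dangerous attributes."""
    if attr.startswith("_") or attr in UNSAFE_ATTRIBUTES:
        raise AttributeError(attr)
    return getattr(obj, attr)
```

A function like this could stand in for safe_getattr in the formatter above. An allow list of the attributes each exposed type may reveal would be stricter still, at the cost of more maintenance.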

Summary

In this article, we conducted an in-depth analysis of the security vulnerabilities of a new syntax for formatting strings introduced by Python, and provided corresponding security solutions, hoping that it will be helpful to readers.
