An Ubjson Parsing Problem Can Easily Cause DDoS Attack. #2793
Do I understand your issue right that the "attack" would be parsing a UBJSON array that is encoded like "an array with 100000 null values", which, due to the compact encoding, indeed requires only a few bytes on the UBJSON side but yields a large overhead when represented as JSON? I think using
You get it. In reality, there should not be such a large difference between the two formats. After all, parsing just 10 characters requires over 35 gigabytes of memory. This is not even the worst case: since my server has 64 GB of RAM, I only used "[$Z#l1~~~~" as the test case; otherwise the program would have been killed before finishing due to lack of memory. If the input were modified to "[$Z#l~~~~", then it is possible that these 10 bytes would consume over 100 GB of memory. Possible solutions now might be:
I'm not sure whether adding code for such special cases is a good idea, as it adds complexity for very artificial inputs.
Thanks for your reply! Obviously, the solutions are not limited to the two options I suggested above. As the creator of this repository, you have a much better understanding of this code than I do, and I am confident that you can find a solution that balances complexity and security. In addition, I still want to emphasise two points on this issue and hope for your support.
Looking forward to your response and a fix.
I just tried py-ubjson, which shows the same behavior (Python allocates some 30 GB of memory):

```python
import ubjson

encoded = b'[$Z#L\x00\x00\x00\x00\xff\xff\xff\xff'
decoded = ubjson.loadb(encoded)
```
I don't think we should ignore the problem just because
The backend server would be DoSed due to the huge memory cost and high CPU utilization. I think the second solution mentioned by @spidermana for this problem is feasible. We can do better by using some kind of marking in
I would be happy to see a proposal, but it is, in my opinion, not so easy. The library is designed to store arbitrary JSON, and your proposal is to change the storage to simplify storing very specialized use cases.
Based on your suggestion, I have also tried it. The test case ('[$Z#L\x00\x00\x00\x00\xff\xff\xff\xff') you provided is actually different from the one I used before ('[$Z#l\x31\x7E\x7E\x7E'); you should compare the two under the same input. For the '[$Z#L\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff' input case and for the test case '[$Z#l1~~~', i.e. '[$Z#l\x31\x7E\x7E\x7E', the current implementations behave differently, and the worse the input data, the greater the difference between them. In fact, an attacker will not use arbitrary input, but will instead craft very specialized input like the one I mentioned above. This issue should not be underestimated, and I sincerely hope you can solve this problem to improve the security.
I haven't done this myself, just speculating. Would it be possible for the caller to create their own class derived from the |
Using
A workaround is to provide a low-cost helper/utility function that calculates the effective size of raw UBJSON input without actually constructing it. A simple metric such as the total number of elements seems easy to implement and probably more than enough for rejecting such malicious inputs. Based on the use case, the user can decide to call this function for the given input and put a hard limit on the number of elements as a security measure. I'm not familiar with the UBJSON format, so it would be very helpful to hear your feedback on the viability of this approach.
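A minimal sketch of this idea follows. The helper is hypothetical (not part of nlohmann/json), handles only a flat optimized-array header of the form `[$<type>#<len>`, treats the length field as unsigned, and ignores nested containers and the other length forms a production check would have to cover:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical pre-scan helper: for input starting with an optimized
// UBJSON container header "[$<type>#<len-marker><len>", return the
// declared element count without materializing any elements.
std::uint64_t declared_ubjson_count(const std::vector<unsigned char>& in)
{
    if (in.size() < 5 || in[0] != '[' || in[1] != '$' || in[3] != '#')
        throw std::invalid_argument("not an optimized UBJSON array header");

    std::size_t width = 0;
    switch (in[4])                       // UBJSON length marker
    {
        case 'i': case 'U': width = 1; break;  // int8 / uint8
        case 'I': width = 2; break;            // int16
        case 'l': width = 4; break;            // int32
        case 'L': width = 8; break;            // int64
        default: throw std::invalid_argument("unsupported length marker");
    }
    if (in.size() < 5 + width)
        throw std::invalid_argument("truncated length field");

    std::uint64_t count = 0;             // length is big-endian in UBJSON
    for (std::size_t k = 0; k < width; ++k)
        count = (count << 8) | in[5 + k];
    return count;
}
```

A caller could then refuse input whose declared count exceeds an application-specific limit before handing the bytes to `from_ubjson`.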
This was probably #1788 with a fix in #1436. This fix made
If I understand this issue here correctly, it is about preventing the parsing of input which is small (in terms of memory size) on the attacker side but large on the json library side, right?
Your understanding is right. Besides, the solution from @nickaein prevents malicious input and is easy to implement, but may lose some usability. Each approach has both pros and cons.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
* support UBJSON-derived Binary JData (BJData) format
* fix Codacy warning
* partially fix VS compilation errors
* fix additional VS errors
* fix more VS compilation errors
* fix additional warnings and errors for clang and msvc
* add more tests to cover the new bjdata types
* add tests for optimized ndarray, improve coverage, fix clang/gcc warnings
* gcc warn useless conversion but msvc gives an error
* fix ci_test errors
* complete test coverage, fix ci_test errors
* add half precision error test
* fix No newline at end of file error by clang
* simplify endian condition, format unit-bjdata
* remove broken test due to alloc limit
* full coverage, I hope
* move bjdata new markers from default to the same level as ubjson markers
* fix ci errors, add tests for new bjdata switch structure
* make is_bjdata const after using initializer list
* remove the unwanted assert
* move is_bjdata to an optional param to write_ubjson
* pass use_bjdata via output adapter
* revert order to avoid msvc 2015 unreferenced formal param error
* update BJData Spec V1 Draft-2 URL after spec release
* amalgamate code
* code polishing following @gregmarr's feedback
* make use_bjdata a non-default parameter
* fix ci error, remove unwanted param comment
* encode and decode bjdata ndarray in jdata annotations, enable roundtrip tests
* partially fix ci errors, add tests to improve coverage
* polish patch to remove ci errors
* fix a ndarray dim vector condition
* fix clang tidy error
* add sax test cases for ndarray
* add additional sax event tests
* adjust sax event numbering
* fix sax tests
* ndarray can only be used with array containers, discard if used in object
* complete test coverage
* disable [{SHTFNZ in optimized type due to security risks in #2793 and hampered readability
* fix ci error
* move OutputIsLittleEndian from tparam to param to replace use_bjdata
* fix ci clang gcc error
* fix ci static analysis error
* update json_test_data to 3.1.0, enable file-based bjdata unit tests
* fix stack overflow error on msvc 2019 and 2022
* use https link, update sax_parse_error after rebase
* make input_format const and use initializer
* return bool for write_bjdata_ndarray
* test write_bjdata_ndarray return value as boolean
* fix ci error
What is the issue you have?
In the process of parsing UBJSON and converting it to JSON with the `from_ubjson` function, parsing an extremely long array of null characters (unlike other characters, null characters do not need to be supplied by the user in the input) can lead to a "dead" loop or loop-explosion problem. In my initial experiments, taking just 10 characters as input caused `from_ubjson` to execute for over 150 s (even with -O3 enabled) and use over 35 GB of memory on my server. Besides, the effect of such an attack is cumulative: if this 10-character string is copied n times, it generates 150n seconds of execution time and 35n GB of memory usage. Any host that uses the affected function is vulnerable to an extremely serious DoS attack due to the unreasonably high usage of CPU and memory resources.
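To see where the amplification comes from: in the payload `[$Z#l1~~~~`, the `l` marker declares a 32-bit count, and the following four bytes `1~~~` (0x31 0x7E 0x7E 0x7E) decode to over 800 million declared null elements. A standalone sketch (not library code) of that decoding:

```cpp
#include <cassert>
#include <cstdint>

// Decode the big-endian 32-bit count that follows the UBJSON 'l' length
// marker. For "[$Z#l1~~~~" the count bytes are '1','~','~','~'
// (0x31 0x7E 0x7E 0x7E), i.e. 830373502 declared null elements.
std::uint32_t ubjson_int32_count(const unsigned char* p)
{
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}
```

Each of those roughly 830 million declared nulls becomes a full JSON value in the parsed DOM, which is how 10 input bytes can turn into tens of gigabytes of memory.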
The stack traces and error messages are as follows:
Please describe the steps to reproduce the issue.
2. Code to parse the ubjson from here.
3. Compile using the command:

```shell
g++ -O3 -std=c++11 -g -I <JSON_SRC_PATH>/single_include ./test_ubjson.cpp
```

Run `a.out` and check the CPU and memory usage.

Can you provide a small but working code example?
Please see above.
What is the expected behavior?
It should parse null very quickly for very small inputs.
And what is the actual behavior instead?
The `from_ubjson` function has extremely high memory and CPU usage when parsing pure null characters with large length fields in UBJSON format. In UBJSON, null characters do not need to actually appear in the input; according to UBJSON's compression logic, the user only needs to provide a very large length field for the null characters when parsing, which can make `from_ubjson` take a long time to finish.

json/single_include/nlohmann/json.hpp, lines 10255 to 10261 in e10a3fa
In practice, the loop in `get_ubjson_array` should be optimized (using `resize()` etc. to speed it up), or a special structure should be used to record the number and position of the null characters rather than pushing each one into memory, writing the answer to the output only when converting to the JSON format.

Which compiler and operating system are you using?
Which version of the library did you use?
`develop` branch

If you experience a compilation error: can you compile and run the unit tests?