Conversation
Signed-off-by: Adam Korczynski <adam@adalogics.com>
gpshead
left a comment
There was a problem hiding this comment.
It feels like you maybe had an LLM generate these and accepted its categorization plan that focused on a mix of limiting fuzz_ module size combined with aiming to limit the number generated more so than considering which things actually made sense together. No problem with LLM use if so - it makes a ton of sense to use one to generate these. But I think they need some rearranging for sensibility reasons.
I'm not going to be able to review them in detail, this is the kind of code for the kind of purpose where i'm more likely to trust that it does what it claims within the fuzzing sandbox environment it'll run in, and won't dive in on its implementation details for a given test scenario unless it starts producing failure results that do not make sense.
My review attitude on this does leave opportunity for things to not be doing useful fuzzing vs the compute time they're given without further detailed consideration. But is intended to unblock.
| // | ||
| // All module functions and constructors are imported once during init and | ||
| // cached as static PyObject* pointers. PyRef (RAII) prevents reference leaks. | ||
| // PyGC_Collect() runs every 200 iterations. Max input size: 1 MB. |
There was a problem hiding this comment.
This is probably wasting a lot of time using such large input sizes.
| // via PyRun_String at init time and cached as class objects. | ||
| // | ||
| // PyRef (RAII) prevents reference leaks. PyGC_Collect() runs every 200 | ||
| // iterations. Max input size: 1 MB. |
There was a problem hiding this comment.
Also probably wasting a lot of time with large data inputs.
There was a problem hiding this comment.
rename this to parser_modules, parsers implies the Python language parser itself.
There was a problem hiding this comment.
this one is lumping quite unrelated things within one fuzzer. compression, binascii, pickle, ssl, and some codecs - each of those are their own topic.
There was a problem hiding this comment.
sqlite3 and dbm as extension modules wrapping third party libraries belong in a different category from the others.
There was a problem hiding this comment.
again, wildly unrelated things that don't fit under a single name umbrella.
This PR adds 6 fuzzers that call functions and methods of different python modules, specifically crypto, data operations, IO-related operations, text operations and different parsing and decoding routines. The high level approach is that each fuzzer chooses one or multiple targets to call. It will only call multiple targets if it can chain together different methods and in such case the fuzzer will choose both the initial API and the subsequent chained methods in a pseudo-random manner.
Fuzzing wise, they seem to be doing alright and unlock quite a bit of new coverage in cpython. From about a 20 minute run of each of these fuzzers, they generated the following coverage in
./Modules:... which is a lot more than the current coverage which is 7329 lines covered out of 52105 as of today. I expect that once they get onboarded onto the OSS-Fuzz build, they will reach higher coverage.
cc @ammaraskar @gpshead for info