Initial commit for dissect.executable.pe#10
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10 +/- ##
==========================================
+ Coverage 0.00% 88.60% +88.60%
==========================================
Files 5 17 +12
Lines 306 1439 +1133
==========================================
+ Hits 0 1275 +1275
+ Misses 306 164 -142
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Miauwkeru
left a comment
There was a problem hiding this comment.
I have done an initial review of the code, mostly just commented on some typehint weirdness and the like. I'll go deeper into it at a later time.
| self.patched_pe = BytesIO() | ||
| self.functions = [] | ||
|
|
||
| @property |
There was a problem hiding this comment.
This being a property/attribute doesn't make much sense. I wouldn't expect assigning the attribute to some other variable would build the whole patched binary.
There was a problem hiding this comment.
I think that's just a design preference, happy to discuss other options for this
There was a problem hiding this comment.
depends on what you expect of a property, most properies just return a value of some internal state. This goes a bit beyond that IMHO, where it produces the whole binary. Feels like it should be the responsibility of a method instead.
| self.raw_resources.append( | ||
| { | ||
| "offset": entry.OffsetToDirectory, | ||
| "entry": data_entry, | ||
| "data": data, | ||
| "data_offset": raw_offset, | ||
| "resource": rsrc, | ||
| } | ||
| ) |
There was a problem hiding this comment.
Instead of having an additional raw_resources list, why not add these offset and information to Resource itself? Where you can have the same functionality as a dict if you add def raw() to Resource
There was a problem hiding this comment.
This is kept for patching resources, adding them as a dict like proposed would need some more extensive refactoring of the code. We might opt for that indeed as it would be cleaner
| for rsrc_entry in sorted(self.pe.raw_resources, key=lambda rsrc: rsrc["data_offset"]): | ||
| entry_offset = rsrc_entry["offset"] | ||
| entry = rsrc_entry["entry"] | ||
|
|
||
| if entry._type.name == "IMAGE_RESOURCE_DATA_ENTRY": | ||
| rsrc_obj = rsrc_entry["resource"] | ||
| data_offset = rsrc_entry["data_offset"] | ||
|
|
||
| # Normally the data is separated by a null byte, increment the new offset by 1 | ||
| new_data_offset = prev_offset + prev_size | ||
| # if new_data_offset and (new_data_offset > data_offset or new_data_offset < data_offset): | ||
| if new_data_offset and new_data_offset != data_offset: | ||
| data_offset = new_data_offset | ||
| rsrc_entry["data_offset"] = data_offset | ||
| rsrc_obj.offset = self.section.virtual_address + data_offset | ||
|
|
||
| data = rsrc_obj.data | ||
|
|
||
| # Write the resource entry data into the section | ||
| section_data.seek(data_offset) | ||
| section_data.write(data) | ||
|
|
||
| # Take note of the offset and size so we can update any of these values after changing the data within | ||
| # the resource | ||
| prev_offset = data_offset | ||
| prev_size = rsrc_obj.size | ||
|
|
||
| # Write the resource entry into the section | ||
| section_data.seek(entry_offset) | ||
| section_data.write(entry.dumps()) |
There was a problem hiding this comment.
This feels out of place in the Resource class. I wouldn't imagine that everything gets immediately changed when assigning data to a resource. I would expect either the manager to take care of that (which holds the raw resources) or a seperate utility that does so during patching.
There was a problem hiding this comment.
Feel free to move those things around if it doesn't make sense in the place where it is at right now!
| from dissect.executable.pe.pe import PE | ||
|
|
||
|
|
||
| class PESection: |
There was a problem hiding this comment.
also a random thought, in line with the other managers and reduce a bit of the complexity of this class.
Is a SectionManager an idea that does the patching and keeping account of the patched and unpatched sections?
aka, something like:
class PESectionManager(Manager):
def __init__(self, pe, section, ...):
self.patched_items = {}
def patch(self, name, data):
# get data from self.items
# Copy data
# insert it in self.patched_items
def __getitem__(self, name: str):
if section := self.patched_items.get(name):
return section
else:
return super().__getitem__(name)
def parse():
... # Do somethingthis is just a rough draft of such a manager class
There was a problem hiding this comment.
Yeah I like the idea, initially that is kind of what I had in mind as well but I lacked (still do) some understanding of the language to do it in a nice manner.
There was a problem hiding this comment.
want me to look into that a bit?
| for name, section in self.pe.patched_sections.items(): | ||
| if section.virtual_address == prev_va: | ||
| continue | ||
|
|
||
| pointer_to_raw_data = utils.align_int(integer=prev_ptr + prev_size, blocksize=self.pe.file_alignment) | ||
| virtual_address = utils.align_int(integer=prev_va + prev_vsize, blocksize=self.pe.section_alignment) | ||
|
|
||
| if section.virtual_address < virtual_address: | ||
| """Set the virtual address and raw pointer of the section to the new values, but only do so if the | ||
| section virtual address is lower than the previous section. We want to prevent messing up RVA's as | ||
| much as possible, this could lead to binaries that are a bit larger than they need to be but that | ||
| doesn't really matter.""" | ||
| self.pe.patched_sections[name].virtual_address = virtual_address | ||
| self.pe.patched_sections[name].pointer_to_raw_data = pointer_to_raw_data | ||
|
|
||
| prev_ptr = pointer_to_raw_data | ||
| prev_size = section.size_of_raw_data | ||
| prev_va = virtual_address | ||
| prev_vsize = section.virtual_size |
There was a problem hiding this comment.
same as the Resource class, it doesn't feel like this class should be responsible for patching all the other sections.
|
to help with a lot of these style issues, you could check out the new linting/formatting we plan to use. It will probably also warn you about stuff I missed: fox-it/dissect.util#52 |
Miauwkeru
left a comment
There was a problem hiding this comment.
I answered your comments a bit, if you need help with any of the changes I requested do not hesitate to ask!
First version for the dissect PE format parser. Currently the parser supports standard PE files, as well as some patching of executables and building an executable from scratch. It is expected there will be some issues that will be found during usage and it will need quite some refactoring. The API as it currently exists might need some revising too :)
* Refactored based on comments given * Ran `ruff` for formatting * Applied style and formatting on `dissect.executable.elf` as well (hope you don't mind)
b48a82c to
325b8dd
Compare
- Moves pe high bit to import functions
Additinally remove a redundant definition
I had the wrong assumption that the directory data is the same as the data inside the section. This has been reverted for now
The data inside the section is already available, so it made sense to make it possible to get the information from there
This is mostly cause I did not expect it updating pe section when it was creating an import table
a13bec8 to
aff11da
Compare
|
The only gripe I still have is that some data is still getting overwritten. But i couldn't currently think of a better structure for that part. |
First version for the dissect PE format parser. Currently the parser supports standard PE files, as well as some patching of executables and building an executable from scratch. It is expected there will be some issues that will be found during usage and it will need quite some refactoring. The API as it currently exists might need some revising too :)