Conversation
lib/docsplit/image_extractor.rb
Outdated
There was a problem hiding this comment.
Perhaps this return type should be called "image_paths" instead of "images"?
|
Fixed inaccurate text in body of pull request. Just to be clear I modified the return value behavior for every case I could find that was appropriate, causing functions to return an array of file paths to the extracted data rather than the intermediate PDF, as it was previously. |
|
Any idea when this might get merged in? This is a great feature. |
lib/docsplit/text_extractor.rb
Outdated
There was a problem hiding this comment.
Thanks for the input. I was misinformed about the nature of Array()
|
👍 |
|
Is there anything one can do to help getting this merged? It is rather messy to write code around extract_images to determine the generated file names based on "reverse engineering" the naming scheme and doing string manupulations. Thanks! |
|
+1 @dmayer's commet. I also think it would useful to have a similar feature for extract_pages as well so one can easily determine the generate file names when a PDF is split into individual pages. Cheers |
|
ping @knowtheory @jashkenas |
|
If you're interested I did a ruby gem for this : |
|
I agree with @bridgway, would be useful on @knowtheory, what are your thoughts on this? |
|
Would love to see this merged. |
Cleaning up code Flatten the return value Make it not add the path to the return value if an exception-worthy event occurred. Instead, merely raise that exception Make text_extractor also return paths to processed files Make function extract_images always return array of image paths Refine specs Fix tests Add nil check Refactor tests to better isolate functionality remove debugger remove logger Add printf debugging Sanity checking Printfs Remove puts Remove annoying line Cleanup Fix unnecessary usage of ternary operation to 'wrap' an Array and replaced with Array() as it is more idiomatic revert to original
aea6533 to
9789dd5
Compare
|
@steverob I just squashed the old commits and will rebase against master in preparation for reconsideration by the maintainer |
|
Thank you! :) Regards
|
This fixes an issue I opened on the main repo found at #62
The feature proposed adds a new option to extracting images via Docsplit.extract_images. If you pass an additional argument in the form :and_return => :images, you receive an array of the paths of extracted images as a return value, instead of the path to the intermediate PDF, in the case of powerpoints.
Example:
With a return value of:
The default return-value behavior of this method is preserved when that option is not specified, which is to return an array of PDFs that are returned by ensure_pdfs.
The justification for this feature is that I feel like users would benefit from getting the paths to the images created immediately upon calling the extract_images method. Having them determine these paths themselves seems out of spirit with the nature of this gem.