Fix issue 62 by hderms · Pull Request #63 · documentcloud/docsplit

hderms · 2012-11-08T20:54:38Z

This fixes issue #62, which I opened on the main repo.

The proposed feature adds a new option to Docsplit.extract_images. If you pass the additional option :and_return => :images, the method returns an array of paths to the extracted images instead of the path to the intermediate PDF (in the case of PowerPoint files).
Example:

Docsplit.extract_images('/tmp/some_ppt.ppt', :size => '1000x', :format => [:png, :jpg])

With a return value of:

["/tmp/some_ppt.png", "/tmp/some_ppt.jpg"]

When that option is not specified, the method keeps its default return-value behavior: it returns the array of PDF paths produced by ensure_pdfs.
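A minimal sketch of the return-value switch described above. `choose_return_value` and its arguments are hypothetical names used for illustration, not Docsplit's actual internals; the nested `images` array assumes one sub-array per output format, which is why the sketch flattens it before returning.

```ruby
# Hypothetical sketch of the :and_return => :images switch described in this PR.
# `pdf_paths` stands in for what ensure_pdfs produced; `image_paths` stands in
# for the files written during extraction (assumed: one sub-array per format).
def choose_return_value(pdf_paths, image_paths, opts = {})
  if opts[:and_return] == :images
    # Opt-in: return the extracted image paths as one flat array.
    image_paths.flatten
  else
    # Default: keep the existing behavior and return the PDF paths.
    pdf_paths
  end
end

pdfs   = ["/tmp/some_ppt.pdf"]
images = [["/tmp/some_ppt.png"], ["/tmp/some_ppt.jpg"]]

p choose_return_value(pdfs, images, :and_return => :images)
# prints ["/tmp/some_ppt.png", "/tmp/some_ppt.jpg"]
p choose_return_value(pdfs, images)
# prints ["/tmp/some_ppt.pdf"]
```

Keeping the default branch unchanged is what lets existing callers upgrade without noticing the new option.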

The justification for this feature is that users would benefit from getting the paths to the created images directly from the extract_images call. Making them work out these paths themselves seems out of keeping with the spirit of this gem.

KurtPreston · 2012-11-12T20:45:18Z

lib/docsplit/image_extractor.rb

Perhaps this return type should be called "image_paths" instead of "images"?

hderms · 2013-02-22T16:48:04Z

Fixed inaccurate text in the body of the pull request. To be clear, I changed the return-value behavior in every case I could find where it was appropriate, so these functions now return an array of paths to the extracted files rather than the path to the intermediate PDF, as they did previously.

KurtPreston · 2013-03-27T20:37:41Z

Any idea when this might get merged in? This is a great feature.

antoinelyset · 2013-06-03T15:06:08Z

lib/docsplit/text_extractor.rb

You can do:

pdfs = Array(pdfs)

hderms replied: Thanks for the input. I was mistaken about how Array() works.
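For reference, Ruby's `Kernel#Array` wraps a single value, passes an array through unchanged, and turns `nil` into an empty array, which is why it can replace a hand-written ternary "wrap". `normalize_pdfs` below is a hypothetical wrapper for illustration only.

```ruby
# Kernel#Array normalizes its argument into an array:
#   a single String  -> one-element array
#   an Array         -> the same elements, unchanged
#   nil              -> empty array (a ternary like `x.is_a?(Array) ? x : [x]`
#                       would give [nil] here instead)
def normalize_pdfs(pdfs)
  Array(pdfs)
end

p normalize_pdfs("/tmp/a.pdf")                 # prints ["/tmp/a.pdf"]
p normalize_pdfs(["/tmp/a.pdf", "/tmp/b.pdf"]) # prints ["/tmp/a.pdf", "/tmp/b.pdf"]
p normalize_pdfs(nil)                          # prints []
```

One caveat worth knowing: `Array` converts a Hash into an array of key/value pairs, so it is best suited to arguments that are either a path or a list of paths.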

antoinelyset · 2013-06-03T15:06:41Z

👍

dmayer · 2013-08-16T15:27:00Z

Is there anything one can do to help get this merged? It is rather messy to write code around extract_images that works out the generated file names by "reverse engineering" the naming scheme and manipulating strings. Thanks!

bridgway · 2013-08-16T15:49:46Z

+1 to @dmayer's comment. I also think it would be useful to have a similar feature for extract_pages, so one can easily determine the generated file names when a PDF is split into individual pages. Cheers
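To illustrate the workaround these comments describe, here is a sketch of a caller predicting output paths on its own. The naming pattern `<basename>_<page>.<format>` inside an output directory is an assumption made for this example, not a confirmed description of Docsplit's scheme, and `predicted_image_paths` is a hypothetical helper. Note that the caller also has to know the page count and the requested formats.

```ruby
# Hypothetical helper showing the "reverse engineering" workaround.
# ASSUMPTION: outputs are named "<basename>_<page>.<format>" in `output_dir`.
# Verify against the Docsplit version you actually use before relying on it.
def predicted_image_paths(pdf_path, page_count, formats, output_dir = ".")
  # Strip the directory and extension: "/tmp/some_ppt.pdf" -> "some_ppt"
  basename = File.basename(pdf_path, File.extname(pdf_path))
  # One path per (page, format) pair, ordered page by page.
  (1..page_count).flat_map do |page|
    formats.map do |format|
      File.join(output_dir, "#{basename}_#{page}.#{format}")
    end
  end
end

p predicted_image_paths("/tmp/some_ppt.pdf", 2, [:png, :jpg], "/tmp/out")
# prints ["/tmp/out/some_ppt_1.png", "/tmp/out/some_ppt_1.jpg", "/tmp/out/some_ppt_2.png", "/tmp/out/some_ppt_2.jpg"]
```

Any change to the naming scheme silently breaks this kind of code, which is the motivation for having the extractor return the paths it wrote.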

sandstrom · 2013-12-10T13:28:36Z

ping @knowtheory @jashkenas

antoinelyset · 2013-12-10T17:38:28Z

If you're interested, I wrote a Ruby gem for this:

https://github.com/antoinelyset/poleica

sandstrom · 2014-06-26T08:45:21Z

I agree with @bridgway, would be useful on extract_pages too.

@knowtheory, what are your thoughts on this?

steverob · 2016-04-05T18:55:39Z

Would love to see this merged.

Cleaning up code
Flatten the return value
Make it not add the path to the return value if an exception-worthy event occurred. Instead, merely raise that exception
Make text_extractor also return paths to processed files
Make function extract_images always return array of image paths
Refine specs
Fix tests
Add nil check
Refactor tests to better isolate functionality
remove debugger
remove logger
Add printf debugging
Sanity checking
Printfs
Remove puts
Remove annoying line
Cleanup
Fix unnecessary usage of ternary operation to 'wrap' an Array and replaced with Array() as it is more idiomatic
revert to original

hderms · 2016-04-05T19:08:32Z

@steverob I just squashed the old commits and will rebase against master in preparation for reconsideration by the maintainer

steverob · 2016-04-05T20:05:10Z

Thank you! :)



sandstrom mentioned this pull request on Jun 26, 2014 in "Fix for Issue #83: Leading Zeros" #97 (Open)

hderms force-pushed the fix_issue_62 branch 2 times, most recently from aea6533 to 9789dd5, on April 5, 2016 19:08


7 participants