[R] Bindings for stringr::str_to_sentence #29256

@asfimport

Description

There is more to this issue than meets the eye. stringr::str_to_sentence() does two things:

  • capitalise the first word

  • if multiple sentences are provided as a single string, it attempts to find sentence breaks and capitalises the first word of each sentence.

    The stringr implementation wraps stringi::stri_trans_totitle(), which in turn uses ICU’s BreakIterator to locate sentence boundaries. As a consequence, stringr::str_to_sentence() is not able to identify a full stop / period (".") as a sentence end and does not capitalise the word following it. Thus, there is a discrepancy between the behaviour of the utf8_capitalize kernel (which capitalises the first word of a string without making any attempt to break it into sentences) and the behaviour of stringr::str_to_sentence().

    For more extensive discussions around the stringi / stringr implementation see stringr issues 202 and 231.

    Due to the complexity of this issue and the relatively niche use cases, the recommendation is to postpone implementation.
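
To make the discrepancy described above concrete, here is a minimal Python sketch contrasting the two behaviours. It is an illustration only: it uses neither Arrow nor ICU, both function names are hypothetical, and the "sentence ends at `.`, `!` or `?` followed by whitespace" rule is an assumption that is much simpler than what ICU's BreakIterator actually does.

```python
import re


def capitalize_whole_string(text: str) -> str:
    """Rough analogue of a whole-string capitalise kernel (hypothetical name).

    Upper-cases the first character and lower-cases the rest,
    with no attempt to detect sentence boundaries.
    """
    return text.capitalize()


# Assumed, naive boundary rule: a sentence starts at the beginning of the
# string or after '.', '!' or '?' followed by at least one whitespace char.
_SENTENCE_START = re.compile(r"(^\s*|[.!?]\s+)(\w)")


def capitalize_each_sentence(text: str) -> str:
    """Naive per-sentence capitalisation (hypothetical name, not ICU rules)."""
    lowered = text.lower()
    # Keep the boundary text (group 1) and upper-case the first letter (group 2).
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), lowered
    )


example = "first sentence. second sentence"
whole = capitalize_whole_string(example)      # "First sentence. second sentence"
per_sentence = capitalize_each_sentence(example)  # "First sentence. Second sentence"
```

The first function never touches the second sentence. The second does, but only under the simplistic boundary rule assumed here, which would mis-handle abbreviations, decimals and locale-specific punctuation.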

Reporter: Nicola Crane / @thisisnic
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg

Note: This issue was originally created as ARROW-13615. Please see the migration documentation for further details.
