[Feature Proposal] Use hashformers for hashtag segmentation #53

@ruanchaves

Is your feature request related to a problem? Please describe.

Preprocessor currently offers no alternative to replacing hashtags with a dummy $HASHTAG$ token. This has frustrated some users who would rather have hashtags segmented, as evidenced by PR #43.

Describe the solution you'd like

I propose to integrate hashformers with Preprocessor.

This would make hashformers an optional dependency of Preprocessor, installable as an extra through pip install tweet-preprocessor[hashformers] or pip install tweet-preprocessor[all].
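
To make the optional-dependency idea concrete, here is a minimal sketch of how the hashtag step could stay unchanged by default and only segment when asked to. The names `hashformers_available`, `process_hashtags`, and the `segmenter` argument are hypothetical and not part of Preprocessor's current API, and the hashtag regex is a simplification. The sketch deliberately does not call hashformers itself: any callable that maps a hashtag body to segmented text could be plugged in.

```python
import importlib.util
import re

# Simplified hashtag pattern, assumed for illustration only.
HASHTAG_PATTERN = re.compile(r"#\w+")


def hashformers_available() -> bool:
    """Return True if the optional hashformers package is installed."""
    return importlib.util.find_spec("hashformers") is not None


def process_hashtags(text, segmenter=None):
    """Replace hashtags with $HASHTAG$, or segment them if a segmenter is given.

    `segmenter` is a hypothetical hook: a callable that takes the hashtag
    body (without the leading '#') and returns the segmented string.
    """
    if segmenter is None:
        # Default behaviour: the dummy-token replacement.
        return HASHTAG_PATTERN.sub("$HASHTAG$", text)
    # Opt-in behaviour: delegate each hashtag body to the segmenter.
    return HASHTAG_PATTERN.sub(lambda m: segmenter(m.group(0)[1:]), text)
```

With a layout like this, users who do not install the extra would see no change in behaviour, and the integration could check `hashformers_available()` before building a real segmenter and raise a clear error if the extra is missing.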

Hashformers can segment hashtags in any language, and it is the current state-of-the-art in hashtag segmentation.

Describe alternatives you've considered

Hashformers has been proven by two research groups to be the current state-of-the-art for hashtag segmentation.

Additional context

If this seems like a good idea to the maintainers of this repository ( @s ), I can draft an initial PR for this feature.
