Fix support for non-English texts #49

Open
omid-jf wants to merge 1 commit into s:master from omid-jf:master

@omid-jf omid-jf commented Mar 29, 2021

The encode('ascii', 'ignore').decode('ascii') strategy does not work for non-English text, because it drops every non-ASCII character, not only the emojis. Since emoji regex patterns already exist in defines.py, a regex substitution is sufficient to remove the emojis.

Fixes #47 and #48
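
To illustrate the difference, here is a minimal sketch comparing the two strategies. The pattern and the function names below are illustrative assumptions and are not copied from defines.py, whose actual pattern may cover different ranges.

```python
import re

# Hypothetical stand-in for the emoji pattern in defines.py (assumption:
# the real pattern may list different or additional ranges).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "]+"
)


def strip_emoji_ascii(text):
    """Old strategy: drops every non-ASCII character, not just emojis."""
    return text.encode("ascii", "ignore").decode("ascii")


def strip_emoji_regex(text):
    """Regex strategy: removes only characters matched by the emoji pattern."""
    return EMOJI_PATTERN.sub("", text)


sample = "Привет, мир \U0001F600"  # Russian text followed by a grinning face

# The ASCII round trip keeps only the comma and the spaces.
print(repr(strip_emoji_ascii(sample)))

# The regex substitution keeps the Cyrillic text and removes the emoji.
print(repr(strip_emoji_regex(sample)))
```

With the regex approach, text in any script survives as long as the pattern matches only emoji code points.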

omid-jf commented Mar 29, 2021

However, the pattern defined in defines.py does not cover newer emojis and needs to be updated.
Alternatively, emoji.get_emoji_regexp() from https://pypi.org/project/emoji can be used instead.
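
A rough sketch of the coverage gap, using only the standard library. The range lists here are assumptions chosen for illustration and do not reflect the actual contents of defines.py. As far as I know, later releases of the emoji package (2.0 and up) dropped get_emoji_regexp() in favor of emoji.replace_emoji(), so that call may be the one to reach for on current versions.

```python
import re

# Hypothetical "older" ranges (assumption: similar in spirit to a pattern
# written before newer Unicode emoji blocks were added).
OLDER_RANGES = (
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
)

# Newer Unicode blocks that hold many recent emojis.
NEWER_RANGES = (
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\U0001FA70-\U0001FAFF"  # symbols and pictographs extended-A
)

old_pattern = re.compile("[" + OLDER_RANGES + "]+")
new_pattern = re.compile("[" + OLDER_RANGES + NEWER_RANGES + "]+")

text = "Привет \U0001F916"  # U+1F916 ROBOT FACE lies in a newer block

# The older pattern leaves the robot face in place.
print(repr(old_pattern.sub("", text)))

# The extended pattern removes it and keeps the Cyrillic text.
print(repr(new_pattern.sub("", text)))
```

Hand-maintained ranges like these need revisiting with each Unicode release, which is the main argument for delegating to a maintained package.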

Development

Successfully merging this pull request may close these issues.

Encoding issue with non-English text
