Summary
When DLP is enabled, ordinary English words in free-form log messages are matched by the username matcher and masked, which makes operational logs hard to read.
For example:
cleanup sessions finished count=0 becomes cl***up se****ns fi****ed co*nt=0
mounted embedded static site becomes mo***ed em****ed st**ic site
Reproduction
package main

import (
	"fmt"

	dlp "github.com/darkit/slog/dlp"
)

func main() {
	e := dlp.NewDlpEngine()
	e.Enable()
	fmt.Println(e.DesensitizeText("cleanup sessions finished count=0"))
	fmt.Println(e.DesensitizeText("mounted embedded static site"))
}
Actual behavior
cl***up se****ns fi****ed co*nt=0
mo***ed em****ed st**ic site
Expected behavior
These ordinary words should not be masked by default in free-form log messages.
Root cause
In v0.2.0, the default username matcher is too broad:
UsernamePattern = `[a-zA-Z0-9_]{3,16}`
That pattern matches many normal English words such as:
- cleanup
- sessions
- finished
- count
- mounted
- embedded
- static
- site
Because DesensitizeText() applies auto-detection and replacement to the whole free-form message, each of these words is treated as a username and masked.
Impact
- Breaks log readability
- Reduces observability during troubleshooting
- Makes DLP unsafe to enable globally for application logs
Suggested fixes
At least one of the following would help:
- Do not enable the username matcher by default for free-form message text
- Support disabling specific matchers (for example username) via config
- Support field-scoped DLP so only sensitive key/value fields are masked, instead of scanning the whole message body
- Tighten the default username matcher so it does not match arbitrary common words
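To make the field-scoped option concrete, here is a rough sketch of the idea in plain Go. It does not use or describe the darkit/slog API: sensitiveKeys, maskValue, and maskFields are hypothetical names, and the masking rule (keep the first and last character) is an assumption chosen only for illustration. The point is that the message text is passed through untouched and only values under known-sensitive keys are masked.

```go
package main

import (
	"fmt"
	"strings"
)

// sensitiveKeys is a hypothetical list of field names whose values get masked.
var sensitiveKeys = map[string]bool{"username": true, "password": true}

// maskValue hides everything except the first and last character.
// This rule is an assumption for the sketch, not the library's behavior.
func maskValue(v string) string {
	r := []rune(v)
	if len(r) <= 2 {
		return strings.Repeat("*", len(r))
	}
	return string(r[0]) + strings.Repeat("*", len(r)-2) + string(r[len(r)-1])
}

// maskFields returns msg unchanged and a copy of fields in which only the
// values of sensitive keys are masked. The message body is never scanned.
func maskFields(msg string, fields map[string]string) (string, map[string]string) {
	out := make(map[string]string, len(fields))
	for k, v := range fields {
		if sensitiveKeys[strings.ToLower(k)] {
			out[k] = maskValue(v)
		} else {
			out[k] = v
		}
	}
	return msg, out
}

func main() {
	msg, fields := maskFields(
		"cleanup sessions finished",
		map[string]string{"count": "0", "username": "alice_01"},
	)
	fmt.Println(msg, fields)
}
```

With this shape, operational text such as "cleanup sessions finished" stays readable while a value logged under the username key is still protected.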
Workaround used downstream
We had to disable DLP entirely for runtime logs to avoid damaging normal operational output.