Added ExtractGrokPatterns to OTTL converters #34037
…tor-contrib into ottl/grok
Co-authored-by: Tyler Helmuth <[email protected]>
pkg/ottl/ottlfuncs/README.md
The `ExtractGrokPatterns` Converter returns a `pcommon.Map` struct that is a result of extracting named capture groups from the target string. If no matches are found then an empty `pcommon.Map` is returned.

`target` is a Getter that returns a string. `pattern` is a grok pattern string. `namedCapturesOnly` specifies if non-named captures should be returned. `patternDefinitions` is a list of custom pattern definition strings used `pattern` in a form of `PATTERN_NAME=PATTERN`.
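To make the `PATTERN_NAME=PATTERN` form concrete, here is a minimal Go sketch of how such a list could be turned into a name-to-pattern map. The function name `parseDefinitions` and its error handling are hypothetical and are not taken from this PR; the converter's actual validation rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDefinitions (hypothetical helper) turns entries of the form
// PATTERN_NAME=PATTERN into a name -> pattern map. Only the first '='
// separates the name from the pattern, so the pattern may contain '='.
func parseDefinitions(defs []string) (map[string]string, error) {
	out := make(map[string]string, len(defs))
	for i, d := range defs {
		name, pattern, ok := strings.Cut(d, "=")
		if !ok || name == "" || pattern == "" {
			return nil, fmt.Errorf("definition %d (%q) is not in PATTERN_NAME=PATTERN form", i, d)
		}
		out[name] = pattern
	}
	return out, nil
}

func main() {
	defs, err := parseDefinitions([]string{"PASSWORD=%{WORD}", "USERINFO=%{GREEDYDATA}"})
	if err != nil {
		panic(err)
	}
	fmt.Println(defs["PASSWORD"], defs["USERINFO"])
}
```

Each parsed entry adds a name that `pattern` can then reference as `%{PATTERN_NAME}`.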
I think there is a typo here:
custom pattern definition strings used `pattern` in a form of `PATTERN_NAME=PATTERN`
But I'm not sure because I don't fully understand what patternDefinitions does yet.
Can you add some more in-depth docs around what the optional parameters do and when a user would use them?
pkg/ottl/ottlfuncs/README.md
Use `patternDefinition` to improve readability when extracted `pattern` is not part of the default set or you need custom naming.
For example to parse password from `/etc/passwd` and keep `pattern` readable:
- `pattern`: `%{USERNAME:user.name}:%{PASSWORD:user.password}:%{USERINFO}`
- `patternDefinitions` as `USERNAME` is in default set:
  - `PASSWORD=%{WORD}`
  - `USERINFO=%{GREEDYDATA}`
- `smith:x:1001:1000:J Smith,1234,(234)567-8910,(234)567-1098,email:/home/smith:/bin/sh` resulting in:
  - `user.name`: smith
  - `user.password`: pass123
I'm still pretty confused about what `patternDefinitions` is doing. Where did the value `pass123` come from? I don't see it in the example string.
It's the second part of the string. My bad, there was `x` previously.
So what is
- `PASSWORD=%{WORD}`
- `USERINFO=%{GREEDYDATA}`
providing in this scenario? It looks to me like `%{USERNAME:user.name}:%{PASSWORD:user.password}` is doing all the work.
`%{USERNAME:user.name}:%{PASSWORD:user.password}` is a pattern, so it describes the format in which your logs occur in a file.
- `PASSWORD=%{WORD}`
- `USERINFO=%{GREEDYDATA}`
are pattern definitions. They tell the parser how `%{PASSWORD:user.password}` translates to a regex. In this case it is `WORD`, which comes from the default set and is `\b\w+\b` (definition). So after compilation you end up with the named capture `<user.password>` capturing the regex `\b\w+\b`.
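The expansion described in this comment can be sketched with a toy expander built on Go's standard `regexp` package. This is not the go-grok implementation used by the PR. The helpers `expand` and `extract` are hypothetical, and the three default-set entries are assumed stand-ins. Go's `regexp` does not accept dots in group names, so the sketch generates group names such as `g0` and maps them back to the requested field names.

```go
package main

import (
	"fmt"
	"regexp"
)

// grokRef matches %{NAME} and %{NAME:field} references.
var grokRef = regexp.MustCompile(`%\{(\w+)(?::([\w.]+))?\}`)

// expand rewrites grok references into plain regex syntax, resolving
// definitions recursively. fields maps each generated group name to the
// field name requested in the pattern.
func expand(pattern string, defs, fields map[string]string, depth int) (string, error) {
	if depth > 10 {
		return "", fmt.Errorf("definitions nested too deeply (possible cycle)")
	}
	var expandErr error
	out := grokRef.ReplaceAllStringFunc(pattern, func(ref string) string {
		m := grokRef.FindStringSubmatch(ref)
		def, ok := defs[m[1]]
		if !ok {
			expandErr = fmt.Errorf("unknown pattern %q", m[1])
			return ref
		}
		inner, err := expand(def, defs, fields, depth+1)
		if err != nil {
			expandErr = err
			return ref
		}
		if m[2] == "" {
			return "(?:" + inner + ")" // unnamed reference: no capture
		}
		group := fmt.Sprintf("g%d", len(fields))
		fields[group] = m[2]
		return "(?P<" + group + ">" + inner + ")"
	})
	return out, expandErr
}

// extract returns the named captures of the first match, or an empty map
// when the target does not match.
func extract(pattern, target string, defs map[string]string) (map[string]string, error) {
	fields := map[string]string{}
	expr, err := expand(pattern, defs, fields, 0)
	if err != nil {
		return nil, err
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	result := map[string]string{}
	match := re.FindStringSubmatch(target)
	if match == nil {
		return result, nil
	}
	for i, group := range re.SubexpNames() {
		if field, ok := fields[group]; ok {
			result[field] = match[i]
		}
	}
	return result, nil
}

func main() {
	defs := map[string]string{
		// Assumed stand-ins for entries of the default set.
		"USERNAME":   `[a-zA-Z0-9._-]+`,
		"WORD":       `\b\w+\b`,
		"GREEDYDATA": `.*`,
		// Custom definitions from the README example.
		"PASSWORD": "%{WORD}",
		"USERINFO": "%{GREEDYDATA}",
	}
	line := "smith:x:1001:1000:J Smith,1234,(234)567-8910,(234)567-1098,email:/home/smith:/bin/sh"
	got, err := extract("%{USERNAME:user.name}:%{PASSWORD:user.password}:%{USERINFO}", line, defs)
	if err != nil {
		panic(err)
	}
	fmt.Println(got["user.name"], got["user.password"])
}
```

With the sample line from the README example, this toy version yields `smith` for `user.name` and `x` for `user.password`, since `x` is the second colon-separated field.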
I asked some very nice people from our docs team to help me rephrase this so that it is clearer.
Description:
Added a converter to OTTL for parsing grok patterns.
Link to tracking Issue: #32593
Testing:
Added unit tests and an e2e test.
For a manual test, use this config:
Add this line to `demo.log`:
Output should contain these attributes:
For default set of patterns check: http://user:[email protected]:80/path?query=string
This implementation uses a complete set defined in this directory:
https://github.com/elastic/go-grok/tree/main/patterns
`%{ELB_URI}` comes from the AWS set and is equivalent to:
((?P<url.scheme>[A-Za-z][A-Za-z0-9+\.-]+)://(?:(?P<url.username>([a-zA-Z0-9._-]+))(?::[^@]*)?@)?(?:((?P<url.domain>(?:((?:(((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?)|((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))))|(\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))))(?::(?P<url.port>\b[1-9][0-9]*\b))?))?(?:((?P<url.path>(/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]+)+)(?:\?(?P<url.query>[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*))?))?)
Documentation:
updated ottl/readme