Added ExtractGrokPatterns to OTTL converters #34037

Open: 31 commits to merge into base branch main

@michalpristas (Contributor) commented Jul 11, 2024

Description:
Added converter to OTTL for parsing grok patterns

Link to tracking Issue: #32593

Testing:
Added unit tests and an e2e test.

For a manual test, use this config:

receivers:
  filelog:
    include: [ demo.log ]
    start_at: beginning

exporters:
  debug:
    verbosity: detailed
    sampling_initial: 10000
    sampling_thereafter: 10000

processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements: 
          - merge_maps(attributes, ExtractGrokPatterns(body, "%{WOOHOO}", true, ["WOOHOO=%{ELB_URI} otel"]), "insert")



service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [transform]
      exporters:
        - debug
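
For readers unfamiliar with merge_maps: the "insert" strategy used in the statement above only adds keys that are not already present in attributes, so existing attributes such as log.file.name are left untouched. Below is a minimal Python sketch of the three documented strategies. It only illustrates the semantics, it is not the Collector's Go implementation, and the sample maps in the usage note are made up.

```python
def merge_maps(target: dict, source: dict, strategy: str) -> None:
    """Merge source into target in place, mimicking OTTL's merge_maps strategies."""
    if strategy not in ("insert", "update", "upsert"):
        raise ValueError(f"unknown strategy: {strategy}")
    for key, value in source.items():
        exists = key in target
        if strategy == "insert" and not exists:
            target[key] = value      # add only keys that are missing
        elif strategy == "update" and exists:
            target[key] = value      # overwrite only keys that already exist
        elif strategy == "upsert":
            target[key] = value      # insert or overwrite unconditionally
```

With a hypothetical attributes map {"log.file.name": "demo.log"} and an extracted map containing url.scheme and url.path, "insert" adds both url.* keys and leaves log.file.name as it was.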

Add this line to demo.log:

http://user:<password>@example.com:80/path?query=string otel

Output should contain these attributes:

Attributes:
     -> log.file.name: Str(demo.log)
     -> url.username: Str(user)
     -> url.domain: Str(example.com)
     -> url.port: Int(80)
     -> url.path: Str(/path)
     -> url.query: Str(query=string)
     -> url.scheme: Str(http)

This implementation uses the complete default pattern set defined in this directory:
https://github.com/elastic/go-grok/tree/main/patterns

%{ELB_URI} comes from the AWS pattern set and is equivalent to:
((?P<url.scheme>[A-Za-z][A-Za-z0-9+\.-]+)://(?:(?P<url.username>([a-zA-Z0-9._-]+))(?::[^@]*)?@)?(?:((?P<url.domain>(?:((?:(((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?)|((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))))|(\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b))))(?::(?P<url.port>\b[1-9][0-9]*\b))?))?(?:((?P<url.path>(/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]+)+)(?:\?(?P<url.query>[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*))?))?)
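
To make the expanded expression easier to follow, here is a heavily simplified Python stand-in. It is a sketch under assumptions, not the real ELB_URI: it handles plain hostnames only (the IPv4 and IPv6 branches are dropped), the group names use underscores because Python does not allow dots in group names, and the password `secret` in the usage note is a made-up value because the one in the sample line is not visible here. The port conversion mirrors the Int(80) shown in the expected output.

```python
import re

# Simplified stand-in for ELB_URI (hostnames only, no IP address branches).
SIMPLE_URI = re.compile(
    r"(?P<url_scheme>[A-Za-z][A-Za-z0-9+.\-]+)://"
    r"(?:(?P<url_username>[A-Za-z0-9._-]+)(?::[^@]*)?@)?"   # password is matched but not captured
    r"(?P<url_domain>[0-9A-Za-z][0-9A-Za-z.-]*)"
    r"(?::(?P<url_port>[1-9][0-9]*))?"
    r"(?P<url_path>(?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]+)+)?"
    r"(?:\?(?P<url_query>[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*))?"
)


def extract_url_attributes(line: str) -> dict:
    """Return dotted url.* attributes for the first URL found in line."""
    match = SIMPLE_URI.search(line)
    if match is None:
        return {}  # mirrors the converter returning an empty map on no match
    attrs = {}
    for name, value in match.groupdict().items():
        if value is None:
            continue  # optional group did not participate in the match
        key = name.replace("_", ".", 1)  # url_scheme -> url.scheme
        attrs[key] = int(value) if key == "url.port" else value
    return attrs
```

Calling `extract_url_attributes("http://user:secret@example.com:80/path?query=string otel")` yields the same url.* keys listed under "Attributes" above, with url.port as an integer.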

Documentation:
updated ottl/readme


The `ExtractGrokPatterns` Converter returns a `pcommon.Map` struct that is a result of extracting named capture groups from the target string. If no matches are found then an empty `pcommon.Map` is returned.

`target` is a Getter that returns a string. `pattern` is a grok pattern string. `namedCapturesOnly` specifies if non-named captures should be returned. `patternDefinitions` is a list of custom pattern definition strings used `pattern` in a form of `PATTERN_NAME=PATTERN`.

Member:

I think there is a typo here:

custom pattern definition strings used `pattern` in a form of `PATTERN_NAME=PATTERN`

But I'm not sure because I don't fully understand what patternDefinitions does yet.

Member:

Can you add some more in depth docs around what the optional parameters do and when a user would use them?

Comment on lines 599 to 607
Use `patternDefinition` to improve readability when extracted `pattern` is not part of the default set or you need custom naming.
For example to parse password from `/etc/passwd` and keep `pattern` readable:
- `pattern`: `%{USERNAME:user.name}:%{PASSWORD:user.password}:%{USERINFO}`
- `patternDefinitions` as `USERNAME` is in default set:
  - `PASSWORD=%{WORD}`
  - `USERINFO=%{GREEDYDATA}`
- `smith:x:1001:1000:J Smith,1234,(234)567-8910,(234)567-1098,email:/home/smith:/bin/sh` resulting in:
  - `user.name`: smith
  - `user.password`: pass123

Member:

I'm still pretty confused about what patternDefinitions is doing. Where did the value pass123 come from? I don't see it in the example string.

Contributor Author:

It's the second part of the string. My bad, there was x previously.

@TylerHelmuth (Member) commented Jul 26, 2024

So what is

  - `PASSWORD=%{WORD}`
  - `USERINFO=%{GREEDYDATA}`

providing in this scenario? It looks to me like %{USERNAME:user.name}:%{PASSWORD:user.password} is doing all the work.

@michalpristas (Contributor Author) commented Jul 29, 2024

%{USERNAME:user.name}:%{PASSWORD:user.password} is the pattern, i.e. the format in which your logs occur in a file.

  - `PASSWORD=%{WORD}`
  - `USERINFO=%{GREEDYDATA}`

are pattern definitions. They tell the parser how %{PASSWORD:user.password} translates to a regex; in this case it is a WORD, which comes from the default set and is defined as \b\w+\b.

So after compilation you end up with a named capture <user.password> capturing the regex \b\w+\b.
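
The relationship between a pattern and its pattern definitions can be sketched in a few lines of Python. This is an illustration only, not how go-grok is implemented: the `DEFAULTS` table is a tiny hand-written subset whose regexes are assumptions modeled on common grok pattern sets, `compile_grok` and `extract` are hypothetical helper names, and unnamed references such as `%{USERINFO}` are simply not captured. Synthetic group names (`g0`, `g1`, ...) are used because Python does not allow dots in group names.

```python
import re

# Assumed subset of a default pattern set (not copied from go-grok).
DEFAULTS = {
    "USERNAME": r"[a-zA-Z0-9._-]+",
    "WORD": r"\b\w+\b",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} and %{NAME:field.name}.
REF = re.compile(r"%\{(\w+)(?::([\w.]+))?\}")


def compile_grok(pattern, definitions=()):
    """Expand a grok pattern into a regex plus a group-name -> field-name map."""
    table = dict(DEFAULTS)
    for definition in definitions:  # each entry has the form PATTERN_NAME=PATTERN
        name, _, body = definition.partition("=")
        table[name] = body
    fields = {}

    def expand(text, depth=0):
        if depth > 10:
            raise ValueError("pattern definitions nest too deeply")

        def repl(match):
            name, field = match.group(1), match.group(2)
            if name not in table:
                raise KeyError(f"undefined pattern: {name}")
            inner = expand(table[name], depth + 1)
            if field is None:
                return f"(?:{inner})"      # unnamed reference: match, do not capture
            group = f"g{len(fields)}"      # synthetic, regex-safe group name
            fields[group] = field
            return f"(?P<{group}>{inner})"

        return REF.sub(repl, text)

    return re.compile(expand(pattern)), fields


def extract(pattern, definitions, target):
    """Return {field: value} for named captures, or {} when nothing matches."""
    regex, fields = compile_grok(pattern, definitions)
    match = regex.search(target)
    if match is None:
        return {}
    return {fields[g]: v for g, v in match.groupdict().items() if v is not None}
```

Run against the /etc/passwd line from the README snippet, with `PASSWORD=%{WORD}` and `USERINFO=%{GREEDYDATA}` as definitions, this sketch returns `user.name` as `smith` and `user.password` as `x`. Without the two definitions, `compile_grok` raises a `KeyError`, which is the part the definitions provide.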

Contributor Author:

I asked the very nice people from our docs team to help me rephrase this so it is clearer.

Labels: cmd/otelcontribcol (otelcontribcol command), pkg/ottl
2 participants