supports a "segment_prefix" in the edi parser file declaration #154
Conversation
| First, thank you very much for using omniparser, analyzing your issue, and proposing a solution! Really appreciate it! In general, I'd like to have an issue opened for in-depth discussion before a PR is determined to be needed and created. I'd like to understand a bit more of your specific problem: EDI uses a segment delimiter and an element delimiter to compartmentalize data fragments. In your example, however, I'm not seeing an element delimiter. Does your EDI have only a segment delimiter? That's highly unusual (I'm not even sure we should call it an EDI format any more). Or is it your EDI really uses Given the guess/analysis above, I have three solutions in mind:
Given this very very non-standard EDI structure in your situation, I'm a bit inclined toward option 2 since that What do you think, @samolds ? |
| Sorry for jumping the gun and opening a PR straight away! I had already gone ahead with the changes in order to meet a demo deadline and thought I would try and contribute back, even if this is some weird EDI flavor. I have a specification that states:
And then proceeds to define all of the segments and elements and loops and various rules for the expected data types, min, max, format checks, etc. Following through your EDI In Depth wiki, I created an EDI parsing spec (simplified here for brevity) that successfully transforms the raw data:

    {
      "parser_settings": { "version": "omni.2.1", "file_format_type": "edi" },
      "file_declaration": {
        "element_delimiter": "|",
        "segment_prefix": "|",
        "segment_delimiter": "\n",
        "ignore_crlf": false,
        "segment_declarations": [
          {
            "name": "document", "type": "segment_group", "min": 1, "max": -1, "is_target": true,
            "child_segments": [
              {
                "name": "DOC_TYPE",
                "elements": [
                  { "name": "name", "index": 1 },
                  { "name": "timestamp", "index": 3 },
                  { "name": "version", "index": 8 }
                ]
              },
              {
                "name": "record", "type": "segment_group", "min": 1, "max": -1,
                "child_segments": [
                  { "name": "REC", "elements": [ { "name": "record_id", "index": 2 } ] },
                  {
                    "name": "header", "type": "segment_group", "min": 1, "max": 1,
                    "child_segments": [
                      { "name": "HDR", "elements": [ { "name": "part_number", "index": 1 } ] }
                    ]
                  }
                ]
              },
              { "name": "EOF" }
            ]
          }
        ]
      },
      "transform_declarations": {
        "FINAL_OUTPUT": { "object": { "name": { "xpath": "DOC_TYPE/name" } } }
      }
    }

So yes, my spec uses I'm cool with option #2. I will try and find some time to make those changes, but I might have to continue using my fork until I get around to it. |
…anism.

LineReader implements the io.Reader interface with a line editing mechanism. LineReader reads data from the underlying io.Reader and invokes the caller-supplied edit function for each line (defined as []byte ending with '\n', therefore it works on both Mac/Linux and Windows, where '\r\n' is used). Note the last line before EOF will be edited as well, even if it doesn't end with '\n'.

Usage is highly flexible: the editing function can do in-place editing such as character replacement, prefix/suffix stripping, or word replacement, etc., as long as the line length isn't changed; or it can replace a line with a completely newly allocated and written line with no length restriction (although performance would be slower compared to in-place editing).

ios.LineReader is at least as performant as ios.BytesReplacingReader:

```
BenchmarkLineReader_RawIORead-8                           23300      51319 ns/op   1103392 B/op   23 allocs/op
BenchmarkLineReader_UseLineReader-8                        3343     351305 ns/op   1104512 B/op   25 allocs/op
BenchmarkLineReader_CompareWithBytesReplacingReader-8       978    1226656 ns/op   1107648 B/op   26 allocs/op
```

This PR is motivated by a real use case discussed in jf-tech/omniparser#154
| @samolds if you have time, do you mind taking a look at jf-tech/go-corelib#22 where I introduce a LineReader with editing mechanism, as we discussed in this PR before. Let me know if you think it would fit your need. |
| jf-tech/go-corelib#22 is a good solution. I am going to close this PR in favor of using the LineEditingReader introduced in the other PR. Thanks! |
…ng mechanism (#22)

`LineEditingReader` implements the `io.Reader` interface with a line editing mechanism. `LineEditingReader` reads data from the underlying `io.Reader` and invokes the caller-supplied edit function for each line (defined as `[]byte` ending with `'\n'`, therefore it works on both Mac/Linux and Windows, where `'\r\n'` is used). Note the last line before `EOF` will be edited as well, even if it doesn't end with `'\n'`.

Usage is highly flexible: the editing function can do in-place editing such as character replacement, prefix/suffix stripping, or word replacement, etc., as long as the line length isn't increased; or it can replace a line with a completely newly allocated and written line with no length restriction (although performance might be slower compared to in-place editing).

`ios.LineEditingReader` is at least as performant as `ios.BytesReplacingReader`:

```
BenchmarkLineEditingReader_RawIORead-8                           23300      51319 ns/op   1103392 B/op   23 allocs/op
BenchmarkLineEditingReader_UseLineEditingReader-8                 3343     351305 ns/op   1104512 B/op   25 allocs/op
BenchmarkLineEditingReader_CompareWithBytesReplacingReader-8       978    1226656 ns/op   1107648 B/op   26 allocs/op
```

This PR is motivated by a real use case discussed in jf-tech/omniparser#154
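
To make the line-editing idea in the commit message concrete, here is a minimal standard-library sketch of an `io.Reader` wrapper that runs an edit function over each line and uses it to strip a leading `|` segment prefix. It is not the go-corelib implementation: the names `lineEditFunc`, `lineEditingReader`, `newLineEditingReader`, and `stripSegmentPrefix` are hypothetical, the sample input is made up, and the real `ios.LineEditingReader` API may differ.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// lineEditFunc is a hypothetical callback type: it receives one line
// (including its trailing '\n', if present) and returns the edited line.
type lineEditFunc func(line []byte) []byte

// lineEditingReader is an illustrative stand-in, not the go-corelib type.
type lineEditingReader struct {
	src  *bufio.Reader
	edit lineEditFunc
	buf  []byte // edited bytes not yet handed to the caller
	err  error  // sticky error (including io.EOF) from the source
}

func newLineEditingReader(r io.Reader, edit lineEditFunc) *lineEditingReader {
	return &lineEditingReader{src: bufio.NewReader(r), edit: edit}
}

// Read refills the buffer one edited line at a time, then copies into p.
func (r *lineEditingReader) Read(p []byte) (int, error) {
	for len(r.buf) == 0 {
		if r.err != nil {
			return 0, r.err
		}
		// ReadBytes returns a fresh slice, so in-place edits are safe.
		line, err := r.src.ReadBytes('\n')
		r.err = err
		if len(line) > 0 {
			// The last line is edited even if it lacks a trailing '\n'.
			r.buf = r.edit(line)
		}
	}
	n := copy(p, r.buf)
	r.buf = r.buf[n:]
	return n, nil
}

// stripSegmentPrefix removes one leading '|' from every line of input.
func stripSegmentPrefix(input string) (string, error) {
	r := newLineEditingReader(strings.NewReader(input), func(line []byte) []byte {
		return bytes.TrimPrefix(line, []byte("|"))
	})
	out, err := io.ReadAll(r)
	return string(out), err
}

func main() {
	// Made-up sample data in the shape described in this PR.
	out, err := stripSegmentPrefix("|DOC_TYPE|a|b\n|REC|1|2\n|EOF")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

A reader like this would sit between the raw input and the parser, so the parser schema only needs the usual element and segment delimiters.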
| FYI, @samolds https://github.com/jf-tech/go-corelib v0.0.16 that contains the |
I'm working with a non-standard EDI format that includes a segment prefix. For example, a message might be:
where every segment begins with a pipe. I thought that I could get around this by making the segment delimiter include the next pipe (i.e. `|\n|`), but this doesn't catch the very first pipe.

I propose including a new (optional) "segment_prefix" field in the file_declaration to catch segment prefixes.
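
The difference between the delimiter workaround and a separate prefix can be illustrated with a small Go sketch. It assumes a made-up message in which each segment starts and ends with a pipe; `splitSegments` is a hypothetical helper written for this illustration and is not part of omniparser.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSegments splits raw on delim and strips prefix (if non-empty) from
// the start of each piece. Pieces that end up empty are skipped.
func splitSegments(raw, delim, prefix string) []string {
	var segs []string
	for _, s := range strings.Split(raw, delim) {
		s = strings.TrimPrefix(s, prefix)
		if s != "" {
			segs = append(segs, s)
		}
	}
	return segs
}

func main() {
	// Made-up sample: every segment begins (and here also ends) with '|'.
	raw := "|HDR|1|\n|REC|2|\n|EOF|\n"

	// Workaround: fold the next segment's pipe into the delimiter.
	// The first segment still carries its leading pipe.
	fmt.Printf("%q\n", splitSegments(raw, "|\n|", ""))

	// Proposal: keep a plain delimiter and strip the prefix separately.
	fmt.Printf("%q\n", splitSegments(raw, "\n", "|"))
}
```

With the folded delimiter, only the pipes between segments are consumed, so the first segment is parsed with a stray leading pipe. Handling the prefix separately treats every segment the same way.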