Huge text handling #3121
base: master
This actually looks good. I only skimmed through it quickly and have maybe one concern: I think we should log at WARN instead of FINE if a file is skipped because of limits (unless FINE is printed by default to the log file, but then I'd like to see those as WARN on the console too). |
What happens if an index is created with particular limits and then the limits are changed? |
No file is skipped. It is still included but under the |
I tried to describe that above, but to clarify: You can change If you change
|
Sorry, I meant "trimmed" down, not skipped. |
@tarzanek, that's done. |
I suppose it would be straightforward to store a value for uncompressed size in the |
Oh but that would mean decompressing entirely. Probably not a good idea. |
5dca6b3 to 1d15e51
cdcbea3 to 7559eb7
Just rebased on master since this needed revision to accommodate the |
I will take a look; also needs rebase. |
7559eb7 to 8ae1950
Just trivial conflicts upon rebase. |
1b51b08 to 2fb1b2e
5403dfd to 29aad0e
Just rebasing for trivial conflicts related to the R analyzer, and then again after parallel detection was merged. |
opengrok-indexer/src/main/java/org/opengrok/indexer/index/Indexer.java (outdated, resolved)
opengrok-indexer/src/test/java/org/opengrok/indexer/index/HugeTextTest.java (outdated, resolved)
29aad0e to 358d2f6
Rebased for trivial conflict in search.jsp. |
Also, move some logic properly to AnalyzerGuru that had crept into IndexDatabase.
f6bdc40 to 36245a5
Rebased for PageConfig.java re-lo, and git automatic-merge took care of it. |
Hello,
Please consider for integration this patch to add Huge Text file handling.
Indexer and Configuration get two new settings, hugeTextThresholdBytes (default 1_000_000) and hugeTextLimitCharacters (default 5_000_000). The threshold determines when OpenGrok will override a PLAIN genre file as a hugetext DATA file instead. The character limit determines how much to read and index for hugetext (with contextless truncation); the limit may be zero.
hugeTextThresholdBytes is checked for applicable files with each run, while no state for hugeTextLimitCharacters is stored. Changing hugeTextLimitCharacters after indexing would require touching affected source code files to revise the index.
For affected gzip and bzip2 files, changes to either hugeTextThresholdBytes or hugeTextLimitCharacters would require touching affected compressed files to revise the index.
Thank you.
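For readers who want a concrete picture of the two settings in the PR description, here is a minimal sketch of how a byte threshold and a character limit might interact. It is not the code from this patch: the class HugeTextSketch and the methods isHugeText and readTruncated are hypothetical names, and treating a file whose size equals the threshold as huge is an assumption. Only the two default values are taken from the description.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not the classes touched by the PR.
final class HugeTextSketch {
    // Defaults quoted in the PR description.
    static final long HUGE_TEXT_THRESHOLD_BYTES = 1_000_000L;
    static final int HUGE_TEXT_LIMIT_CHARACTERS = 5_000_000;

    // Assumption: a size at or above the threshold counts as huge text.
    static boolean isHugeText(long sizeBytes, long thresholdBytes) {
        return sizeBytes >= thresholdBytes;
    }

    // Reads at most limitCharacters characters and stops there,
    // without regard to word or line boundaries (contextless truncation).
    // A limit of zero yields an empty string.
    static String readTruncated(Reader in, int limitCharacters) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int remaining = limitCharacters;
        try {
            while (remaining > 0) {
                int n = in.read(buf, 0, Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // end of input before the limit was reached
                }
                sb.append(buf, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(isHugeText(2_000_000L, HUGE_TEXT_THRESHOLD_BYTES));
        System.out.println(readTruncated(new StringReader("abcdefgh"), 3));
    }
}
```

Because the sketch keeps no record of the limit that was in effect when a file was read, a later change to the limit is invisible for content already processed, which mirrors the description's point that affected files must be touched to revise the index.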