Add optional setting to expand set of indexed search characters
Completed
This post cuts across several related search/indexing discussions, such as grep, discussion of search, and the recent underscores post. The issue keeps coming up, so I am suggesting a possible solution for searching for strings that begin with a non-alphanumeric character.
My suggestion is to add a Setting that allows a user to establish a character (or perhaps a few characters) as legal for indexing (and searching) when prepended to a string (eg, _Music or :flepen or even []write).
## Rationale
After reading the underscores post above about searching for _Music generating the same results as for Music, I am coming back to my previous comments about my "system" that uses certain non-alphanumeric characters as an initial "tag" to identify certain things, like !license or :patient or []lookup.
For context, I use these thusly:
- !foo is used to flag an important concept, sort of a super tag. In nvAlt, I'll also use ##foo in the same manner.
- :patient -- I keep basic patient notes in markdown (not the whole record) and leave out identifying info by using first 3 letters of first and last name preceded by a colon; so :flepen and :breter would be Fletcher and Brett.
- []lookup -- I use [] with no space in the middle as a todo flag, often putting it in front of the thing that needs doing. To see all the todos in April, in nvAlt I search for 202104 []. Note that this would not find the things already done, such as [x]lookup.
To repeat, I suggest adding a Setting that allows a user to establish a character (or perhaps a few characters) as legal for indexing (and searching) when prepended to a string (eg, _Music or :flepen or even []write).
A setting such as:
Extended indexed search characters (strings that begin with any combination of these characters are indexed and searchable): ]:![
would then allow such searching without unnecessarily bloating the index file. Rather than stripping out all punctuation marks, the indexing would not strip out these characters when they begin a string. So in the example above, []lookup would be indexed as [, [], []l, []lo, []loo, []look, []looku, and []lookup (if I recall your system correctly).
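To make the idea concrete, here is a minimal sketch in Python of how such a setting might feed a prefix index. It is only an illustration under my own assumptions: the function name `index_prefixes` and the `extended_chars` parameter are hypothetical, and I am assuming the index stores every prefix of each token, which may not match how the app actually works.

```python
import re


def index_prefixes(text, extended_chars=""):
    """Return the set of prefixes a simple prefix index would store for text.

    extended_chars stands in for the proposed setting (hypothetical),
    e.g. "]:![". With the default empty value, all punctuation is stripped.
    """
    # Optional run of user-approved characters, allowed only at the start
    # of a token. re.escape keeps characters like ']' safe inside the class.
    lead = "[" + re.escape(extended_chars) + "]*" if extended_chars else ""
    # A token must still contain at least one letter or digit, so a bare
    # run of punctuation on its own is not indexed in this sketch.
    token_re = re.compile(lead + r"[A-Za-z0-9]+")

    prefixes = set()
    for token in token_re.findall(text):
        for end in range(1, len(token) + 1):
            prefixes.add(token[:end])
    return prefixes


# With the setting "]:![", the todo flag survives as part of the token.
print(sorted(index_prefixes("[]lookup", "]:!["), key=len))
# ['[', '[]', '[]l', '[]lo', '[]loo', '[]look', '[]looku', '[]lookup']
```

With the setting left empty, `_Music` and `Music` produce the same entries, which is the behavior described in the underscores post. Adding `_` to the setting makes `_Music` a distinct entry, and only tokens that actually start with an approved character add anything extra to the index.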
Adding such a setting would be worth the effort, I believe, because it would answer multiple requests for something like this, would not bloat the index the way indexing ALL punctuation would, and would spare your users from keeping a shadow system for such searches (right now, I keep nvAlt open and use it for this purpose when Ultra won't do it).
Thoughts?
bump
There have been other posts here about indexing. While allowing other characters is simple, the downstream effects are wide.
I'm not particularly enthusiastic about simply adding additional "legal" characters for indexing, due to these effects. I would be much more open to a different indexing approach if someone has a good recommendation for a system that is fast (preferably O(1) with respect to document length), with compact on-disk imaging, and low memory requirements.