Index file contents - like windows desktop search or google desktop search?

adoucette
Offline
Last seen: 3 years 7 months ago
Joined: 2009-12-24 13:49
Index file contents - like windows desktop search or google desktop search?

Is there a portable app that can index the contents of the files on my thumb drive?

For example, I have a thumb drive holding all of my school notes and textbooks, in doc, pdf, html, and chm format. I'd like to run instant keyword searches on them, as I can with Windows Desktop Search.
Is there a (free) portable app that can do this?
Thanks,
Ari

digitxp
Offline
Last seen: 12 years 7 months ago
Joined: 2007-11-03 18:33
Puggle?

The first one that comes to mind (and the most promising, IMO) is Puggle, which has a standalone version.
Good luck!

Insert original signature here with Greasemonkey Script.

adoucette
Offline
Last seen: 3 years 7 months ago
Joined: 2009-12-24 13:49
OK, so I just put one together

I packaged an existing indexer as a portable app. It works pretty well so far.
It indexes more file types than Puggle, does it faster, and has a lot more options.
It also doesn't require Java.
Here's a link to download it.
Try it out and tell me what you think.

Wilma Desktop Search Portable

Thanks for all the help,
Ari

adoucette
Offline
Last seen: 3 years 7 months ago
Joined: 2009-12-24 13:49
Here is the list of files it indexes

Here's the list of file types it indexes by default, though the user can add more:

*.alx
*.asa
*.awx
*.bas
*.bat
*.c
*.cab
*.cdf
*.cls
*.cmd
*.cnf
*.cpp
*.csv
*.ctl
*.cxx
*.dic
*.dob
*.doc
*.docx
*.dot
*.dsm
*.dsn
*.dsp
*.dtq
*.etx
*.exc
*.frm
*.gz
*.gzip
*.h
*.hpp
*.htm
*.html
*.htt
*.htx
*.hxx
*.idc
*.inf
*.ini
*.inl
*.ipr
*.java
*.job
*.js
*.lnk
*.log
*.ls
*.man
*.map
*.mbx
*.msg
*.obd
*.obt
*.odb
*.odl
*.odp
*.ods
*.odt
*.pages
*.pdf
*.php
*.pl
*.ppt
*.rc
*.reg
*.roff
*.rtf
*.sam
*.scp
*.sh
*.src
*.stg
*.sxc
*.sxi
*.sxw
*.tar
*.tcl
*.text
*.tgz
*.tsv
*.txt
*.uin
*.url
*.vbp
*.vbz
*.vcf
*.vrml
*.wdb
*.wks
*.wpd
*.wps
*.wri
*.xla
*.xlm
*.xls
*.xlt
*.xml
*.zip

[I put your list in pre tags so people have to scroll less, mod SL]

notsure
Offline
Last seen: 10 years 2 weeks ago
Joined: 2007-02-10 13:09
Neat

Do you have the source so I can see what it's doing when it sorts? And what language is it programmed in?

adoucette
Offline
Last seen: 3 years 7 months ago
Joined: 2009-12-24 13:49
no, I don't have the source

No, I don't have the source.
See here for more info.

boomerm3
Offline
Last seen: 11 years 3 months ago
Joined: 2010-10-31 11:01
Word 2007 - docx

docx is listed as an indexable file type, but Wilma treats it as XML (without a proper viewer). I also see that docx files are set to use the zip Analyzer.

Is there a way to get this to work properly? Obviously docx is not a zip file... but the generic setting does not work at all.

Suggestions?

adoucette
Offline
Last seen: 3 years 7 months ago
Joined: 2009-12-24 13:49

Now, I only packaged it for PAF and did not program Wilma.
However, AFAIK, a docx IS a zip file, which may be why Wilma needs to unzip it. To see what I mean, change the extension of a ".docx" file to ".zip" and unzip it into a folder; you'll see the XML that the docx is made of. The compression is also why a docx is typically smaller than the equivalent doc file.
Just a guess, but perhaps Wilma indexes the plain text from the [unzipped] XML and simply doesn't have a fancy docx parser.
Still, I've at least just been happy that Wilma does find the text I need from docx files - even if the results window doesn't display them as MS Word would.
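
To make the "docx is a zip" point concrete, here is a minimal Java sketch. It assumes an ordinary (non-Strict) docx whose body is stored in word/document.xml under the usual WordprocessingML namespace. The class name DocxText and its methods are invented for illustration; this is not Wilma's code (its source isn't available), only one possible way to pull the text out without a full docx parser. It ignores headers, footnotes, and nested text boxes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads a docx as a plain zip archive (needs Java 9+).
class DocxText {
    // Namespace used by the w: prefix in ordinary (Transitional) docx files.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Scans the zip entries for the main document part and returns its text.
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes stops at the end of the current entry.
                    return paragraphs(zip.readAllBytes());
                }
            }
        }
        return ""; // not a docx, or no main document part found
    }

    // Joins the text runs of each paragraph; one output line per paragraph.
    static String paragraphs(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                              .parse(new ByteArrayInputStream(xml));
        NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < paras.getLength(); i++) {
            NodeList runs = ((Element) paras.item(i))
                .getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling DocxText.extractText(new java.io.FileInputStream("notes.docx")) on a file of your own (the file name is just a placeholder) should return the body text, one paragraph per line. An indexer that only strips the XML tags would find the same words, which fits the behaviour described above.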

John T. Haller
Online
Last seen: 9 min 34 sec ago
Admin, Developer, Moderator, Translator
Joined: 2005-11-28 22:21
Correct

DOCX is an XML-based file format that is then zipped to save file size. So Wilma would be treating it as what it is without having a full DOCX parser.

Sometimes, the impossible can become possible, if you're awesome!

boomerm3
Offline
Last seen: 11 years 3 months ago
Joined: 2010-10-31 11:01
Fixable

Without the source code, is this 'fixable'? That is, can I provide a viewer that would allow Wilma to read the docx file format?

Kingston
Offline
Last seen: 14 years 2 months ago
Joined: 2010-02-01 02:05

adoucette,

Your modification may be just what I am looking for.

I am using Wilma at the moment with it installed on a USB HD. Craig Morris of redtree.com (source for Wilma) supplied your link in response to my inquiry to have the program write the indexing file to the USB HD rather than the default C: drive. It seems you have done this.
I tried your mod and followed your instruction and browsed to the virtual U: drive to select the folder. Unfortunately it fails to build the index and pops up a window with the message "An exception of class NilObjectException was not handled. The application must shut down".
FYI, as already mentioned, I'm currently using Wilma as my database search engine. It is installed on the same drive, but naturally I was not running it from there.
Appreciate your help.

Keep up the good work

adoucette
Offline
Last seen: 3 years 7 months ago
Joined: 2009-12-24 13:49

This happens when you make an index and click "Build" instead of "Save".
Just click "Save" when you make an index, then choose "Build" from the "Index" menu in the program.
Since this is not obvious from Wilma's GUI layout, I've added these steps, with some screenshots, to the help file.
A newer version of the installer with this modification is available for download.

Please let me know if this fixes your problem.

Ari

Kingston
Offline
Last seen: 14 years 2 months ago
Joined: 2010-02-01 02:05
Wilmaportable

Ari,

Yes and thanks, this works. Also after creating the first index you can go straight to build, rather than save.

If I recall correctly, the original Wilma starts up with an initial index called 'Dummy'. In your release this file is gone and there are no indexes. It occurs to me that if you reinstate this first dummy index, the problem will go away.

I also found that if you ignore the dialogue box mentioned in my first post and go to the 'Console' and enter BUILD in the Dialogue pane then the system will continue and build the index.

Well done on your mod. Thanks.

Dave

PS. Have you given any thought to what will happen if Wilma goes to the net and finds an update? Will it be possible to update Wilma without losing your mods?

Keep up the good work

gudmund
Offline
Last seen: 12 years 11 months ago
Joined: 2007-11-14 05:55
FYI, Wilma doesn't handle all characters

Installed and tried it. Using the U:\ path was the first thing I missed, but after that it appeared to work.

Until I start searching for anything containing letters like åäö etc. (which are BTW *not* "umlauts" in Swedish, but letters in their own right, completely distinct from a and o).

Gudmund

adoucette
Offline
Last seen: 3 years 7 months ago
Joined: 2009-12-24 13:49
I didn't develop Wilma

I didn't develop Wilma, just packaged it as a portable app. I have no idea what characters it does and does not support.
However, have you considered trying regular expression searches in Wilma? You may be able to use the ASCII character numbers in a search.
See here for more info on Wilma's regex search capability
And see here for the ASCII octal and hex character values
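
A small Java sketch of the idea, offered as an illustration only: Wilma's regex dialect may differ, so check its own docs for the escape syntax it accepts. Note that å, ä and ö lie outside 7-bit ASCII; the hex values below (E5, E4, F6) are their Latin-1 / Unicode code points. The class name HexEscapeSearch is made up.

```java
import java.util.regex.Pattern;

// Hypothetical demo: searching for accented letters via hex escapes.
class HexEscapeSearch {
    // Returns true if the regular expression matches anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // The word "smorgasbord" in its Swedish spelling, written with
        // Unicode escapes so the source file encoding does not matter.
        String text = "sm\u00F6rg\u00E5sbord";

        // Hex escape F6 stands for o-with-diaeresis, E5 for a-with-ring.
        System.out.println(found("sm\\xF6rg\\xE5s", text));

        // The plain-ASCII spelling is a different string to the regex engine.
        System.out.println(found("smorgas", text));
    }
}
```

The pattern only matches text stored in precomposed form; if a file stores "o" followed by a combining diaeresis, this escape will not find it.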

gudmund
Offline
Last seen: 12 years 11 months ago
Joined: 2007-11-14 05:55
No critique, just informing other potential users...

...and thanks for the links!

If I hadn't reported my experience, I (and others) might have missed that info.

Gudmund

samsmy.name
Offline
Last seen: 12 years 1 week ago
Joined: 2011-01-14 08:38
Link for Wilma Portable

Hi there!

The link is not working anymore. Do you still have a copy of Wilma Portable?
Thanks

snunsan
Offline
Last seen: 13 years 1 month ago
Joined: 2011-01-22 07:08
Link not working

I am interested in testing this app. Please, would it be possible for you to update the file? Thank you in advance.

adoucette
Offline
Last seen: 3 years 7 months ago
Joined: 2009-12-24 13:49

I've used Puggle portable, and it works, but the portable version has some important limitations:

  • It requires the Java Runtime Environment
  • It does not allow one to specify the folders to be indexed (will index the entire drive)
  • It does not index very many file types (just jpeg, png, gif, txt, pdf, doc, rtf, html, xls, ppt and mp3 files)

Is there any other software that I could use to index the docx, pptx, xsl, xlsx, chm, etc. files on the thumb drive?
Thanks,
Ari

BuddhaChu
Offline
Last seen: 7 years 5 months ago
Joined: 2006-11-18 10:26

I suggest you ask the Puggle developers to add support for those file extensions.

BTW, here's the list of extensions it will index, per its source code (FileHandler.java). It might be as simple as adding a line for .docx and letting their DOCHandler do its work.

    public FileHandler(boolean storeText, boolean storeThumb) {
        this.STORE_TEXT = storeText;
        this.STORE_THUMBNAIL = storeThumb;
        
        this.handlerProps = new Properties();
        this.handlerProps.setProperty("txt", "puggle.LexicalAnalyzer.TextHandler");
        this.handlerProps.setProperty("pdf", "puggle.LexicalAnalyzer.PDFHandler");
        this.handlerProps.setProperty("doc", "puggle.LexicalAnalyzer.DOCHandler");
        this.handlerProps.setProperty("rtf", "puggle.LexicalAnalyzer.RTFHandler");
        this.handlerProps.setProperty("wpd", "puggle.LexicalAnalyzer.WordPerfectHandler");
        this.handlerProps.setProperty("html", "puggle.LexicalAnalyzer.HTMLHandler");
        this.handlerProps.setProperty("htm", "puggle.LexicalAnalyzer.HTMLHandler");
        this.handlerProps.setProperty("xls", "puggle.LexicalAnalyzer.XLSHandler");
        this.handlerProps.setProperty("ppt", "puggle.LexicalAnalyzer.PPTHandler");
        
        this.handlerProps.setProperty("mp3", "puggle.LexicalAnalyzer.MP3Handler");
        this.handlerProps.setProperty("jpg", "puggle.LexicalAnalyzer.ImageHandler");
        this.handlerProps.setProperty("jpeg", "puggle.LexicalAnalyzer.ImageHandler");
        this.handlerProps.setProperty("gif", "puggle.LexicalAnalyzer.ImageHandler");
        this.handlerProps.setProperty("png", "puggle.LexicalAnalyzer.ImageHandler");

        this.handlerProps.setProperty("zip", "puggle.LexicalAnalyzer.ZipHandler");
        this.handlerProps.setProperty("rar", "puggle.LexicalAnalyzer.RarHandler");

        this.handlerProps.setProperty("exe", "puggle.LexicalAnalyzer.AppHandler");
        this.handlerProps.setProperty("com", "puggle.LexicalAnalyzer.AppHandler");
        this.handlerProps.setProperty("cab", "puggle.LexicalAnalyzer.AppHandler");
        this.handlerProps.setProperty("msi", "puggle.LexicalAnalyzer.AppHandler");
    }
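
For anyone wanting to try that, here is a rough, self-contained Java sketch of how such an extension table might be consulted. Only the setProperty lines mirror the excerpt above; the class HandlerTable, the handlerFor method, and the "docx" entry are hypothetical and not taken from Puggle. Whether DOCHandler can actually read a docx is an open question: doc is a binary format while docx is zipped XML, so a doc parser may well fail on it.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical stand-in for an extension-to-handler lookup table.
class HandlerTable {
    private final Properties handlerProps = new Properties();

    HandlerTable() {
        // Two entries copied from the excerpt above.
        handlerProps.setProperty("doc", "puggle.LexicalAnalyzer.DOCHandler");
        handlerProps.setProperty("pdf", "puggle.LexicalAnalyzer.PDFHandler");
        // Hypothetical extra line, as suggested in the post; untested
        // against the real handler class.
        handlerProps.setProperty("docx", "puggle.LexicalAnalyzer.DOCHandler");
    }

    // Returns the handler class name for a file, or null if the
    // extension is missing or not registered.
    String handlerFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return handlerProps.getProperty(ext);
    }
}
```

With this table, handlerFor("Notes.DOCX") resolves to the DOC handler's class name, while an unregistered type such as "book.chm" yields null, meaning the file would simply be skipped.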

Cancer Survivors -- Remember the fight, celebrate the victory!
Help control the rugrat population -- have yourself spayed or neutered!

gudmund
Offline
Last seen: 12 years 11 months ago
Joined: 2007-11-14 05:55
That's not all of the problems...

Puggle also selects for you *which* drive to index. In my case, it selected a portable partition I use for storing certain data and temporary backups.

No chance of even telling it what drive it should go about indexing, much less which directory.

Since indexing would probably have taken uncountable hours (well over a hundred GB...), I just killed it, and will not use it again until it's fixed.

Gudmund

adoucette
Offline
Last seen: 3 years 7 months ago
Joined: 2009-12-24 13:49

Are you referring to a problem with Puggle or with Wilma?
AFAIK Wilma does not have this problem and has options for which directories to index.

gudmund
Offline
Last seen: 12 years 11 months ago
Joined: 2007-11-14 05:55
Puggle, just as I stated

Wilma handles that bit very nicely, IMO more easily and efficiently than e.g. Copernic.

If I can make it handle ÅÄÖ etc. properly, I will probably use it.

Apologies if I come across as cranky. The mishandling of those characters in an age that has *Unicode* is slowly getting too much for my nerves: Google doesn't handle them correctly, Adobe Reader doesn't, Copernic doesn't, etc. That wastes large amounts of time on workarounds or on weeding through thousands of useless hits.

Gudmund

dodi
Offline
Last seen: 2 years 1 month ago
Joined: 2010-10-15 09:26
Did you find a solution to handle ÄÖ etc.?

I am also looking for a portable index and search solution, and I also need Unicode characters supported.
Any update on this topic?

dodi
Offline
Last seen: 2 years 1 month ago
Joined: 2010-10-15 09:26
PocketSearch supports ä, ö, ü

I tested PocketSearch and it supports ä, ö, ü, ß. But it does not differentiate between ä and a, ö and o, ü and u, ß and s.
For more info on PocketSearch see https://portableapps.com/node/29630.
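
The behaviour described above looks like accent folding. As a hedged illustration (I don't know how PocketSearch implements it), the Java sketch below shows one common way to get that effect: decompose the text and drop the combining marks. The class AccentFolding and its methods are invented names. Note that this simple fold leaves ß untouched, so treating ß like s would need an extra, explicit mapping.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical demo of accent-insensitive versus exact matching.
class AccentFolding {
    // Decomposes letters into base letter plus combining marks (NFD),
    // then removes the marks, so a-with-diaeresis becomes plain a.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Accent- and case-insensitive search: both sides are folded first.
    static boolean foldedContains(String text, String query) {
        return fold(text).toLowerCase(Locale.ROOT)
                         .contains(fold(query).toLowerCase(Locale.ROOT));
    }

    // Exact search: accented and unaccented letters stay distinct.
    static boolean exactContains(String text, String query) {
        return text.contains(query);
    }
}
```

A search tool that folds at index time cannot tell the folded letters apart afterwards, which would explain results that mix ä with a. Keeping the original text alongside the folded form is one way to offer both modes.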
