
Web99 - Search your TiFiles/Disks v0.5 Beta 4


kl99


new version released: see post #1 for download :)

 

2015-05-24 Version 0.1C
- by accident the last version was published with DISKSIZE hardcoded to 1440 (for testing purposes) and with SECTORSFREE and SECTORSUSED excluded. Fixed that.
- NumericFields were changed by Lucene to StringFields when you clicked on Find Duplicates (which updated all documents). This behavior of Lucene was badly documented and is no longer an issue.
- The QueryParser now generates a NumericRangeQuery when you use it on a numeric Field. Simply use it like +SECTORS:[3 TO 12], or when filtering disks you can try +DISKSIZE:[360 TO 720]
- The Fields DISKNAME and PATH were not case insensitive yet. You can now search by path: +PATH:*WHTECH*
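
For readers new to the range syntax, here is a minimal Python sketch of how a clause such as +SECTORS:[3 TO 12] can be read as an inclusive numeric range. It is not Web99's or Lucene's code; `parse_range_clause` and `matches` are hypothetical names used only for illustration, and documents are modelled as plain dictionaries.

```python
import re

# Hypothetical pattern for one clause such as "+SECTORS:[3 TO 12]".
RANGE_CLAUSE = re.compile(r"^\+?(\w+):\[(\d+)\s+TO\s+(\d+)\]$", re.IGNORECASE)

def parse_range_clause(clause):
    """Return (FIELD, low, high) for an inclusive range clause."""
    m = RANGE_CLAUSE.match(clause.strip())
    if m is None:
        raise ValueError(f"not a range clause: {clause!r}")
    return m.group(1).upper(), int(m.group(2)), int(m.group(3))

def matches(document, clause):
    """True if the document's numeric field lies inside the range."""
    field, low, high = parse_range_clause(clause)
    value = document.get(field)
    return value is not None and low <= value <= high

disks = [{"DISKSIZE": 360}, {"DISKSIZE": 720}, {"DISKSIZE": 1440}]
small = [d for d in disks if matches(d, "+DISKSIZE:[360 TO 720]")]
```

Note that the colon after the field name is required; a clause without it is rejected by this sketch.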

Edited by kl99
  • Like 1
Link to comment
Share on other sites

  • 2 weeks later...

2015-06-04 Version 0.2
- during Indexing each TiFile of Type "PROGRAM" is checked whether it's actually a Basic or XB Program. The Basic Sourcecode is then saved into a field called SOURCE. The Type is changed from PROGRAM to BASIC. You can already perform a full text search within the SOURCE field and see matching documents in the result.
Example: +documenttype:tifile +source:apesoft
Nothing more is yet visualized (you don't see the source code yet) because this new feature opens up some questions.

? How to deal with the case where the Basic SourceCode is identical but the TiFile is not identical on binary level.
? How should embedded assembler code in a PROGRAM file be handled, if there is any? (I don't know yet where to search for that in the binary.)
? The algorithm for when a space should or should not be added before/after a Token is not yet clear, and it seems to be solved differently across all tools. If I create a Hash based on the Basic Sourcecode, it's essential that this is clear/correct before doing that.
? There is no output information yet why a certain Program failed to be identified as a Basic Program.
? The token size of the source field is limited to 256 bytes. This is suboptimal.

You can visualize the source field or any other field with the external Tool Luke by pointing it to the Index Folder:
https://code.google.com/p/luke/downloads/detail?name=luke-1.0.1.jar&can=2&q=

  • Like 1
Link to comment
Share on other sites

  • 5 months later...

This is version 0.4 of Web99:

 

I want to give you a quick update on this project.
There has been heavy work going on here for a while now. I also gave presentations about Web99 at the TI Treff and at the Chicago TI Faire.

Here is a presentation about Web99 from the Chicago TI Faire 2015:
http://www.ustream.tv/recorded/76712901?rmalang=de_DE

 

What was worked on?

 

- nice treeview of what is Inside the Index, all Types (DIS/FIX, PROGRAM,...) get their own icon

- TI Basic/Extended Basic Syntax Highlighting

- created a custom Lexer within the Scintilla Library

- added a custom Scintilla.Net Library based on that
- this will be extended to Syntax Highlighting for all other kinds of Source Code like Assembler, GPL,...
- a public independent release of this will come for other Tools to integrate, just need to get some bugs fixed there

- fulltext search within TI Basic/XB Source Files

- using Subversion for Version Controlling the Source Code of the Project for a while already

- created classes for TI Entities like Ti File Binary, Ti File, Disk File Record, Disk and Disk Clone and separated them from the methods which extract data for those

- found a big bug in the code which caused clustered Ti Files to not have a correct/usable binary, fixed it of course. Many more Basic Programs have been extracted as a result.

- started to write some first Unit Tests to test the Code of Web99

- added a lot of checks in code to not let a corrupt Disk Image crash the Application

- small fixes to Basic Source Code formatting, the reference for the look is always how it looks when you LIST the same program in your TI. Both TI99Dir and TIImageTool have some issues there as well.

- Basic Source Code Extraction now properly stores and displays characters beyond Ascii Code 127 (using Codepage 850). Thanks to John G. for pointing out the need.

- Fixes in the calculation of FileSize and ClusterLength (many Files have incorrect values in their File Descriptor Record), which had led to corrupt Binaries.
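
As a small illustration of the Codepage 850 item in the list above, this Python sketch decodes bytes above 127 with the `cp850` codec from the standard library. It only shows the decoding step; `decode_basic_text` is a hypothetical helper name, and how Web99 itself stores the text is not shown here.

```python
def decode_basic_text(raw: bytes) -> str:
    # Assumption: every byte, including those above 127, is read as code page 850.
    return raw.decode("cp850")

# 0x81 is outside the ASCII range; in code page 850 it is a lowercase u with diaeresis.
sample = bytes([0x48, 0x81, 0x54, 0x54, 0x45])
text = decode_basic_text(sample)
```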

Have fun:

Web99 version 0.4

2015-11-09

Web99_2015-11-09_v0.4.zip

  • Like 3
Link to comment
Share on other sites

An interesting idea - I gave it a try but it didn't work well for me.

 

I was able to set up the index, and it scanned (it would have been nice if it had reported how many files it found or something!). But when I clicked 'search', I got an unhandled exception.

 

post-12959-0-36936500-1447070959_thumb.jpg

 

It's a permissions error of some kind, I looked and the file in question did exist at that path, but, I have a lot of my stuff locked down - running executables from temp is frowned upon on my system. Do you have any control over that, can you just put the DLLs in the app's folder?

 

I attempted to run as admin, and it was missing MSVCR120D.dll -- that's the debug version of the DLL, so some component seems to be compiled as a debug version. This was again during the load of the scilexer.

 

I like the idea a lot, though.... if I can throw in a wishlist with the above bug report -- the ability to have multiple index roots would be nice but is not critical (as I have several folders of TI disks. I could consolidate them all now, I guess, but I have gotten used to it). And a way to just browse the list? (Admittedly, maybe you can do that already with search, I need to get farther in.)

 

Does it only do disk images, or can it do V9T9 files and TIFILES files too? :)

 

Link to comment
Share on other sites

i will check that issue asap (today).

it only does v9t9 diskimages so far. but i pretty much encapsulated the lowest level nicely, so pc99 diskimage support should come soon.
individual files are planned as well of course.
browse the list by simply clicking on the top buttons "TiFile" or "DiskFileRecord". Those will show 5000 entries by default. You can change the maximum number of results in the Search View.

 

Can you give me some steps on how to exactly reproduce the Exception?

Link to comment
Share on other sites

you can add multiple root folders to the index. once you have scanned a certain folder, the original data is no longer necessary. it's within the index then.
simply go:
"Select Disk Root Folder", select Folder Root 1, "Start Indexing" -->

"Select Disk Root Folder", select Folder Root 2, "Start Indexing" -->

...

"Find Duplicates"

All should then be in the index and any duplicates identified.
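
The workflow above (index several roots one after another, then find duplicates) can be sketched in Python as follows. This is a simplified assumption of the idea, not Web99's implementation: it hashes whole files with SHA-1 rather than parsing disk images, and `index_root` and `find_duplicates` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict

def index_root(index, root):
    """Add every file below root to index (content hash -> list of paths)."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha1(fh.read()).hexdigest()
            index[digest].append(path)

def find_duplicates(index):
    """Return only the hashes that were seen under more than one path."""
    return {h: sorted(paths) for h, paths in index.items() if len(paths) > 1}

# One shared index; call index_root once per root folder, then find_duplicates.
index = defaultdict(list)
```

Because the index keeps the hashes, the duplicate check runs across all roots that were ever added, which mirrors the point that the original folders are not needed afterwards.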

  • Like 1
Link to comment
Share on other sites

This is a temporary fix. I will have to recompile the library as non-debug to fix this.
For now I replaced the library with the default one from the internet, and changed some of my code.
This should work now for everybody, but don't expect the Syntax Highlighting to be as nice as it can be. :)

 

Web99 v0.4a

Web99_2015-11-09_v0.4a.zip

  • Like 1
Link to comment
Share on other sites

Can you give me some steps on how to exactly reproduce the Exception?

 

I did, I gave the exact steps I followed. I started the application. I set an index. I scanned it, then closed the dialog back to the main menu and I clicked 'search'. :)

 

Something is extracting that DLL in the user's temp folder and then attempting to load it, and that's a restricted action on newer versions of Windows, since it's more often used by malware than valid tools. I wasn't sure if that was something you had control over or something that the library you are using is doing in the background.

Link to comment
Share on other sites

In non-administrator mode, still get the exception attempting to load the SciLexer.dll out of the user temporary path.

 

Reproduction steps: start Web99, click 'setup index'. Select a Disk Root Folder. Add to Index. Find duplicates. Close window. Click 'Search'. (Not running as administrator).

 

post-12959-0-56980300-1447153889_thumb.jpg

 

When I closed the app and launched a second time, I went immediately to the Search button and got the debug version warning:

 

post-12959-0-98060200-1447154004_thumb.jpg

 

Again, this appears to be because of the SciLexer.dll. Note the path - I'm also running 64-bit. Dependency Walker confirms that it is SciLexer.dll that is looking for the debug MSVC runtime.

 

The above with 0.4b from your last post.

 

Full contents of the exception window:

 

 

See the end of this message for details on invoking
just-in-time (JIT) debugging instead of this dialog box.
************** Exception Text **************
System.ComponentModel.Win32Exception (0x80004005): Could not load the Scintilla module at the path 'C:\Users\tursi\AppData\Local\Temp\ScintillaNET\3.5.1\x64\SciLexer.dll'. ---> System.ComponentModel.Win32Exception (0x80004005): The specified module could not be found
   at ScintillaNET.Scintilla.get_CreateParams() in c:\Users\admin\Documents\Visual Studio 2013\Projects\ScintillaNET-3.5.1\src\ScintillaNET\Scintilla.cs:line 3435
   at System.Windows.Forms.Control..ctor(Boolean autoInstallSyncContext)
   at ScintillaNET.Scintilla..ctor() in c:\Users\admin\Documents\Visual Studio 2013\Projects\ScintillaNET-3.5.1\src\ScintillaNET\Scintilla.cs:line 5707
   at Web99.UI.Forms.SearchForm.InitializeComponent() in c:\Users\admin\Documents\Visual Studio 2013\Projects\Web99\Web99\UI\Forms\SearchForm.Designer.cs:line 41
   at Web99.UI.Forms.SearchForm..ctor(Web99Session session) in c:\Users\admin\Documents\Visual Studio 2013\Projects\Web99\Web99\UI\Forms\SearchForm.cs:line 28
   at Web99.UI.Forms.MainForm.SearchButton_Click(Object sender, EventArgs e) in c:\Users\admin\Documents\Visual Studio 2013\Projects\Web99\Web99\UI\Forms\MainForm.cs:line 82
   at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
   at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.ButtonBase.WndProc(Message& m)
   at System.Windows.Forms.Button.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)

************** Loaded Assemblies **************
mscorlib
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34209 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.NET/Framework64/v4.0.30319/mscorlib.dll
----------------------------------------
Web99
    Assembly Version: 1.0.0.0
    Win32 Version: 1.0.0.0
    CodeBase: file:///C:/tools/web99/Web99.exe
----------------------------------------
System.Windows.Forms
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34251 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Windows.Forms/v4.0_4.0.0.0__b77a5c561934e089/System.Windows.Forms.dll
----------------------------------------
System.Drawing
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34270 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Drawing/v4.0_4.0.0.0__b03f5f7f11d50a3a/System.Drawing.dll
----------------------------------------
System
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34238 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System/v4.0_4.0.0.0__b77a5c561934e089/System.dll
----------------------------------------
Lucene.Net
    Assembly Version: 3.0.3.0
    Win32 Version: 3.0.3.0
    CodeBase: file:///C:/tools/web99/Lucene.Net.DLL
----------------------------------------
System.Configuration
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34209 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Configuration/v4.0_4.0.0.0__b03f5f7f11d50a3a/System.Configuration.dll
----------------------------------------
System.Xml
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34234 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Xml/v4.0_4.0.0.0__b77a5c561934e089/System.Xml.dll
----------------------------------------
System.Core
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34209 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Core/v4.0_4.0.0.0__b77a5c561934e089/System.Core.dll
----------------------------------------
ScintillaNET
    Assembly Version: 3.5.1.0
    Win32 Version: 3.5.1.0
    CodeBase: file:///C:/tools/web99/ScintillaNET.DLL
----------------------------------------
************** JIT Debugging **************
To enable just-in-time (JIT) debugging, the .config file for this
application or computer (machine.config) must have the
jitDebugging value set in the system.windows.forms section.
The application must also be compiled with debugging
enabled.
For example:
<configuration>
    <system.windows.forms jitDebugging="true" />
</configuration>
When JIT debugging is enabled, any unhandled exception
will be sent to the JIT debugger registered on the computer
rather than be handled by this dialog box.

 

I also tried 0.4a - this runs successfully. (Based on this observation it may be ONLY the debug DLL that is the issue, and not permissions).

 

 

Link to comment
Share on other sites

Playing with 0.4a - pretty nice! My one observation is that DV80 files all display on one line in the preview window - would be nice to have those work for easy viewing!

 

So when I'm browsing in TiFile mode - once I find a file I'm interested in, is there a way to see the location of the disk image it's from so I can find it on the PC?

Link to comment
Share on other sites

I hope it's fixed now Tursi. Thanks for the good bug report. I recompiled the Scintilla.dll as release now instead of only recompiling ScintillaNET.

 

Web99 v0.4c

Web99 2015-11-10 v0.4c.zip

 

Please tell me if it's fixed.

 

I know about the ugly display of DIS/VAR Files, I simply didn't have time yet for that.

I will also come up with something to display the DiskPath. The info for that is already within the index.

 

Until the rs232 transfer to the TI-99 runs nicely, I could think of a simple file write of the selected binary into a chosen target path (think of "D:\Classic99\DSK1").

The goal is to get rid of the need to navigate through your Disk Libraries and manually copy files over with TI99DIR or other tools.

 

Update [2015-11-24]: An Export of a wanted TiFile to the Classic99 Disk Path is now possible with Web99 v0.5 Beta 2.

I have a ton of entries on my web99 todo-list. :)

Edited by kl99
Link to comment
Share on other sites

Web99 v0.4d [2015-11-12]

Web99 2015-11-12 v0.4d.zip

 

The issue is gone with this release on the machine where it happened before. If you have used Web99 v0.4a please ensure that you remove this folder once:

C:\Users\[username]\AppData\Local\Temp\ScintillaNET

 

For some reason this library never replaces itself, even if it has changed.

 

Have fun :)

Link to comment
Share on other sites

Try things like adding :

 

+filename:EDIT*

 

--> this will filter the list down to entries matching your filter. So in this case you only get files that start with EDIT or Edit or edit. The star is a wildcard for nothing or any number of characters afterwards.

 

+filename:*DIT*

--> same as before, but it shows that you can even search with wildcards at the beginning

 

+type:"dis/var 80"

--> filter to only match Files of Type: DIS/VAR 80

 

+type:basic

--> only BASIC Files.

 

+diskname:xb*

 

--> only return Disks or DiskFileRecords that have a DISKNAME starting with "XB..."

 

+source:*say*

 

--> returns any Basic Program that has the string "SAY" in it, mostly this will result in programs using SPEECH.

 

+source:*Gilliland*

 

--> again searches for the provided string in all your indexed Basic Programs.

 

The whole search should be case insensitive.
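
To make the wildcard behaviour of the filters above concrete, here is a small Python sketch using `fnmatch.fnmatchcase` from the standard library on lowercased values. It only imitates the case-insensitive star matching described in this post; it is not how Lucene evaluates wildcard queries, and `field_matches` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def field_matches(document, field, pattern):
    """Case-insensitive wildcard match of one field; '*' matches any run of characters."""
    value = document.get(field.lower(), "")
    return fnmatchcase(value.lower(), pattern.lower())

files = [
    {"filename": "EDIT1", "type": "PROGRAM"},
    {"filename": "Editor", "type": "DIS/VAR 80"},
    {"filename": "LOAD", "type": "BASIC"},
]

# Equivalent in spirit to +filename:EDIT*
hits = [f["filename"] for f in files if field_matches(f, "filename", "EDIT*")]
```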

  • Like 1
Link to comment
Share on other sites

There we go, that's working here now! :)

 

I'm not terribly fond of having executable files stored permanently in my temp folder, though... allowing that is opening up a lot. Is there any way you can just ship SciLexer.dll with the application rather than have it extracted? If not, can you maybe tag the file for deletion on exit so at least it doesn't hang around and only exists during runtime? :)

Link to comment
Share on other sites

I'm not terribly fond of having executable files stored permanently in my temp folder, though... allowing that is opening up a lot. Is there any way you can just ship SciLexer.dll with the application rather than have it extracted? If not, can you maybe tag the file for deletion on exit so at least it doesn't hang around and only exists during runtime? :)

 

Thanks for the feedback. This comes out of the box like that from this 3rd party library. The .net Scintilla Library is wrapping around the Scintilla.dll which is a C++ code library. The only thing I changed was adding support for TiBasic/XB Syntax Highlighting. But I get your point and will try to customize the library to have it in the application folder.

 

Update [2015-11-24]:

The behavior got changed with Web99 v0.5 Beta. It now creates the necessary dll inside the Web99 Folder. The Temp Path is no longer used.

Edited by kl99
Link to comment
Share on other sites

A short update after another week of development. I am trying to improve the UI, so you don't have to learn the Syntax Query in order to find your stuff.
It's not yet ready for release and I wanna share some thoughts on upcoming changes...

Basically there are three big lessons I learned when comparing Ti Files and Disks.
Every single one will mean big changes to the way the comparison will be done by the tool.
Each one will most probably identify a lot more duplicates in your collection, resulting in a much cleaner picture.

1. Basic Programs:
If you want to find out whether two Basic Programs are identical, you shouldn't do a binary comparison of their PROGRAM Files.

We are dealing with Memory Images saved as PROGRAM Files. The binary data depends on what was in VDP Ram and Ram when the SAVE Operation was done, and therefore on how the program was written and edited. Basic Programs are not stored in logical order. There is a Line Number Table whose entries point to the memory areas containing the content of each line, and those areas can be spread anywhere within a large area of memory. The computer adds the last edited line on top of its VDP Ram, and no data is re-sorted during SAVE. As a result, two people typing in the same program from a magazine (without any errors) will most probably end up with two different binary PROGRAM Files representing the same Basic Source Code.

In other words two PROGRAM Files can be different on binary level, but still contain the same Basic Program in them. They should therefore be identified as duplicates to each other.

Instead, you need to extract the Basic Source Code from those PROGRAM Files and compare that in order to find out whether they are identical or not. Such a Source Code comparison could even ignore different Line Numbers (10s instead of 100s,...) and different Variable Names (N$ instead of NAME$). Further, any embedded assembler code that might be contained within the PROGRAM File should be saved as such so it is not lost.

When calculating the Hash ID for Basic Files, we make it based on the Basic Source Code.
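
A minimal Python sketch of this idea, under simplifying assumptions: a program is modelled as (line number, statement) pairs in whatever order they happen to be stored, the listing is produced in line-number order, and the hash is taken over that listing. The names `listing` and `source_hash` are hypothetical, SHA-1 is just an example digest, and real token decoding and spacing rules are not covered.

```python
import hashlib

def listing(lines):
    """lines: iterable of (line_number, statement) in any storage order."""
    return "\n".join(f"{num} {text}" for num, text in sorted(lines))

def source_hash(lines):
    """Hash the listing, so the storage order of the lines does not matter."""
    return hashlib.sha1(listing(lines).encode("utf-8")).hexdigest()

# Same program, stored in a different order (for example after later edits).
typed_in_order = [(100, "CALL CLEAR"), (110, 'PRINT "HELLO"'), (120, "GOTO 110")]
edited_later = [(120, "GOTO 110"), (100, "CALL CLEAR"), (110, 'PRINT "HELLO"')]
same = source_hash(typed_in_order) == source_hash(edited_later)
```

Both variants produce the same hash here, even though a byte-for-byte comparison of the stored data would differ.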

2. Data Files:
If you want to find out whether two Data Files are identical, get rid of the garbage data in between the records.

Most Data Files (DIS/VAR, DIS/FIX, INT/FIX and INT/VAR) have record lengths that leave some unused space at the end of each Disk Sector. A DIS/VAR 80 only contains 240 Bytes of real data in each Disk Sector. The remaining 16 Bytes are basically wasted and will contain random data or even pieces of data from former Files that were stored in this Disk Sector. So we are dealing with irrelevant data in between Records. If you compare two Data Files, these areas don't have to be identical in order for the actual Data Files to be identical. Only the actual Records are of interest for comparison.

When calculating the Hash ID for Data Files, we have to exclude all data in the allocated sectors which is not part of any record.
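
Here is a hedged Python sketch for the simplest case, fixed-length records of 80 bytes, which matches the 240/16 split mentioned above (256 // 80 = 3 records per sector). It assumes records never span sectors and that the record count is known; variable-length records would additionally need their length bytes interpreted, which is not shown. `record_bytes` and `data_hash` are hypothetical names.

```python
import hashlib

SECTOR_SIZE = 256

def record_bytes(sectors, record_length, record_count):
    """Yield only the record payloads, skipping the unused tail of each sector."""
    per_sector = SECTOR_SIZE // record_length
    remaining = record_count
    for sector in sectors:
        take = min(per_sector, remaining)
        for i in range(take):
            yield sector[i * record_length:(i + 1) * record_length]
        remaining -= take
        if remaining == 0:
            break

def data_hash(sectors, record_length, record_count):
    """Hash the records only, so leftover bytes between them are ignored."""
    h = hashlib.sha1()
    for record in record_bytes(sectors, record_length, record_count):
        h.update(record)
    return h.hexdigest()

records = b"A" * 80 + b"B" * 80 + b"C" * 80
clean = [records + b"\x00" * 16]   # unused tail zeroed
dirty = [records + b"\xe5" * 16]   # unused tail holds leftover bytes
```

With this approach `clean` and `dirty` hash to the same value, while a change inside any record still changes the hash.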

3. Disks:
For two Disks to be identical, the binary data does not have to be identical.

Since we already found two situations where Files don't need to be binary identical, we can't require Disks to be binary identical either; otherwise we won't detect all the duplicates.

When calculating the Hash ID for a Disk, we can't take the whole binary stream into account.
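
One possible way to do this, sketched in Python purely as an assumption: build the Disk hash from the content hashes of its files instead of from the raw image. Whether file names, disk name or file order should count is an open design choice; this sketch includes uppercased file names and ignores order. `disk_hash` is a hypothetical name.

```python
import hashlib

def disk_hash(files):
    """files: iterable of (filename, content_hash) pairs for one disk."""
    entries = sorted((name.upper(), content_hash) for name, content_hash in files)
    h = hashlib.sha1()
    for name, content_hash in entries:
        # A zero byte separates the parts so adjacent values cannot run together.
        h.update(name.encode("utf-8") + b"\0")
        h.update(content_hash.encode("utf-8") + b"\0")
    return h.hexdigest()

disk_a = [("LOAD", "h1"), ("GAME", "h2")]
disk_b = [("GAME", "h2"), ("load", "h1")]   # same files, different order and case
```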

  • Like 2
Link to comment
Share on other sites

Teasing you on the next revision of Web99

post-27826-0-59645900-1447832509_thumb.png

The Search will get several Tabs to choose from.

 

Content will show the File Content (Basic Program Source Code, Text Files,...).

 

Meta Data will show you all the properties.

 

Relations will show you all relations to and from the selected TiFile. We go all the way back to the Binary, which will be the root here. The next level shows you all TiFiles you have that "represent" that binary with a certain FileName. The next level shows you all Disks containing that TiFile with that specific FileName. And the last level shows the DiskPaths from your Pc where you imported the Disk from.

 

Relations is a direct result of Tursi's request to be able to see the actual diskpath that a TiFile of interest originates from.
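
The four levels described for the Relations tab (Binary, TiFile name, Disk, DiskPath) can be pictured as a nested mapping. The following Python sketch is only an illustration of that hierarchy with made-up sample data; `build_relations` and `disk_paths` are hypothetical names and do not reflect Web99's actual classes.

```python
from collections import defaultdict

def build_relations(records):
    """records: iterable of (binary_hash, filename, disk_name, disk_path)."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for binary_hash, filename, disk_name, disk_path in records:
        tree[binary_hash][filename][disk_name].add(disk_path)
    return tree

def disk_paths(tree, binary_hash):
    """All PC paths from which a given binary was imported."""
    found = set()
    for disks in tree.get(binary_hash, {}).values():
        for paths in disks.values():
            found.update(paths)
    return sorted(found)

# Made-up sample: one binary known under two file names on two disks.
records = [
    ("b1", "EDIT1", "XBTOOLS", "D:/disks/xbtools.dsk"),
    ("b1", "EDITOR", "UTILS", "D:/disks/utils.dsk"),
    ("b2", "LOAD", "XBTOOLS", "D:/disks/xbtools.dsk"),
]
tree = build_relations(records)
```

Walking the tree from a binary down to its paths answers the question of where on the PC a file of interest can be found.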

 

  • Like 2
Link to comment
Share on other sites
