
Parsing a text file


Larry


I'm having trouble with a text file that has unusual spacing:

 

T I T L E 1  T I T L E 2  T I T L E 3  (and so on)

 

I want to get the titles in regular text format:

 

TITLE 1

TITLE 2

TITLE 3

 

I haven't been able to come up with a BASIC subroutine to properly remove the extra spaces.  Perhaps you can?  Note that there are two spaces between titles, and just one space between letters.

 

Thanks for any ideas.  Needn't be elegant; just needs to work.  


Quick thoughts on this: read two bytes at a time from the file and check whether both bytes are space characters.  If they are, discard (ignore) those two bytes.  If they are not, write both bytes to a temp buffer or to the new file you are creating.  Then read the next two bytes and compare again.  Repeat until EOF.
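
One possible variation on this idea, sketched in Atari BASIC (the dialect of the SPACES.BAS listing in this thread), looks at one character at a time instead of fixed two-byte reads, because a pair of spaces can straddle a two-byte boundary (in the sample line the double space falls on bytes 12 and 13). The variable names A$, B$ and I and the line numbers are only illustrative. A double space is treated as the end of a title and a single space is dropped. Since the sample has only one space between "E" and the digit, each title comes out as TITLE1 rather than TITLE 1.

```basic
10 DIM A$(50),B$(50)
20 A$="T I T L E 1  T I T L E 2  T I T L E 3"
30 B$="":I=1
40 IF I>LEN(A$) THEN 110
45 REM NON-SPACE: APPEND TO CURRENT TITLE
50 IF A$(I,I)<>" " THEN B$(LEN(B$)+1)=A$(I,I):GOTO 100
55 REM SPACE: SINGLE (DROP IT) OR DOUBLE (END OF TITLE)?
60 IF I=LEN(A$) THEN 100
70 IF A$(I+1,I+1)<>" " THEN 100
80 IF LEN(B$) THEN ? B$
85 REM START A NEW TITLE AND SKIP THE SECOND SPACE
90 B$="":I=I+1
100 I=I+1:GOTO 40
110 IF LEN(B$) THEN ? B$
```

For a whole file the same loop would run over each line (or over one big string), writing B$ to the output file instead of printing it.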


I think you'd be best off doing multiple passes on each text line.

If you detect a single space but not two then remove it and set a flag.

Process each line repeatedly until a pass finishes without setting the flag.

To make it easier you might want to add an extra character, such as a full stop, to the end of the line, then remove it after processing.

 

Single space test would be something like:

IF A$(N,N) = " " AND A$(N,N+1) <> "  " THEN ...

(second comparison has 2 spaces)
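
A minimal sketch of the multi-pass idea in Atari BASIC follows, with some assumptions of my own. It tests the characters on both sides of each space rather than using the comparison exactly as posted, since otherwise the second space of a pair would also look like a single space. FLAG, N and the line numbers are illustrative names. The full stop added in line 30 guarantees that A$(N+1,N+1) always exists. The deletion statement A$(N)=A$(N+1) is the same one used in line 40 of the SPACES.BAS listing in this thread.

```basic
10 DIM A$(51)
20 A$="T I T L E 1  T I T L E 2  T I T L E 3"
30 A$(LEN(A$)+1)=".":REM SENTINEL AT END OF LINE
40 FLAG=0:N=2
50 IF N>=LEN(A$) THEN 80
55 REM A SINGLE SPACE HAS A NON-SPACE ON BOTH SIDES
60 IF A$(N,N)=" " AND A$(N-1,N-1)<>" " AND A$(N+1,N+1)<>" " THEN A$(N)=A$(N+1):FLAG=1
70 N=N+1:GOTO 50
75 REM REPEAT UNTIL A PASS REMOVES NOTHING
80 IF FLAG THEN 40
90 A$=A$(1,LEN(A$)-1):REM DROP THE SENTINEL
100 ? A$
```

This leaves the double spaces between titles in place, so splitting the line into separate titles would be a further step.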


Here, try this :)  Obviously the machine code bit won't print, so the file SPACES.BAS is attached.

 

10 DIM ML2$(108),A$(50),B$(10)
15 ML2$="hðTÉÐQhÌhËhhhÏhÎhh¨±Îgø©ÔÕ ¢±ËÝgÐàÐèìðÈÌð¸PãÈÌð¸PØ`ªhhÊÐû`¬ÈÔ`Ô`"
20 A$="T I T L E 1  T I T L E 2  T I T L E 3"
30 X=USR(ADR(ML2$),ADR(A$),LEN(A$),ADR(" "),1)
40 IF X THEN A$(X)=A$(X+1):A$=A$(1,LEN(A$))
50 IF X THEN ? A$:GOTO 30
55 IDX=1
60 X=USR(ADR(ML2$),ADR(A$),LEN(A$),ADR("1"),1)
70 IF X THEN B$=A$(IDX,X):IDX=X+1
80 ? B$
90 X=USR(ADR(ML2$),ADR(A$),LEN(A$),ADR("2"),1)
100 IF X THEN B$=A$(IDX,X):IDX=X+1
110 ? B$
120 X=USR(ADR(ML2$),ADR(A$),LEN(A$),ADR("3"),1)
130 IF X THEN B$=A$(IDX,X):IDX=X+1
140 ? B$

 

[Screenshot of output]

 

I should have added: the USR call can find any sub-string in a larger string.  It returns the position of the first character of the sub-string in the main string, or 0 if the sub-string is not found.
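
For anyone who can't load the machine code, here is a rough plain-BASIC stand-in that follows the same return convention described above (position of the first match, or 0). It is only a sketch and not a translation of the USR routine, and it will be much slower. The subroutine at line 1000 and the names S$, P, L and I are my own illustrative choices.

```basic
10 DIM A$(50),S$(10)
20 A$="TITLE1TITLE2TITLE3":S$="2"
30 GOSUB 1000
40 ? "FOUND AT ";P
50 END
1000 REM FIND S$ IN A$: P=POSITION, 0 IF NOT FOUND
1010 P=0:L=LEN(S$)
1020 IF L=0 OR L>LEN(A$) THEN RETURN
1030 FOR I=1 TO LEN(A$)-L+1
1035 REM ON A MATCH, RECORD IT AND FORCE THE LOOP TO END
1040 IF A$(I,I+L-1)=S$ THEN P=I:I=LEN(A$)
1050 NEXT I
1060 RETURN
```

With the sample values in line 20, P comes back as 12, the position of the "2" in the string.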

 

SPACES.BAS

Edited by TGB1718

Thanks to all for the ideas!  I'll try a couple of these things today and tomorrow.  I'm a bit under the weather, so if you don't hear back from me promptly, it's not because I've lost interest.  My text file is about 10,000 characters in length, and it is actually a list of my DVD collection, originally (I think) from a Word 97 file.  I'll read that into a big string and try SPACES.BAS.  @TGB1718 -- that looks very promising!  I've usually been able to figure out parsing issues like this one in the past, but I haven't done much programming in the past several years.
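
Reading the file into one big string could look roughly like the sketch below. The file name D:DVDS.TXT, the 10000-character size and the variable names are assumptions for illustration, and the DIM only works if that much memory is free. TRAP is used to catch the end-of-file error, which also means any other I/O error ends the loop the same way.

```basic
10 DIM T$(10000)
20 OPEN #1,4,0,"D:DVDS.TXT"
25 REM AN ERROR (NORMALLY END OF FILE) JUMPS TO LINE 70
30 TRAP 70
40 GET #1,C
45 REM APPEND THE BYTE; EXTRA BYTES ARE DROPPED IF T$ IS FULL
50 IF LEN(T$)<10000 THEN T$(LEN(T$)+1)=CHR$(C)
60 GOTO 40
70 CLOSE #1
80 ? LEN(T$);" CHARACTERS READ"
```

Reading a byte at a time with GET is slow, but it avoids the line-length limit that INPUT would run into on very long lines.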
