My Basic Parsing and Transformation Tool

dmsc · May 17, 2015

Hi All,

I finally cleaned up the sources of my TurboBasic XL parsing and transformation program. Also, the program now has many new capabilities:

- It can read free-form basic programs, without line numbers, or standard ATASCII listings, including abbreviations.

- It can write long, ASCII listing output (suitable for editing on a PC), compressed listings keeping each line < 120 bytes or tokenized binary basic files, loadable in TurboBasic XL.

The source is available on GitHub; you can clone it from https://github.com/dmsc/tbxl-parser

I am attaching the current version, compiled for Linux 64-bit, Linux 32-bit and Windows. Please test it with any BASIC or TurboBasic XL program you can, so I can fix any remaining bugs in the parser or output routines.

basicParser-20150517.linux32.zip

basicParser-20150517.linux64.zip

basicParser-20150517.win32.zip

phaeron · May 17, 2015

Ran the tool over the Altirra BASIC test suite, and got a few errors:

CLOAD and CSAVE statements aren't recognized.

Parse error: X=-A$<B$

Parse error: X=A$<B$+C$<D$

Fractional line numbers don't seem to be handled correctly:

9.4 PRINT "9.4"
10.5 PRINT "10.5"
11.6 PRINT "11.6"

The tool converts these statements to dashed lines. The line numbers should instead be rounded.

dmsc · May 17, 2015

Hi!

Thanks for the testing. I took a look at the Altirra BASIC test suite.

Ran the tool over the Altirra BASIC test suite, and got a few errors:

CLOAD and CSAVE statements aren't recognized.

Oh, I wrongly specified those as requiring a string expression. Fixed.

Parse error: X=-A$<B$

Parse error: X=A$<B$+C$<D$

So, I assumed that the precedence of string comparisons was the same as that of numeric comparisons. I fixed those by putting string comparisons at the same precedence as parenthesized expressions.

Fractional line numbers don't seem to be handled correctly:
9.4 PRINT "9.4"
10.5 PRINT "10.5"
11.6 PRINT "11.6"
The tool converts these statements to dashed lines. The line numbers should instead be rounded.

This was because I was parsing the line number as an integer, so it became '9 REM 4 PRINT "9.4"'

I also realized that this is also valid in Atari Basic or TurboBasic:

-.0?"This is line 0!"
.7?"This is line 1!"
1.3E+2?"This is line 130!"

I added parsing the line number as floating point and checking the range later to fix this.

This also showed a difference between my parsing and the Atari one: I parse floating point numbers like "1." as valid, but the dot is not included in the number in the original AFP routine. This makes the following code valid for my parser:

10 ? "The number ";10.

I like numbers ending in dots, so I did not change that for general numbers; I only stop on the dot in line numbers (as this is needed to parse some comment lines).
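The line-number handling described above can be sketched in Python. This is only an illustration, not the tool's C code: the function name, the round-half-up rule and the 0 to 32767 range are assumptions, while the sample inputs come from the posts above.

```python
import math
import re

# Leading number: digits with an optional fraction, or a bare fraction,
# plus an optional exponent. A trailing dot ("10.") is deliberately not
# consumed, matching the "stop on the dot in line numbers" rule.
_LINE_NUMBER = re.compile(r"\s*([+-]?(?:\d+(?:\.\d+)?|\.\d+)(?:[eE][+-]?\d+)?)")

MAX_LINE = 32767  # assumed upper limit for a stored line number


def split_line_number(line):
    """Return (line_number, rest_of_line); line_number is None if absent."""
    m = _LINE_NUMBER.match(line)
    if m is None:
        return None, line
    value = float(m.group(1))
    # Range check done on the floating point value, before rounding.
    if not -0.5 <= value < MAX_LINE + 0.5:
        raise ValueError(f"line number out of range: {m.group(1)}")
    number = int(math.floor(value + 0.5))  # round half up (an assumption)
    return number, line[m.end():]
```

With this sketch, `-.0?"This is line 0!"` yields line 0, `.7?...` yields line 1 and `1.3E+2?...` yields line 130, as in the examples above.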

Pushed the new version to github, attached are the binaries.

basicParser-20150517-linux32.zip

basicParser-20150517-linux64.zip

basicParser-20150517-win32.zip

Irgendwer · September 5, 2015

Pushed the new version to github, attached are the binaries.

Thank you very much. There is a lot of potential for cross development with your tool.

Some wishes:

Having C style comments like '//' and '/* */' would not only increase the readability (for me and maybe others) but also would ease the adaptation for syntax-highlighting schemes in several editors and help disabling code blocks.

Handling of special characters could be improved.
Four ideas:

1. Transformation of international characters to ATASCII ones if available (like ä,Ü,É.ô etc.).

2. Characters (case insensitive) in curly brackets are transformed to their control counterpart {T} results in the ball symbol. {,S;} is 'Heart', 'Cross' and 'Spade'.

3. Leading swung dash switches to inverse video: "~Hello~ World" prints 'Hello' in inverse video. Works also for ~{H~} = inverse ramp.

4. I thought about names for the graphic symbols to be more readable. A name in the brackets {'Ball'} or {'LLine''BLine'} (case insensitive) may be a solution for that. The suggestion is attached below. Should also work with inverse video.

Include of binary data as string literal or data. This would be very helpful! Something like "{{PATHANDFILE}}" could be expanded to a string with the binary data or {{PATHANDFILE}} to a chain of comma separated values. Options like bytes to skip or count could improve this function further (see http://www.cc65.org/doc/ca65-11.html#ss11.61 )
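Ideas 2 and 3 above (curly brackets for control characters, a swung dash for inverse video) can be sketched as a small expander. This is a hypothetical illustration of the proposal, not something the parser implements: the function names are invented, Ctrl+letter is assumed to map to codes 1 to 26, the punctuation table is an assumption about the ATASCII codes for heart and spade, and plain characters are assumed to share their ASCII code.

```python
# Assumed mapping for keys that are not letters; Ctrl+letter is the
# letter's position in the alphabet (Ctrl-A = 1 ... Ctrl-Z = 26).
_CTRL_PUNCT = {",": 0, ".": 96, ";": 123}


def _ctrl_code(ch):
    """Return the assumed control-character code for one key."""
    if ch in _CTRL_PUNCT:
        return _CTRL_PUNCT[ch]
    up = ch.upper()
    if "A" <= up <= "Z":
        return ord(up) - ord("A") + 1
    raise ValueError(f"no control character for {ch!r}")


def expand(text):
    """Expand {..} control groups and ~ inverse-video toggles to bytes."""
    out = bytearray()
    inverse = False
    in_braces = False
    for ch in text:
        if ch == "~":
            inverse = not inverse          # toggle inverse video
        elif ch == "{" and not in_braces:
            in_braces = True
        elif ch == "}" and in_braces:
            in_braces = False
        else:
            code = _ctrl_code(ch) if in_braces else ord(ch)
            if code > 0x7F:
                raise ValueError(f"cannot encode {ch!r}")
            # Inverse video is the same code with bit 7 set.
            out.append(code | 0x80 if inverse else code)
    if in_braces:
        raise ValueError("unterminated '{'")
    return bytes(out)
```

Under these assumptions `expand("{T}")` gives the single byte 20, and `expand("~{H~}")` gives 0x88. The sketch has no way to write a literal `~`, `{` or `}`, which a real syntax would need.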

I am currently experimenting a bit with an automation tool that automates the whole chain, from the editor to running the program in the emulator, with just one click. Looks quite nice!

Edit: Final note: An additional mode in the form of a tokenized binary file, but with preservation of the variable names for debug purposes, would be very welcome!

Edited September 5, 2015 by Irgendwer

dmsc · September 6, 2015

Thank you very much. There is a lot of potential for cross development with your tool.

Some wishes:

Having C style comments like '//' and '/* */' would not only increase the readability (for me and maybe others) but also would ease the adaptation for syntax-highlighting schemes in several editors and help disabling code blocks.

I currently use VIM for editing, and its BASIC syntax already recognizes ' as a comment; this is also supported in QB, QBASIC and other BASIC dialects. Adding '//' will be possible, but '/*' comments are a lot more difficult.

Handling of special characters could be improved.
Four ideas:

1. Transformation of international characters to ATASCII ones if available (like ä,Ü,É.ô etc.).

2. Characters (case insensitive) in curly brackets are transformed to their control counterpart {T} results in the ball symbol. {,S;} is 'Heart', 'Cross' and 'Spade'.

3. Leading swung dash switches to inverse video: "~Hello~ World" prints 'Hello' in inverse video. Works also for ~{H~} = inverse ramp.

4. I thought about names for the graphic symbols to be more readable. A name in the brackets {'Ball'} or {'LLine''BLine'} (case insensitive) maybe a solution for that. The suggestion is attached below. Should also work with inverse video

Include of binary data as string literal or data. This would be very helpful! Something like "{{PATHANDFILE}}" could be expanded to a string with the binary data or {{PATHANDFILE}} to a chain of comma separated values. Options like bytes to skip or count could improve this function further (see http://www.cc65.org/doc/ca65-11.html#ss11.61 )

This would be possible, but the parser would need two modes to keep the compatibility with existing basic sources. Currently the parser reads standard ATASCII files (with 0x9B as EOL) correctly.

If you want to try, you can modify stmt_add_string() in stmt.c, line 146; this is the code that reads byte data from the parsed file and builds a tokenized string (0x0F + LENGTH + DATA).
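The tokenized string layout mentioned above (0x0F + LENGTH + DATA) is simple enough to sketch. This is not the code of stmt_add_string(); the function name and the one-byte length limit are assumptions made for illustration.

```python
STRING_TOKEN = 0x0F  # marker byte for a string constant, as described above


def tokenize_string(data):
    """Build the token bytes for a string constant: 0x0F, length, data."""
    if len(data) > 255:
        # Assumption: the length is stored in a single byte.
        raise ValueError("string constant too long for a one-byte length")
    return bytes([STRING_TOKEN, len(data)]) + bytes(data)
```

For example, `tokenize_string(b"HELLO")` returns `b"\x0f\x05HELLO"`. Any extended escape syntax would have to be expanded into raw bytes before this step.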

For including external binary data, I would prefer a new syntax instead of '"', perhaps reusing the '#' token, or with embedded comments. Also, I think that what would be valuable is specifying the starting value of string or array variables; I don't know if it is possible to alter the VVT pointers so that the data is not cleared on RUN.

I am currently experimenting a bit with an automation tool that automates the whole chain, from the editor to running the program in the emulator, with just one click. Looks quite nice!

Bildschirmfoto vom 2015-09-05 20:46:35.png

Edit: Final note: An additional mode in the form of a tokenized binary file, but with preservation of the variable names for debug purposes, would be very welcome!

Yes, that would be useful. I added a '-f' option for full variable names and a '-x' option for *no* variable names. New version is here https://github.com/dmsc/tbxl-parser/

Tell me if you need a new binary (but specify your OS, as I use Linux).

dmsc · September 6, 2015

And one last thing: to track requests better, you could add issues to the tracker on GitHub: https://github.com/dmsc/tbxl-parser/issues

+MrFish · September 6, 2015

Nice work! I didn't see your initial release, but I'm very interested since I do all of my TBXL development using an editor on the PC.

I did a test run on one of my larger projects and got the following errors. TBXL eats these lines perfectly fine:

error: (FileName).atb(800): expected end of line, got '(CodePoint) < MinimumDescent THEN MinimumDescent = aBaselineOffset (CodePoint)'
20320 IF aBaselineOffset (CodePoint) < MinimumDescent THEN MinimumDescent = aBaselineOffset (CodePoint)

error: (FileName).atb(802): expected end of line, got '(CodePoint)'
20350 GlyphAscent = GlyphHeight + aBaselineOffset (CodePoint)

error: (FileName).atb(966): expected statement, got 'aBaselineOffset (CodePoint) = VAL (BDFLineBuffer$ (ParamStart, ParamEnd))'
21970 aBaselineOffset (CodePoint) = VAL (BDFLineBuffer$ (ParamStart, ParamEnd))

error: (FileName).atb(991): expected statement, got 'aBaselineOffset (CodePoint) = %0'
22090 aBaselineOffset (CodePoint) = %0

error: (FileName).atb(1085): expected end of line, got '- (PEEK (pGlyphHeight + CodePoint) + aBaselineOffset (CodePoint))'
22720 POKE pStartRenderOffset + CodePoint, MaximumAscent - (PEEK (pGlyphHeight + CodePoint) + aBaselineOffset (CodePoint))

error: (FileName).atb(1604): expected end of line, got '""'
25060 BDFLineBuffer$ (LEN (BDFLineBuffer$) + %1) = "{ } [ ] < > ( ) / \ : ; . , ` ' """

error: (FileName).atb(2039): expected end of line, got '""; FileName$; """"; " Not Opened" '
29656 PRINT """"; FileName$; """"; " Not Opened" : PRINT

error: (FileName).atb(2206): expected statement, got 'aBaselineOffset (Index) = %0 '
30560 FOR Index = %0 TO c255 : aBaselineOffset (Index) = %0 : NEXT Index

error: (FileName).atb(2383): expected statement, got 'DIM aBaselineOffset (c255)'
31644 DIM aBaselineOffset (c255)

error: (FileName).atb(2390): expected statement, got 'DIM FileBDF$ (c15), FilePFT$ (c15), FileSpec$ (c8), FileName$ (c15), FileDirBuffer$ (17)'
31650 DIM FileBDF$ (c15), FilePFT$ (c15), FileSpec$ (c8), FileName$ (c15), FileDirBuffer$ (17)

error: (FileName).atb(2394): expected statement, got 'DIM GlyphBytesBuffer$ (360), BDFLineBuffer$ (200)'
31660 DIM GlyphBytesBuffer$ (360), BDFLineBuffer$ (200)

error: (FileName).atb(2397): expected statement, got 'DIM YesNo$ (%1), Word$ (c18), MessageText$ (38)'
31670 DIM YesNo$ (%1), Word$ (c18), MessageText$ (38)

error: (FileName).atb(2400): expected statement, got 'DIM Number$ (c6), NumberComma$ (c7)'
31680 DIM Number$ (c6), NumberComma$ (c7)

error: (FileName).atb(2403): expected statement, got 'DIM FontName$ (64)'
31700 DIM FontName$ (64)

error: (FileName).atb(2406): expected statement, got 'DIM PropertyName$ (56)'
31710 DIM PropertyName$ (56)

error: (FileName).atb(2410): expected statement, got 'DIM BitmapOffsetLSB$ (c256), BitmapOffsetMSB$ (c256), StartRenderOffset$ (c256)'
31730 DIM BitmapOffsetLSB$ (c256), BitmapOffsetMSB$ (c256), StartRenderOffset$ (c256)

error: (FileName).atb(2411): expected statement, got 'DIM GlyphWidth$ (c256), GlyphHeight$ (c256)'
31740 DIM GlyphWidth$ (c256), GlyphHeight$ (c256)

error: (FileName).atb(2414): expected statement, got 'DIM RenderGlyph$ (219), HexToDec$ (50), AccessBanked$ (245)'
31750 DIM RenderGlyph$ (219), HexToDec$ (50), AccessBanked$ (245)

dmsc · September 6, 2015

Hi!,

Nice work! I didn't see your initial release, but I'm very interested since I do all of my TBXL development using an editor on the PC.

I did a test run on one of my larger projects and got the following errors. TBXL eats these lines perfectly fine:

Thanks for trying it.

error: (FileName).atb(800): expected end of line, got '(CodePoint) < MinimumDescent THEN MinimumDescent = aBaselineOffset (CodePoint)'

20320 IF aBaselineOffset (CodePoint) < MinimumDescent THEN MinimumDescent = aBaselineOffset (CodePoint)

So, the problem is that my parser doesn't like spaces between the array names and the left parenthesis.

But I tried TBXL and it does not work either: it does not give an error, but the parser is severely confused:

READY
20320 IF ABASELINEOFFSET (CODEPOINT) < MINIMUMDESCENT THEN MINIMUMDESCENT = ABASELINEOFFSET (CODEPOINT)
LIST

20320 IF ABASELINEOFFSETCODEPOINT)<MINIMUMDESCENT THEN MINIMUMDESCENT=ABASELINEOFFSETCODEPOINT)

READY
DUMP

ABASELINEOFFSET =0
CODEPOINT =0
MINIMUMDESCENT =0

READY

As you see, the listed line is different from the input one: it is missing the array parentheses, and the variables were defined as scalars instead of arrays. I can add support for this to my parser, but I don't know if it will break other parts.

Also, Altirra Basic gives:

Altirra 8K BASIC 1.40

Ready
10 DIM A(10), A (10)
Error-   10 DIM A(10),  (10)
LIST

Ready

error: (FileName).atb(1604): expected end of line, got '""'

25060 BDFLineBuffer$ (LEN (BDFLineBuffer$) + %1) = "{ } [ ] < > ( ) / \ : ; . , ` ' """

error: (FileName).atb(2039): expected end of line, got '""; FileName$; """"; " Not Opened" '

29656 PRINT """"; FileName$; """"; " Not Opened" : PRINT

This is a different error. This was supposed to work in older versions of the parser, but somehow does not work now.

It is fixed in the latest version at https://github.com/dmsc/tbxl-parser/

error: (FileName).atb(2390): expected statement, got 'DIM FileBDF$ (c15), FilePFT$ (c15), FileSpec$ (c8), FileName$ (c15), FileDirBuffer$ (17)'

31650 DIM FileBDF$ (c15), FilePFT$ (c15), FileSpec$ (c8), FileName$ (c15), FileDirBuffer$ (17)

error: (FileName).atb(2394): expected statement, got 'DIM GlyphBytesBuffer$ (360), BDFLineBuffer$ (200)'

31660 DIM GlyphBytesBuffer$ (360), BDFLineBuffer$ (200)

error: (FileName).atb(2397): expected statement, got 'DIM YesNo$ (%1), Word$ (c18), MessageText$ (38)'

31670 DIM YesNo$ (%1), Word$ (c18), MessageText$ (38)

error: (FileName).atb(2400): expected statement, got 'DIM Number$ (c6), NumberComma$ (c7)'

31680 DIM Number$ (c6), NumberComma$ (c7)

error: (FileName).atb(2403): expected statement, got 'DIM FontName$ (64)'

31700 DIM FontName$ (64)

error: (FileName).atb(2406): expected statement, got 'DIM PropertyName$ (56)'

31710 DIM PropertyName$ (56)

error: (FileName).atb(2410): expected statement, got 'DIM BitmapOffsetLSB$ (c256), BitmapOffsetMSB$ (c256), StartRenderOffset$ (c256)'

31730 DIM BitmapOffsetLSB$ (c256), BitmapOffsetMSB$ (c256), StartRenderOffset$ (c256)

error: (FileName).atb(2411): expected statement, got 'DIM GlyphWidth$ (c256), GlyphHeight$ (c256)'

31740 DIM GlyphWidth$ (c256), GlyphHeight$ (c256)

error: (FileName).atb(2414): expected statement, got 'DIM RenderGlyph$ (219), HexToDec$ (50), AccessBanked$ (245)'

31750 DIM RenderGlyph$ (219), HexToDec$ (50), AccessBanked$ (245)

Oh, I see that in DIM statements, I don't support spaces after the "$" in string variables. This is different from the above, and fixing it fixes many of the errors. Fixed in the latest version.

So, it is true that for completeness, both syntaxes should be supported, but as it will not be compatible with standard TBXL, Atari Basic or Altirra Basic, I think that I will pass for now.

Thanks again for the bug reports

If you need a precompiled newer version, tell me which OS you have.

+MrFish · September 6, 2015

So, the problem is that my parser doesn't like spaces between the array names and the left parenthesis.

But I tried TBXL and it does not work either: it does not give an error, but the parser is severely confused:
READY
20320 IF ABASELINEOFFSET (CODEPOINT) < MINIMUMDESCENT THEN MINIMUMDESCENT = ABASELINEOFFSET (CODEPOINT)
LIST

20320 IF ABASELINEOFFSETCODEPOINT)<MINIMUMDESCENT THEN MINIMUMDESCENT=ABASELINEOFFSETCODEPOINT)

READY
DUMP

ABASELINEOFFSET =0
CODEPOINT =0
MINIMUMDESCENT =0

READY
As you see, the listed line is different from the input one: it is missing the array parentheses, and the variables were defined as scalars instead of arrays. I can add support for this to my parser, but I don't know if it will break other parts.

OK, this is bringing back some memories now (haven't touched the code for some months). I do recall seeing the formatting changes that you display above, yet the related code -- and many others like it -- are functioning perfectly.

So, would you say it's a bug that TBXL accepts the formatting, functions as expected, and yet doesn't store it internally as expected?

error: (FileName).atb(1604): expected end of line, got '""'
25060 BDFLineBuffer$ (LEN (BDFLineBuffer$) + %1) = "{ } [ ] < > ( ) / \ : ; . , ` ' """

error: (FileName).atb(2039): expected end of line, got '""; FileName$; """"; " Not Opened" '
29656 PRINT """"; FileName$; """"; " Not Opened" : PRINT

This is a different error. This was supposed to work in older versions of the parser, but somehow does not work now.

It is fixed in the latest version at https://github.com/dmsc/tbxl-parser/

OK, great.

error: (FileName).atb(2390): expected statement, got 'DIM FileBDF$ (c15), FilePFT$ (c15), FileSpec$ (c8), FileName$ (c15), FileDirBuffer$ (17)'

31650 DIM FileBDF$ (c15), FilePFT$ (c15), FileSpec$ (c8), FileName$ (c15), FileDirBuffer$ (17)

error: (FileName).atb(2394): expected statement, got 'DIM GlyphBytesBuffer$ (360), BDFLineBuffer$ (200)'

31660 DIM GlyphBytesBuffer$ (360), BDFLineBuffer$ (200)

error: (FileName).atb(2397): expected statement, got 'DIM YesNo$ (%1), Word$ (c18), MessageText$ (38)'

31670 DIM YesNo$ (%1), Word$ (c18), MessageText$ (38)

error: (FileName).atb(2400): expected statement, got 'DIM Number$ (c6), NumberComma$ (c7)'

31680 DIM Number$ (c6), NumberComma$ (c7)

error: (FileName).atb(2403): expected statement, got 'DIM FontName$ (64)'

31700 DIM FontName$ (64)

error: (FileName).atb(2406): expected statement, got 'DIM PropertyName$ (56)'

31710 DIM PropertyName$ (56)

error: (FileName).atb(2410): expected statement, got 'DIM BitmapOffsetLSB$ (c256), BitmapOffsetMSB$ (c256), StartRenderOffset$ (c256)'

31730 DIM BitmapOffsetLSB$ (c256), BitmapOffsetMSB$ (c256), StartRenderOffset$ (c256)

error: (FileName).atb(2411): expected statement, got 'DIM GlyphWidth$ (c256), GlyphHeight$ (c256)'

31740 DIM GlyphWidth$ (c256), GlyphHeight$ (c256)

error: (FileName).atb(2414): expected statement, got 'DIM RenderGlyph$ (219), HexToDec$ (50), AccessBanked$ (245)'

31750 DIM RenderGlyph$ (219), HexToDec$ (50), AccessBanked$ (245)

Oh, I see that in DIM statements, I don't support spaces after the "$" in string variables. This is different from the above, and fixing it fixes many of the errors. Fixed in the latest version.

So, it is true that for completeness, both syntaxes should be supported, but as it will not be compatible with standard TBXL, Atari Basic or Altirra Basic, I think that I will pass for now.

I'm not sure I understand you exactly here. You're saying that the spaces after the dollar sign ARE supported in the latest version? Maybe your second statement here is just applying to the first problem with parenthesis listed...

Thanks again for the bug reports

If you need a precompiled newer version, tell me which OS you have.

No problem and thanks for your efforts and help.

Yes, I need a compiled version for Windows 7 - 64-bit.

dmsc · September 6, 2015

Hi!,

OK, this is bringing back some memories now (haven't touched the code for some months). I do recall seeing the formatting changes that you display above, yet the related code -- and many others like it -- are functioning perfectly.

So, would you say it's a bug that TBXL accepts the formatting, functions as expected, and yet doesn't store it internally as expected?

Somewhat, the code works but I think it will crash at some point, as the VVT is corrupted (it is stored as a numeric variable, but the DIM reserves memory for it).

Also, the variables "A(" and "A (" are treated as two different variables, so it will not work reliably if you don't type them with the space each time.

I'm not sure I understand you exactly here. You're saying that the spaces after the dollar sign ARE supported in the latest version? Maybe your second statement here is just applying to the first problem with parenthesis listed...

Spaces after the dollar sign are supported, because then you don't have the variable name ambiguity. The rule is that in Atari Basic and its derivatives, the variable name includes the dollar sign for strings and the open parenthesis for arrays.
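The naming rule above can be illustrated with a tiny scanner. It is a sketch under assumptions, not the parser's real grammar: the function name and the regular expression are invented here, and letters, digits and "_" are assumed to be the only name characters.

```python
import re

# A name is letters/digits/underscore, optionally ending in "$" (string)
# or "(" (array). The type marker must follow the name with no space.
_NAME = re.compile(r"[A-Z_][A-Z0-9_]*[$(]?")


def variable_key(text):
    """Return the variable-table key found at the start of text."""
    m = _NAME.match(text.upper())
    if m is None:
        raise ValueError(f"no variable name at: {text!r}")
    return m.group(0)
```

With this rule `variable_key("A(10)")` is `"A("` but `variable_key("A (10)")` is just `"A"`, a different (scalar) entry, while `"NAME$ (10)"` and `"NAME$(10)"` both give `"NAME$"`, which is why a space after the dollar sign is harmless.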

Yes, I need a compiled version for Windows 7 - 64-bit.

I'm attaching the current binary for windows 32 and 64bit: basicParser-20150906-r13-win32.zip

and the Linux 64bit build: basicParser-20150906-r13-linux.zip

Edited September 6, 2015 by dmsc

+MrFish · September 6, 2015

Somewhat, the code works but I think it will crash at some point, as the VVT is corrupted (it is stored as a numeric variable, but the DIM reserves memory for it).

Also, the variables "A(" and "A (" are treated as two different variables, so it will not work reliably if you don't type them with the space each time.

I'm using these types of statements and accessing the associated arrays with no problem at all.

If you type or paste this example code into your emulator as a test:

10 DIM A (10): NUMBER = 3

20 FOR I = 0 TO 9

30 A (I) = NUMBER * I

40 NEXT I

50 FOR I = 0 TO 9

60 ? A (I)

70 NEXT I

You'll get this when it's listed:

10 DIM A10):NUMBER=3

20 FOR I=0 TO 9

30 AI)=NUMBER*I

40 NEXT I

50 FOR I=0 TO 9

60 ? AI)

70 NEXT I

And the output will be this:

0

3

6

9

12

15

18

21

24

27

If you do a dump, this is all that's listed:

A =0

NUMBER =3

I =10

Spaces after the dollar sign are supported, because then you don't have the variable name ambiguity. The rule is that in Atari Basic and its derivatives, the variable name includes the dollar sign for strings and the open parenthesis for arrays.

Got you -- makes sense.

I'm attaching the current binary for windows 32 and 64bit: basicParser-20150906-r13-win32.zip

and the Linux 64bit build: basicParser-20150906-r13-linux.zip

Thank you. I'll use this version for future tests.

Edited September 6, 2015 by MrFish

phaeron · September 6, 2015

I'm using these types of statements and accessing the associated arrays with no problem at all.

Uh, except that it produces a LISTing that can't be ENTERed back in....

+MrFish · September 6, 2015

Uh, except that it produces a LISTing that can't be ENTERed back in....

You're right about that. Although the way I'm working, I never need to list them back out, since my REMarks have no line numbers and because TBXL turns all lowercase into uppercase, which is used heavily in formatting my code.

Edited September 6, 2015 by MrFish

Irgendwer · September 6, 2015

If you want to try, you can modify stmt_add_string() in stmt.c, line 146; this is the code that reads byte data from the parsed file and builds a tokenized string (0x0F + LENGTH + DATA).

Thanks for the hint. I may have a look into the source later on. I'm not that familiar with gawk and peg, so I have abstained from inspecting the source until now.

Also, I think that what would be valuable is specifying the starting value of string or array variables; I don't know if it is possible to alter the VVT pointers so that the data is not cleared on RUN.

I'm not sure if I get that correctly. Do you mean the destination address of the data? I just like to binary include...

move adr("\18\18\18\3C\3C\24\24\3C\00\18\18\3C\7E\7E\00\FF\00\7E\7E"), pmadr+345, 19

... the data you can find as argument for the ADR above. (An additional top notch feature could be support for an in-line-assembler. External assembler is called with the contents of the expression, and the resulting binary is included. Would make sense for the popular page 6 routines.)

Which brings me to the question: why not also include source code segments/procedures? That would allow offering some useful TB libraries, although the problem of 'how to pass arguments' would remain (rely only on documentation?).

I have to stop to think about features - I don't want to scare you..

Thank you for the updated version!

Edited September 6, 2015 by Irgendwer

pirx · September 6, 2015

How come I haven't seen this before! It is so much better than my stupid tokenizer that the 10-liner contest will be blown away now.

Roydea6 · September 6, 2015

I like this parser.

All except that when it changes variable names, the 26th variable is 'Z', the 27th is '_' and the 28th is 'A0'. I have never seen variables start with the underscore '_' before, and it is confusing even if correct.

dmsc · September 6, 2015

Thanks for the hint. I may have a look into the source later on. I'm not that familiar with gawk and peg, so I have abstained from inspecting the source until now.

AWK is used to convert a list of statements to code that parses and prints the statements, and PEG is used to generate the parser C code from the parser definitions. If you want to modify the code, you don't need to modify any of the awk scripts, and modifying the PEG parser is not that difficult.

I'm not sure if I get that correctly. Do you mean the destination address of the data? I just like to binary include...

I was thinking of statements like:

10 A$="Hello World"
20 C23 = 23

Here, after execution, the string is in two places, one in the program code as a constant and a second time inside the A$ variable. The same with the "C23" variable, the initialization uses 10 bytes that are not needed if you don't want to modify the variable.

It is possible to generate a SAVEd file that has the contents of any variable, including strings and arrays, embedded in the file. But when you RUN the program, TBXL clears all the variables. So, if we could modify the interpreter (for example, executing a CONT instead of RUN), the saved variables would preserve their values and the binary program would be smaller.

... the data you can find as argument for the ADR above. (An additional top notch feature could be support for an in-line-assembler. External assembler is called with the contents of the expression, and the resulting binary is included. Would make sense for the popular page 6 routines.)

Ha, an embedded assembler would be too much, I think. But including external data is not that difficult. Do you think that a syntax like the following would be ok?

 move adr(@"mysource.bin"), $600, #"mysource.bin"
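To make the proposal concrete, here is a hypothetical sketch of how such an include could be expanded before parsing. Nothing here exists in the parser: the function name is invented, `@"file"` is assumed to expand to a string literal using the `\XX` hex escapes shown earlier in the thread, and `#"file"` is assumed to expand to the file's length in bytes. File contents are passed in as a dictionary so the example is self-contained.

```python
import re

# Matches @"name" (data include) or #"name" (length of that data).
_INCLUDE = re.compile(r'([@#])"([^"]*)"')


def expand_includes(source, files):
    """Replace @"f" with an escaped string literal and #"f" with len(f)."""
    def repl(m):
        data = files[m.group(2)]          # KeyError if the file is unknown
        if m.group(1) == "#":
            return str(len(data))
        # Assumed output format: one \XX hex escape per byte.
        return '"' + "".join(f"\\{b:02X}" for b in data) + '"'
    return _INCLUDE.sub(repl, source)
```

A real implementation would also have to avoid touching `@"` or `#"` sequences that appear inside ordinary string constants, which this sketch does not attempt.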

Which brings me to the question: why not also include source code segments/procedures? That would allow offering some useful TB libraries, although the problem of 'how to pass arguments' would remain (rely only on documentation?).

I thought about that, but then I think that this would become like a BASIC compiler, and then a compiler seems more useful because the compiled programs would run faster than the interpreted ones.

I have to stop to think about features - I don't want to scare you..

No problem, but I don't promise to implement them.

I like this parser.

All except that when it changes variable names, the 26th variable is 'Z', the 27th is '_' and the 28th is 'A0'. I have never seen variables start with the underscore '_' before, and it is confusing even if correct.

This is a TBXL extension: there, variable names can contain the "_" character as just another letter. It is important in the tenliners, as you use one less character in the name.

Perhaps I could add an option to limit the names to only letters and numbers, but the program already has too many options.... What option name would be that? :)
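The renaming order reported above ('Z' as the 26th name, '_' as the 27th, 'A0' as the 28th) is consistent with a generator like the following. This is a guess at the scheme, not the tool's code: the ordering beyond 'A0' and the `allow_underscore` switch (standing in for the option being discussed) are assumptions, and a real renamer would also have to skip names that collide with keywords.

```python
import itertools
import string


def short_names(allow_underscore=True):
    """Yield short variable names: one character first, then two."""
    first = string.ascii_uppercase + ("_" if allow_underscore else "")
    rest = string.digits + first  # assumed order: digits before letters
    for c in first:
        yield c
    for a in first:
        for b in rest:
            yield a + b


# The 26th, 27th and 28th names with the underscore enabled.
print(list(itertools.islice(short_names(), 25, 28)))
```

With `allow_underscore=False` the 27th name becomes 'A0' directly, at the cost of one extra character for that variable.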

Irgendwer · September 6, 2015

It is possible to generate a SAVEd file that has the contents of any variable, including strings and arrays, embedded in the file. But when you RUN the program, TBXL clears all the variables. So, if we could modify the interpreter (for example, executing a CONT instead of RUN), the saved variables would preserve their values and the binary program would be smaller.

Ok, now I see your motivation. It sounds useful, but fiddling with the interpreter to squeeze out some bytes for this approach wouldn't be my priority. But my use case may not be representative...

Ha, an embedded assembler would be too much I think

It wasn't my intention that you have to write an assembler, but calling one (like ATASM) and including the output would be quite nifty. Therefore I thought also about not only including binary data as string but also expand the data to 'DATA'-statements (this could then be read to e.g. page 6).

But including external data is not that difficult. Do you think that a syntax like the following would be ok?
move adr(@"mysource.bin"), $600, #"mysource.bin"

Maybe others can chime in here too. To me this seems a central question for the parser in general: finding a clear, distinguishable syntax for additional parser features that aren't available in TB and don't corrupt the interpretation.

Ideally this utilizes characters that aren't available/used in TB and are easily accessible on a PC keyboard. This is why I suggested the "{,},~" symbols for the parser. But I have to confess that I don't really have a use for the supported scenario of (re-)reading original files 'back' to the PC. So yes, I regard the parser more or less as a 'compiler'.

I thought about that, but then I think that this would become like a BASIC compiler, and then a compiler seems more useful because the compiled programs would run faster than the interpreted ones.

No problem, but I don't promise to implement them.

Ok. So no features like changing SETCOLOR statements into more efficient POKE(s) or adding syntax sugar which transforms I++ to I=I+1...

Perhaps I could add an option to limit the names to only letters and numbers, but the program already has too many options.... What option name would be that? :)

Maybe (some) flags could be part of the source-file itself...? (like C pragmas)

Thank you again for the efforts you put into your converter!

MrMartian · September 6, 2015

I'm trying to compile this on FreeBSD, but I can't find software that compiles PEG. Do you know what the actual software is that you use on linux?

dmsc · September 6, 2015

Hi!,

I'm trying to compile this on FreeBSD, but I can't find software that compiles PEG. Do you know what the actual software is that you use on linux?

It is from here: http://piumarta.com/software/peg/

That produces two binaries, "peg" and "leg", only peg is necessary. You need to put it somewhere in your path.

dmsc · September 7, 2015

Hi!

I added support for extended strings with tags, as @Irgendwer suggested, and did a release on GitHub. You can download binaries from:

https://github.com/dmsc/tbxl-parser/releases/tag/v2

Currently there are problems if you embed end-of-lines in strings: the resulting LIST file is not valid, but the produced BAS file is ok.

+MrFish · September 8, 2015

I corrected the array related problems mentioned above, and using the r13 binary that you compiled for Windows, my program went through without any errors, as expected, and worked correctly in TBXL. Thanks again for your help on that.

However when I took the same input program and stripped out all of the line numbers, it converted without any errors using the basicParser, but produced one line that resulted in a syntax error when the output program was entered in TBXL. The offending line has a "hanging" THEN at the end:

79PROCA0:CLS#Y:CL=0:IFB6>A0 ORBX>20:CM=48:CL=1:EL.:CM=R:END.:IFBX>26:CN=BX*1.2:EL.:CN=BX*1.4:END.:IFFRAC(CN)THEN

Here's the line of code from my program that corresponds with the IF/THEN portion of the above line:

IF FRAC (CellHeight) THEN CellHeight = INT (CellHeight) + %1

Running basicParser from the command-line, I used no parameters other than the file name of the input program.

I also tried outputting to a tokenized file with the same version of my program without the line numbers as input. It ran through basicParser with no errors as well, and ran for a short time in TBXL before locking up.

Edited September 8, 2015 by MrFish

dmsc · September 9, 2015

Hi!

I corrected the array related problems mentioned above, and using the r13 binary that you compiled for Windows, my program went through without any errors, as expected, and worked correctly in TBXL. Thanks again for your help on that.

However when I took the same input program and stripped out all of the line numbers, it converted without any errors using the basicParser, but produced one line that resulted in a syntax error when the output program was entered in TBXL. The offending line has a "hanging" THEN at the end:

79PROCA0:CLS#Y:CL=0:IFB6>A0 ORBX>20:CM=48:CL=1:EL.:CM=R:END.:IFBX>26:CN=BX*1.2:EL.:CN=BX*1.4:END.:IFFRAC(CN)THEN

Here's the line of code from my program that corresponds with the IF/THEN portion of the above line:

IF FRAC (CellHeight) THEN CellHeight = INT (CellHeight) + %1

Running basicParser from the command-line, I used no parameters other than the file name of the input program.

I also tried outputting to a tokenized file with the same version of my program without the line numbers as input. It ran through basicParser with no errors as well, and ran for a short time in TBXL before locking up.

That was hard... Interpreting IF/THEN statements is convoluted because, in no-line-numbers mode, the end of the statement is marked only by the EOL, so there is a difference between ":" and end of line.

My parser solves this ambiguity by defining a statement called "INVISIBLE_ENDIF", and parses the line as:

IF FRAC( CellHeight ) THEN CellHeight = INT( CellHeight ) + %1 : INVISIBLE_ENDIF

I modified the line splitting code to count the THEN / INVISIBLE_ENDIF pairs and not split inside the pairs, plus always split after the last INVISIBLE_ENDIF.
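As a rough illustration of that splitting rule (not the actual tbxl-parser code, which is written in C), here is a minimal Python sketch. The function names, the representation of statements as plain strings and the `max_len` parameter are all assumptions made for the example:

```python
# Hypothetical sketch of the splitting rule; not taken from tbxl-parser.

def render(stmts):
    """Join statements with ':' except directly after a THEN."""
    out = ""
    for s in stmts:
        if out and not out.endswith("THEN"):
            out += ":"
        elif out:
            out += " "
        out += s
    return out


def split_lines(statements, max_len=120):
    """Group statements into output lines.

    A split is only allowed while no THEN / INVISIBLE_ENDIF pair is open,
    and a split is forced after the INVISIBLE_ENDIF that closes the last
    open pair. INVISIBLE_ENDIF itself is never written to the output.
    """
    lines, current, depth = [], [], 0
    for stmt in statements:
        if stmt == "INVISIBLE_ENDIF":
            depth -= 1
            if depth == 0:
                # Always end the line after the last INVISIBLE_ENDIF.
                lines.append(render(current))
                current = []
            continue
        # Only consider splitting outside any open THEN pair.
        if current and depth == 0 and len(render(current + [stmt])) > max_len:
            lines.append(render(current))
            current = []
        current.append(stmt)
        if stmt.endswith("THEN"):
            depth += 1
    if current:
        lines.append(render(current))
    return lines


example = ["A=1", "B=2", "IF FRAC(CN) THEN", "CN=INT(CN)+1",
           "INVISIBLE_ENDIF", "C=3"]
for line in split_lines(example, max_len=20):
    print(line)
```

With the small `max_len` used here, the IF statement moves to a new line together with its body, so no line ends in a hanging THEN. The sketch does not handle an IF body that is itself longer than the limit.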

The only problem was with code like this:

10 A=10
20 IF A=0 THEN 50:? "CHAO"
30 A=A-1
40 GOTO 20
50 ? "FIN"

In Atari Basic, the statement after the line number is never executed, so to solve that I converted the extra statement after the number to a comment.
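A simplified way to picture that conversion is the Python sketch below. It is only an illustration: the helper name is made up, using REM as the comment form is an assumption, and the regular expression works on raw text rather than on parsed statements, so it would be fooled by a string literal that contains "THEN 50:".

```python
import re

# Hypothetical helper; the real parser works on parsed statements, not text.
# Matches "... THEN <line number> : <rest of line>".
THEN_NUMBER = re.compile(r"^(.*\bTHEN\s*\d+)\s*:\s*(.+)$", re.IGNORECASE)


def comment_out_dead_code(line):
    """Turn statements following 'THEN <number>' into a REM comment."""
    m = THEN_NUMBER.match(line)
    if not m:
        # No 'THEN <number>:' pattern, leave the line untouched.
        return line
    return f"{m.group(1)}:REM {m.group(2)}"


print(comment_out_dead_code('20 IF A=0 THEN 50:? "CHAO"'))
```

Lines where THEN is followed by a statement instead of a line number are returned unchanged.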

Attached is a new version with the improvements.

About the BAS version, I suspect various errors, but it is difficult to know without your sources. Perhaps you could try listing the program and see where the parser produced invalid code.

basicParser-v2-2-gc4ef22c-win32.zip

+MrFish · September 9, 2015

That was hard... Interpreting IF/THEN statements is convoluted because, in no-line-numbers mode, the end of the statement is marked only by the EOL, so there is a difference between ":" and end of line.

My parser solves this ambiguity by defining a statement called "INVISIBLE_ENDIF", and parses the line as:
IF FRAC( CellHeight ) THEN CellHeight = INT( CellHeight ) + %1 : INVISIBLE_ENDIF
I modified the line splitting code to count the THEN / INVISIBLE_ENDIF pairs and not split inside the pairs, plus always split after the last INVISIBLE_ENDIF.

The only problem was with code like this:
10 A=10
20 IF A=0 THEN 50:? "CHAO"
30 A=A-1
40 GOTO 20
50 ? "FIN"
In Atari Basic, the statement after the line number is never executed, so to solve that I converted the extra statement after the number to a comment.

Attached is a new version with the improvements.

About the BAS version, I suspect various errors, but it is difficult to know without your sources. Perhaps you could try listing the program and see where the parser produced invalid code.

I don't see any problem with the IF/THEN statements now, and I can ENTER the output file into TBXL without any syntax errors being flagged.

If I try to run the program after ENTERing it though, I get an:

"ERROR- 29 ?PROC AT LINE 0"

Checking through all procedure calls referenced on line 0, I find that one should be at line 157 (second to last line of the program). But when I try to list this line in TBXL it is not there ("LIST 157" produces nothing except the READY prompt).

I can list lines 155 and 158 (last line in the program, which only lists explicitly), which both show up properly (comparing to the contents of the output *.LST file opened/viewed on a PC text editor), but upon listing line 156 I get this (picture below), and TBXL is frozen. The line is correct, except for what looks like another line beginning to list, starting with a 0, a space, and then the cursor. I can recover by hitting system reset. Also if I try listing the whole program, it lists up until 156 and then ends the same way as listing 156 explicitly.

Edited September 9, 2015 by MrFish

dmsc · September 9, 2015

Hi!

I don't see any problem with the IF/THEN statements now, and I can ENTER the output file into TBXL without any syntax errors being flagged.

If I try to run the program after ENTERing it though, I get an:

"ERROR- 29 ?PROC AT LINE 0"

Checking through all procedure calls referenced on line 0, I find that one should be at line 157 (second to last line of the program). But when I try to list this line in TBXL it is not there ("LIST 157" produces nothing except the READY prompt).

I can list lines 155 and 158 (last line in the program, which only lists explicitly), which both show up properly (comparing to the contents of the output *.LST file opened/viewed on a PC text editor), but upon listing line 156 I get this (picture below), and TBXL is frozen. The line is correct, except for what looks like another line beginning to list, starting with a 0, a space, and then the cursor. I can recover by hitting system reset. Also if I try listing the whole program, it lists up until 156 and then ends the same way as listing 156 explicitly.

This is probably a buffer overflow bug in the TBXL parser code. Can you post the lines 155 to 157 of the generated .LST? I will look at it later and try to understand where the overflow is.

Or, you can send me the full source, so I can test it myself.
