[Expo-tech] Bad parsing of *include...; ref ...

Philip Sargent (Gmail) philip.sargent at gmail.com
Sun May 10 21:06:07 BST 2020


I have edited 264.svx to remove the redundant and erroneous ;ref comments
and pushed to mercurial (:loser:). I can obviously do this again in git if
you have effectively frozen the :loser: repo.

> *include sloppyseconds1	; ref 2017#01
> *include sloppyseconds2	; ref 2017#07
> *include dog_end_series	; ref 2017#42
> *include nothing2see		; ref 2017#Missing

I have put fixing the bad regex use in parsers/survex.py onto my to-do list.

It's not being done now as this is best done by first changing all 
the ; ref statements to * ref statements in all the svx files and 
then stripping out a whole lot of unneeded code in parsers/survex.py

I have been doing my homework on the regex stuff:
https://imgur.com/lNI0VOv


-----Original Message-----
From: Philip Sargent (Gmail) [mailto:philip.sargent at gmail.com] 
Sent: 07 May 2020 20:01
To: expo-tech at lists.wookware.org
Subject: So that's what's special about these files.

I will be wanting to edit .svx files too. So this had better wait until
after :loser: has been properly gitted.

strewth. 

regex_comment = re.compile(r"([^;]*?)\s*(?:;\s*(.*))?\n?$")
regex_star    = re.compile(r'\s*\*[\s,]*(\w+)\s*(.*?)\s*(?:;.*)?$')

Now in my book (link*) that's just a recogniser, not a proper parser.
So no wonder it's buggy to buggery.

OK I'll get on with it. Some experimentation may be required.
On a first pass it looks like sline is not being separated from the comment
properly:
        sline, comment = regex_comment.match(svxline.strip()).groups()
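
A minimal sketch for that experimentation: it runs the two patterns above,
in isolation, on one of the problem lines quoted in this thread. The helper
name split_line is hypothetical and is not part of parsers/survex.py; it only
mirrors the match call shown above.

```python
import re

# The two patterns exactly as quoted from parsers/survex.py in this thread.
regex_comment = re.compile(r"([^;]*?)\s*(?:;\s*(.*))?\n?$")
regex_star    = re.compile(r'\s*\*[\s,]*(\w+)\s*(.*?)\s*(?:;.*)?$')

def split_line(svxline):
    """Hypothetical helper: return (sline, comment, cmd, args) for one line."""
    # Step 1: split the line into its survex part and its trailing comment.
    sline, comment = regex_comment.match(svxline.strip()).groups()
    # Step 2: if the survex part is a *command, pull out its name and arguments.
    mstar = regex_star.match(sline)
    cmd, args = mstar.groups() if mstar else (None, None)
    return sline, comment, cmd, args

# One of the lines from 264.svx that troggle failed to import.
sample = "*include sloppyseconds1\t\t\t; ref 2017#01"
print(split_line(sample))
# -> ('*include sloppyseconds1', 'ref 2017#01', 'include', 'sloppyseconds1')
```

On this one sample both patterns seem to split the line cleanly, so the
failure may lie in what is done with the comment afterwards rather than in
these two expressions. That is only checked on this sample, not on the
whole dataset.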

But actually this is all mixed up with recognising "; ref " (96 occurrences).
So if I fix "; ref" -> "*ref" in 74 .svx files (mostly in 204), I can remove
the recogniser for "; ref" in parsers/survex.py and everything becomes simpler.
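
A rough sketch of what the "; ref" -> "*ref" conversion could look like.
Everything here is an assumption for illustration: the names regex_ref_comment,
convert_ref_line and convert_text are hypothetical, and it assumes a reference
is a line holding nothing but "; ref <something>". A "; ref" trailing some
other command (as on the *include lines) is deliberately left alone, since
those need a human decision.

```python
import re

# Assumption: a reference line contains only "; ref <wallet>", optionally indented.
regex_ref_comment = re.compile(r"^(\s*);\s*ref\s+(\S.*?)\s*$")

def convert_ref_line(line):
    """Turn a standalone '; ref X' comment line into '*ref X'; leave others as-is."""
    m = regex_ref_comment.match(line)
    if not m:
        return line
    indent, wallet = m.groups()
    return f"{indent}*ref {wallet}"

def convert_text(text):
    """Apply convert_ref_line to every line of an .svx file's text."""
    return "\n".join(convert_ref_line(line) for line in text.split("\n"))

before = "*begin foo\n; ref 2017#01\n*include bar\t; ref 2017#07\n*end foo\n"
print(convert_text(before))
```

Only the second line of the sample is rewritten; the *include line keeps its
trailing comment.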

In passing:
Since we actually have access to the survex source code, why didn't we
borrow the real parser?
Philip

*
https://www.amazon.co.uk/dp/0201100886?tag=duc08-21&linkCode=osi&th=1   
(I bought mine in 1988 and read it cover to cover. I was keen in those days.)

-----Original Message-----
From: Expo-tech [mailto:expo-tech-bounces at lists.wookware.org] On Behalf Of Wookey
Sent: 07 May 2020 00:31
To: expo-tech at lists.wookware.org
Subject: Re: [Expo-tech] What is special about these files?

On 2020-05-06 23:54 +0100, Philip Sargent (Gmail) wrote:
> Is this illegal Survex format or is it troggle getting the parsing wrong?
> 
> Troggle imports all the 264 .svx files except these:
>         dog_end_series
>         nothingtosee
>         sloppyseconds1
>         sloppyseconds2
> 
> and the distinguishing feature of these is that they have a 
> TRAILING COMMENT on the *include line with 3 or 4 tabs:
> 
> *include sloppyseconds1					; ref 2017#01
> *include sloppyseconds2					; ref 2017#07
> *include dog_end_series					; ref 2017#42
> *include nothing2see			; ref 2017#Missing
> 
> OK, so is this illegal Survex format or is it troggle getting the parsing
> wrong?

It is troggle getting the parsing wrong (comments are allowed on the
end of any line). The troggle parser does not implement all the things
that can be in a valid survex file, by a large margin. Over the years
we have fixed enough of them that it can read our dataset.

Fix the parser. 

> PS
> not in the *include list in 264.svx:
>         PitStopAll
>         TEMP-pre2018only-264
>         naturalhigh2
>         galactica
> 
> These are commented out in the *include list - why ?
>         imfunckintired
>         chamber90b

Probably because they are not connected or have been superseded (but
I've not checked). Include them, process the file and see what the
complaint is?

Wookey
-- 
Principal hats:  Linaro, Debian, Wookware, ARM
http://wookware.org/

_______________________________________________
Expo-tech mailing list
Expo-tech at lists.wookware.org
https://lists.wookware.org/cgi-bin/mailman/listinfo/expo-tech



