[Expo-tech] cracked regex
Philip Sargent (Gmail)
philip.sargent at gmail.com
Wed May 13 19:38:50 BST 2020
Hah. In your face regex:
regex_starref = re.compile(r'^\s*\*ref[\s.:]*((?:19[6789]\d)|(?:20[0123]\d))\s*#?\s*(X)?\s*(.*?\d+.*?)$', re.IGNORECASE)
I have beaten the monster.
Thanks to Wookey for telling me about https://regexr.com/ for online testing
of regexes.
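Besides regexr, a regex like this can be checked against a few sample lines in Python. This is a minimal sketch: the sample `*ref` lines are invented for illustration, not taken from the dataset. It passes `re.IGNORECASE` as an argument rather than using a trailing `(?i)`, because Python 3.11 and later reject a global inline flag that is not at the start of the pattern.

```python
import re

# Same pattern as above; the case-insensitive flag is passed as an argument
# instead of a trailing (?i), which Python 3.11+ rejects.
regex_starref = re.compile(
    r'^\s*\*ref[\s.:]*((?:19[6789]\d)|(?:20[0123]\d))\s*#?\s*(X)?\s*(.*?\d+.*?)$',
    re.IGNORECASE,
)

# Hypothetical sample lines, not copied from any real .svx file.
samples = [
    "*ref 2017#01",
    "*ref: 2017#X07",
    "*REF 2018 x12",
    "*ref 2017#Missing",  # no digits after the year: no match
    "*ref 1959#01",       # year outside 1960-2039: no match
]

for line in samples:
    m = regex_starref.match(line)
    # groups() gives (year, optional 'X', reference number), or None if no match
    print(f"{line!r:24} -> {m.groups() if m else None}")
```

The first three lines match, giving `('2017', None, '01')`, `('2017', 'X', '07')` and `('2018', 'x', '12')`; the last two print `None`.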
Philip
I have been doing my homework on the regex stuff:
https://imgur.com/lNI0VOv
-----Original Message-----
From: Philip Sargent (Gmail) [mailto:philip.sargent at gmail.com]
Sent: 07 May 2020 20:01
To: expo-tech at lists.wookware.org
Subject: So that's what's special about these files.
I will be wanting to edit .svx files too. So this had better wait until after
:loser: has been properly gitted.
strewth.
regex_comment = re.compile(r"([^;]*?)\s*(?:;\s*(.*))?\n?$")
regex_star = re.compile(r'\s*\*[\s,]*(\w+)\s*(.*?)\s*(?:;.*)?$')
Now in my book (link*) that's just a recogniser, not a proper parser.
So no wonder it's buggy to buggery.
OK I'll get on with it. Some experimentation may be required.
First pass it looks like sline is not being separated from the comment
properly.
sline, comment = regex_comment.match(svxline.strip()).groups()
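To see what the two recognisers quoted above produce on one of the problem lines, they can be run by hand. This is a minimal sketch: the sample line, an `*include` with three tabs before a trailing `; ref` comment, is modelled on the lines quoted further down this thread rather than copied from the real .svx file.

```python
import re

# The two recognisers quoted above, unchanged.
regex_comment = re.compile(r"([^;]*?)\s*(?:;\s*(.*))?\n?$")
regex_star = re.compile(r'\s*\*[\s,]*(\w+)\s*(.*?)\s*(?:;.*)?$')

# Illustrative line: an *include with a trailing comment after three tabs.
svxline = "*include sloppyseconds1\t\t\t; ref 2017#01\n"

# Split into survex text (sline) and comment, using the same call as above.
sline, comment = regex_comment.match(svxline.strip()).groups()
print(repr(sline), repr(comment))  # '*include sloppyseconds1' 'ref 2017#01'

# Then recognise the *command and its arguments in the non-comment part.
cmd, args = regex_star.match(sline).groups()
print(repr(cmd), repr(args))  # 'include' 'sloppyseconds1'
```

Comparing this with what the real file's line produces is a quick way to tell whether the split itself or the later `; ref` handling is at fault.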
But actually this is all mixed up with recognising "; ref " (96
occurrences).
So if I fix "; ref" -> "*ref" in 74 .svx files (mostly in 204), I can
remove the recogniser for "; ref"
in parsers/survex.py and everything becomes simpler.
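The `; ref` -> `*ref` conversion could be scripted. This is a minimal sketch, assuming the `; ref` comments to be converted sit on a line of their own. A trailing `; ref` after other survex text, as on the `*include` lines, is deliberately left alone. The name `convert_ref_comment` is hypothetical and not part of troggle.

```python
import re

# Hypothetical helper pattern: a whole-line comment such as "; ref 2004#12"
# or "  ;ref: 2004#X05" (case-insensitive), keeping any leading indentation.
REF_COMMENT = re.compile(r'^(\s*);\s*ref\b[\s.:]*(.*)$', re.IGNORECASE)


def convert_ref_comment(line: str) -> str:
    """Turn a whole-line '; ref ...' comment into a '*ref ...' command.

    Any other line, including one with survex data before the ';',
    is returned unchanged. A trailing newline is preserved because
    '$' matches just before it and '.' does not consume it.
    """
    return REF_COMMENT.sub(r'\1*ref \2', line, count=1)


for line in ["; ref 2004#12", "  ;ref: 2004#X05",
             "*include foo ; ref 2017#01", "; reflector bolt"]:
    print(repr(convert_ref_comment(line)))
```

The `\b` after `ref` stops comments such as `; reflector` from being rewritten.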
In passing:
Since we actually have access to the survex source code, why didn't we
borrow the real parser?
Philip
*
https://www.amazon.co.uk/dp/0201100886?tag=duc08-21&linkCode=osi&th=1
(I bought mine in 1988 and read it cover to cover. I was keen in those
days.)
-----Original Message-----
From: Expo-tech [mailto:expo-tech-bounces at lists.wookware.org] On Behalf Of
Wookey
Sent: 07 May 2020 00:31
To: expo-tech at lists.wookware.org
Subject: Re: [Expo-tech] What is special about these files?
On 2020-05-06 23:54 +0100, Philip Sargent (Gmail) wrote:
> Is this illegal Survex format or is it troggle getting the parsing wrong?
>
> Troggle imports all the 264 .svx files except these:
> dog_end_series
> nothingtosee
> sloppyseconds1
> sloppyseconds2
>
> and the distinguishing feature of these is that they have a
> TRAILING COMMENT on the *include line with 3 or 4 tabs:
>
> *include sloppyseconds1 ; ref 2017#01
> *include sloppyseconds2 ; ref 2017#07
> *include dog_end_series ; ref 2017#42
> *include nothing2see ; ref 2017#Missing
>
> OK, so is this illegal Survex format or is it troggle getting the parsing
> wrong?
It is troggle getting the parsing wrong (comments are allowed on the
end of any line). The troggle parser does not implement all the things
that can be in a valid survex file, by a large margin. Over the years
we have fixed enough of them that it can read our dataset.
Fix the parser.
> PS
> not in the *include list in 264.svx:
> PitStopAll
> TEMP-pre2018only-264
> naturalhigh2
> galactica
>
> These are commented out in the *include list - why?
> imfunckintired
> chamber90b
Probably because they are not connected or have been superseded (but
I've not checked). Include them, process the file and see what the
complaint is.
Wookey
--
Principal hats: Linaro, Debian, Wookware, ARM
http://wookware.org/
_______________________________________________
Expo-tech mailing list
Expo-tech at lists.wookware.org
https://lists.wookware.org/cgi-bin/mailman/listinfo/expo-tech