[Expo-tech] Validating image filenames much earlier in the process
Philip Sargent (Gmail)
philip.sargent at gmail.com
Wed Mar 25 13:02:36 GMT 2020
Sorry,
" ...About 600 *image file references inside* tunnel files are skipped..."
I really must try harder to get the names for the different survey artefacts
right.
USER NAMING
===========
We can see the roots of the problem if we look at where those whacky names
come from. When a caver scans their notes in the potato hut, there is no
default guidance *in the scanner app* on the expo laptop to make it easy for
them to be compliant.
(I think I changed the default scan format myself from PDF to JPG in 2018.)
We also need an earlier intervention to let the user know when they have
made a naming mistake. The moment they import a scan into Tunnel would be a
good time: can we set Preferences in Tunnel and Therion to produce a warning
if someone tries to use PNG, ODT, XCF or PDF, or any image filename with
capital letters? That would catch images from mobile phones, which still
seem to like capital letters.
Even if a caver persists with a bad name there is a real benefit if troggle
recognises and reports the filenames we don't want. Then we have a report
that we can go through and use for fixing them. At the moment it *quietly
ignores them* and doesn't even put an entry in the "Data Issues" log in the
database.
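As a rough illustration of the kind of check I mean, here is a minimal
sketch in Python. The function names (filename_problems,
report_bad_references) are made up for this email and are not existing
troggle code. The rules are only the ones mentioned above: the four unwanted
formats and capital letters. It assumes forward-slash paths.

```python
from pathlib import PurePosixPath

# Formats named in this thread as unwanted for scanned notes (assumption:
# this set is illustrative, not a list taken from troggle or wallets.py).
UNWANTED_EXTENSIONS = {".png", ".odt", ".xcf", ".pdf"}


def filename_problems(filename):
    """Return a list of human-readable problems with one image filename."""
    problems = []
    name = PurePosixPath(filename).name      # ignore the directory part
    suffix = PurePosixPath(name).suffix      # e.g. ".PDF"
    if suffix.lower() in UNWANTED_EXTENSIONS:
        problems.append(f"unwanted format {suffix}")
    if name != name.lower():
        problems.append("contains capital letters")
    return problems


def report_bad_references(references):
    """Collect every problem filename rather than quietly skipping it."""
    report = {}
    for ref in references:
        problems = filename_problems(ref)
        if problems:
            report[ref] = problems
    return report
```

Something like report_bad_references() could feed the "Data Issues" log, so
that every skipped reference leaves a trace we can work through later.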
OTHER FILENAME ISSUES
=====================
When troggle imports a tunnel file, it never looks into the referenced image
file. It is simply listed as a hot link in reports such as
http://expo.survex.com/tunneldata/
(which is broken in other ways too: it only shows image files referenced
from 161/161farnorthonlyB.xml, yet troggle successfully added 3,096 scanned
images, or so it says)
So troggle itself doesn't care at all what format the files are. But it
should also be noticing and recording the references to the .top and the
.th2 files as well as the .jpg files.
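To show what "noticing and recording" the different reference types could
mean in practice, here is a small Python sketch that tallies references by
file extension. The function name count_references_by_type is hypothetical,
and it assumes the references have already been extracted from the tunnel
file as a list of path strings; it does not parse the XML itself.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_references_by_type(references):
    """Tally referenced files by lower-cased extension (.jpg, .top, .th2, ...)."""
    counts = Counter()
    for ref in references:
        suffix = PurePosixPath(ref).suffix.lower()
        # Files with no extension are grouped under a placeholder key.
        counts[suffix or "(none)"] += 1
    return counts
```

A tally like this would make it obvious when a tunnel file refers to .top or
.th2 sources that are currently not being recorded anywhere.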
Philip
-----Original Message-----
From: Expo-tech [mailto:expo-tech-bounces at lists.wookware.org] On Behalf Of
Wookey
Sent: 25 March 2020 01:56
To: expo-tech at lists.wookware.org
Subject: Re: [Expo-tech] Fixing /2014#05pidgeondroppings3/ to /2014#05/
etc.etc.
On 2020-03-25 00:38 -0000, Philip Sargent (Gmail) wrote:
[well done for fixing up the 2014 situation properly]
> PPS About 600 tunnel files are skipped by the troggle import process
> because they do not obey the naming convention "notes1.jpg" etc.
You don't mean 'tunnel' files here, I hope. We are talking about scans of
notes. Tunnel files can be called whatever they like.
> I will have a go at relaxing that in troggle/parsers/surveys.py and ensure
> that the 3 things: wallets.py, troggle and the handbook documentation all
> say the same thing.
I'm not very keen on relaxing the convention. People will still call them
all sorts of things, with random capitalisation, and they'll be PDFs and
PNGs and ODTs and XCFs and whatever other craziness people think of
(potentially reduced by better docs but it won't go away). So there will
always be a renaming and conversion job to do. In general better quality
software and data comes from being strict about formats, not lax. We have
learned this over many years.
Just document the way it's supposed to be done, and don't accept anything
else. I don't think we have much to gain from accepting some variations but
not others. It's not too bad so long as it is only ever parsed in one place
and one language, but as soon as something else is looking at it you have
the potential for breakage (where some subset of filenames are accepted in
one place, but not the other).
Wookey
--
Principal hats: Linaro, Debian, Wookware, ARM http://wookware.org/