Monday, 19 November 2018

Coordinates (2): the regular expression way

In this second part I will show how to check whether a latitude or longitude is valid using regular expressions, and how to convert it into decimal degrees.

First, let's look at an example of latitude and longitude in DMSH (degrees, minutes, seconds, hemisphere) format, space delimited:
                      52 22 33.47N 015 42 33.44E
We can write it in general form as:
                       DD MM SS.ssH DDD MM SS.ssH
where the following must hold:
  • DD is between <0, 90> for latitude and DDD is between <0, 180> for longitude (the sign comes from the hemisphere letter)
  • MM is between <0, 59>
  • SS.ss is greater than or equal to 0 and less than 60, i.e. in <0, 60)
or, character by character:
                      ddsddsdd.ddL dddsddsdd.ddL
where d is a digit, s is a space and L is the hemisphere letter (N, S, E, W).
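Read literally, the character-by-character format translates almost one to one into a regular expression. A minimal sketch (the names LAT_SHAPE and LON_SHAPE are hypothetical; it assumes exactly two decimal places for seconds, as in the example, and restricts latitude to N/S and longitude to E/W):

```python
import re

# d -> \d, s -> a literal space, L -> hemisphere letter; '.' must be escaped
LAT_SHAPE = re.compile(r'\d\d \d\d \d\d\.\d\d[NS]')    # ddsddsdd.ddL
LON_SHAPE = re.compile(r'\d\d\d \d\d \d\d\.\d\d[EW]')  # dddsddsdd.ddL

print(bool(LAT_SHAPE.fullmatch('52 22 33.47N')))   # True
print(bool(LON_SHAPE.fullmatch('015 42 33.44E')))  # True
print(bool(LAT_SHAPE.fullmatch('1 1 1.44N')))      # False: no leading zeros
```

Such a fixed-width pattern only checks the shape, not the ranges, and it rejects values written without leading zeros.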

In what follows we will also take into account that values can be written without leading zeros:
                       1 1 1.44N instead of 01 01 01.44N
There are two options here:
1. Write a fairly complex regular expression that takes all valid combinations of deg, min and sec into account, e.g.:
  • for latitude: if deg equals 90, min and sec must both be 0 (90 05 05.55N is not a valid latitude)
  • for latitude: if deg is in <0, 90), min may take integer values in <0, 59> and sec integer values in <0, 59> or float values in <0, 60)
2. Write a much simpler regular expression that checks only the general shape of the input and, if the input matches, check separately whether deg, min and sec meet the remaining conditions.
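For comparison, option 1 for latitude alone might look like the sketch below. LAT_STRICT is a hypothetical name, the pattern is not taken from the linked repository, and it assumes the space-separated form with optional leading zeros and optionally fractional seconds:

```python
import re

# Option 1 (sketch): encode the latitude range rules in the pattern itself.
# In VERBOSE mode a literal space has to be written as '\ '.
LAT_STRICT = re.compile(r'''
    ^(?:
        90\ 0{1,2}\ 0{1,2}(?:\.0+)?               # exactly 90: min and sec must be zero
      | [0-8]?\d\ [0-5]?\d\ [0-5]?\d(?:\.\d+)?    # deg 0-89, min 0-59, sec in <0, 60)
    )[NS]$
''', re.VERBOSE)

for lat in ['90 00 00N', '90 05 05.55N', '89 59 59.999S', '1 1 1.44N', '45 60 0N']:
    print(lat, LAT_STRICT.match(lat) is not None)
```

Even for latitude alone the pattern is already hard to read, and longitude would need its own branch for 180 and for three-digit degrees, which is why the rest of this post follows option 2.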

Following option 2, the regular expression to match a coordinate in space-separated DMSH format looks like:

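import re

# DMSH, space separated, e.g. 52 22 33.47N; ranges are checked later in code
regex_dmsh = re.compile(r'''(?P<deg>^\d{1,3})  # Degrees
                            (\s)  # Separator (any whitespace character)
                            (?P<min>\d{1,2})  # Minutes
                            (\s)
                            (?P<sec>\d{1,2}(\.\d+)?)  # Seconds, optional fraction
                            (?P<hem>[NSEW]$)  # Hemisphere letter
                         ''', re.VERBOSE)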

I used named groups to get easy access to the deg, min, sec and hemisphere letter parts.
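A quick check of what the named groups return for the example latitude from the beginning of the post (a sketch; the regex is repeated so the snippet runs on its own):

```python
import re

regex_dmsh = re.compile(r'''(?P<deg>^\d{1,3})  # Degrees
                            (\s)
                            (?P<min>\d{1,2})  # Minutes
                            (\s)
                            (?P<sec>\d{1,2}(\.\d+)?)  # Seconds
                            (?P<hem>[NSEW]$)
                         ''', re.VERBOSE)

m = regex_dmsh.match('52 22 33.47N')
print(m.group('deg'), m.group('min'), m.group('sec'), m.group('hem'))  # 52 22 33.47 N
print(m.groupdict())  # all named groups as a dict of strings
print(regex_dmsh.match('52-22-33.47N'))  # None: '-' is not whitespace
```

Note that the pattern only checks the shape: '152 22 33.47N' also matches, so the range checks still have to be done in code.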

Now we can write a function that converts a coordinate given in DMS format into DD (decimal degrees) format.
The function takes two arguments:
  • regex_pattern: compiled regular expression, the pattern of the DMS format
  • dms: string, the latitude or longitude to be converted into DD format
def dms2dd(regex_pattern, dms):

The first step is to check whether the input dms matches the pattern and, if it does, get the hemisphere, deg, min and sec values:

match = regex_pattern.match(dms)  # None if the input does not fit the pattern
if match:
    h = match.group('hem')
    d = float(match.group('deg'))
    m = float(match.group('min'))
    s = float(match.group('sec'))
    # Check h, d, m, s conditions here
else:
    return None
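One detail worth knowing about the anchors: in Python, `$` also matches just before a trailing newline, so match() accepts a line read from a file together with its '\n'. Pattern.fullmatch() (available since Python 3.4) is stricter. A small sketch using a one-line equivalent of regex_dmsh (regex_dmsh_compact is a hypothetical name; the separator groups are simply not captured):

```python
import re

regex_dmsh_compact = re.compile(
    r'^(?P<deg>\d{1,3})\s(?P<min>\d{1,2})\s(?P<sec>\d{1,2}(?:\.\d+)?)(?P<hem>[NSEW])$')

# '$' also matches right before a trailing newline, so match() accepts it
print(regex_dmsh_compact.match('52 22 33.47N\n') is not None)      # True
# fullmatch() requires the whole string to be consumed
print(regex_dmsh_compact.fullmatch('52 22 33.47N\n') is not None)  # False
print(regex_dmsh_compact.fullmatch('52 22 33.47N') is not None)    # True
```

Whether a trailing newline should be accepted is a design choice; stripping the input before matching is another option.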

The next step is to check the latitude and longitude conditions discussed above.
For latitude:

if h in ['N', 'S']:
    if d > 90:  # Latitude is in range <-90, 90>
        return None
    elif d == 90 and (m > 0 or s > 0):
        return None
    else:
        if m >= 60 or s >= 60:
            return None
        else:
            dd = d + m / 60 + s / 3600
            if h == 'S':
                dd = -dd
            return dd
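The conversion line dd = d + m / 60 + s / 3600 is the usual DMS to decimal degrees formula. Worked through for the example latitude 52 22 33.47N from the beginning of the post:

```latex
DD = D + \frac{M}{60} + \frac{S}{3600}, \qquad DD \mapsto -DD \ \text{for}\ H \in \{S, W\}

52 + \frac{22}{60} + \frac{33.47}{3600} = 52 + 0.3666667 + 0.0092972 \approx 52.3759639
```

For 52 22 33.47S the result would be -52.3759639.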

For longitude:

elif h in ['E', 'W']:
    if d > 180:  # Longitude is in range <-180, 180>
        return None
    elif d == 180 and (m > 0 or s > 0):
        return None
    else:
        if m >= 60 or s >= 60:
            return None
        else:
            dd = d + m / 60 + s / 3600
            if h == 'W':
                dd = -dd
            return dd
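Putting the fragments together, the whole function might look like the sketch below. It is assembled from the snippets above, with the latitude and longitude branches merged because they differ only in the maximum number of degrees; the version in the linked repository may be organized differently.

```python
import re

regex_dmsh = re.compile(r'''(?P<deg>^\d{1,3})  # Degrees
                            (\s)
                            (?P<min>\d{1,2})  # Minutes
                            (\s)
                            (?P<sec>\d{1,2}(\.\d+)?)  # Seconds
                            (?P<hem>[NSEW]$)
                         ''', re.VERBOSE)


def dms2dd(regex_pattern, dms):
    """Convert a DMSH string to decimal degrees; return None if invalid."""
    match = regex_pattern.match(dms)
    if not match:
        return None
    h = match.group('hem')
    d = float(match.group('deg'))
    m = float(match.group('min'))
    s = float(match.group('sec'))

    # Latitude (N, S) is limited to 90 degrees, longitude (E, W) to 180
    max_deg = 90 if h in ('N', 'S') else 180
    if d > max_deg or (d == max_deg and (m > 0 or s > 0)):
        return None
    if m >= 60 or s >= 60:
        return None

    dd = d + m / 60 + s / 3600
    # Southern and western hemispheres are negative
    return -dd if h in ('S', 'W') else dd


print('{:.7f}'.format(dms2dd(regex_dmsh, '52 22 33.47N')))   # 52.3759639
print('{:.7f}'.format(dms2dd(regex_dmsh, '015 42 33.44E')))  # 15.7092889
print(dms2dd(regex_dmsh, '90 05 05.55N'))                   # None
```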

All source code can be found here: https://github.com/strpaw/python_examples/blob/master/latlon_basic_regex_parser.py

Let's test whether our code is correct.
Some test data (a list of tuples, each tuple being a lat, lon coordinate pair):

test_coordinates = [('77 01 01.11N', '015 15 17.43E'),
                    ('77 01 01N', '015 15 17E'),
                    ('77-01-01.11N', '015-15-17.43E'),
                    (77.43333, 15.3336),
                    ('77 50 47S', '166 40 06W'),
                    ('77 5 7.1S', '166 4 6.45555W'),
                    ('7 5 17.1S', '001 4 06.45555W'),
                    ('77 5 7.1', '166 4 6.45555'),
                    ('0 0 0N', '000 00 00.000E'),
                    ('97 5 7.1S', '180 4 6.45555W'),
                    ('89 59 59.999S', '179 59 59.999E')]

And a bit of code to check whether dms2dd() converts the input into DD when the coordinate is in space-separated DMSH format:

print('Test dms2dd - input DMS space delimited, e.g. 78 12 24.56N')
print('-' * 60)
print('{:^15} {:^15} | {:^24}'.format('Lat test', 'Lon test', 'dms2dd result'))
print('-' * 60)
for test_coord in test_coordinates:
    lat_dd = dms2dd(regex_dmsh, str(test_coord[0]))
    lon_dd = dms2dd(regex_dmsh, str(test_coord[1]))

    if lat_dd is None:
        lat_result = 'None'
    else:
        lat_result = '{:12.7f}'.format(lat_dd)

    if lon_dd is None:
        lon_result = 'None'
    else:
        lon_result = '{:12.7f}'.format(lon_dd)

    print('{:>15} {:>15} | {:>12} {:>12}'.format(test_coord[0], test_coord[1], lat_result, lon_result))

The output is:

Test dms2dd - input DMS space delimited, e.g. 78 12 24.56N
------------------------------------------------------------
   Lat test        Lon test     |      dms2dd result     
------------------------------------------------------------
   77 01 01.11N   015 15 17.43E |   77.0169750   15.2548417
      77 01 01N      015 15 17E |   77.0169444   15.2547222
   77-01-01.11N   015-15-17.43E |         None         None
       77.43333         15.3336 |         None         None
      77 50 47S      166 40 06W |  -77.8463889 -166.6683333
      77 5 7.1S  166 4 6.45555W |  -77.0853056 -166.0684599
      7 5 17.1S 001 4 06.45555W |   -7.0880833   -1.0684599
       77 5 7.1   166 4 6.45555 |         None         None
         0 0 0N  000 00 00.000E |    0.0000000    0.0000000
      97 5 7.1S  180 4 6.45555W |         None         None
  89 59 59.999S  179 59 59.999E |  -89.9999997  179.9999997

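and it seems that the function works as expected: valid coordinates are converted, while inputs with the wrong delimiter, a missing hemisphere letter, a plain number instead of a DMSH string, or values out of range return None.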
