Detecting Valid Number Strings in Python

Full title: Detecting Valid Number Strings in Python without Using RegEx or Throwing Exceptions

When writing down numbers for most day-to-day calculations, I find that there are typically four kinds of numbers:

  • Integers (0, 1, 2, 3, ...)

  • Decimal numbers (3.14, 0.1111, 9.5, etc.)

  • Negative numbers (-100, -2, -777, etc.)

  • Numbers that are both negative and decimal (-9.999, -0.12345, etc.)

How might we programmatically determine which numbers are valid numbers and which numbers are not? For example, "-3.14" is a valid number, but ".-314" is not.

For today, let's ignore fractions (e.g. 1/3), irrational numbers (pi, Euler's number, etc.), scientific notation (with the E's), imaginary numbers, and complex numbers.

So, to detect if a string is indeed a valid number, what are our options?

  • We could use Regular Expressions ("RegEx") to determine if a string is a valid number, covering negative, positive, zero, decimal, and integer numbers. But regular expressions are a bit involved* for a novice programmer, so let's see if we can tackle this challenge without them.

*In other words, regular expressions would take substantially more work to learn, because they use an entirely different syntax than Python.

How about Python's built-in string functions?

  • Python strings have an isdigit() method, but it returns False for decimal numbers and negative numbers.

print("5", "5".isdigit()) # True
print("5.0", "5.0".isdigit()) # False, because the period character is NOT a digit
print("543210", "543210".isdigit()) # True
print("-5", "-5".isdigit()) # False, because the hyphen/dash character is NOT a digit
print("potato", "potato".isdigit()) # False

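The takeaway here is that isdigit() only returns True if the string is non-empty and every single character in it is a digit (0 to 9, plus a few Unicode digit characters such as superscripts). A period or a hyphen anywhere in the string makes it return False.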

How about Python's isnumeric() function?

print("5", "5".isnumeric())
print("5.0", "5.0".isnumeric())
print("543210", "543210".isnumeric())
print("-5", "-5".isnumeric())
print("potato", "potato".isnumeric())

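It appears that Python's isnumeric() function performs no better than Python's isdigit() function for our purposes: it also rejects the period and the hyphen. (It does accept some extra Unicode characters, such as "½", but that doesn't help us here.) 🤷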
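For the curious, here is a small sketch comparing Python's three built-in string checks side by side. The sample strings and the helper name builtin_checks are my own picks for illustration; the point is only that none of the three accepts a period or a hyphen.

```python
# Compare the three built-in "is this string number-like?" methods.
samples = ["5", "5.0", "-5", "²", "½"]

def builtin_checks(text):
    """Return the (isdecimal, isdigit, isnumeric) results for one string."""
    return (text.isdecimal(), text.isdigit(), text.isnumeric())

for text in samples:
    print(repr(text), builtin_checks(text))
```

Roughly speaking, isdecimal() is the strictest of the three and isnumeric() the most permissive, but they all look at one character at a time, so none of them understands a sign or a decimal point.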

I decide to roll my own function...

How about we try to determine (using our own custom code) whether or not a given string is an actual valid number?

Some more ideas, thinking out loud:

1. is-valid-number? does the string contain only digits, a max of one period that isn't the first or last character, and a max of one hyphen that only appears at the beginning of the string?

2. is-negative-number? does the string start with a hyphen (-)? If yes, it's a negative number. If not, it's either zero or a positive number.

3. is-decimal-number? does the string have just one period? If yes, it's a decimal number. If not, it's an integer.

Some potential bonus challenges for another time: is-fraction?, is-repeating-decimal-number?

Summarizing the challenge at hand

So, what makes a valid number valid? What patterns can we see in numbers that are valid versus strings that are not valid numbers?

  • Numbers never have letters (let's pretend for today that we aren't using scientific notation)

  • Numbers always begin with 0-9 (the numerical digits), or, with a negative sign (-) immediately followed by a numerical digit

  • Numbers can/may contain one period (.) that does not come at the end of the number or the beginning of the number
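As a rough sketch, the three patterns above could be turned into code like this. The name follows_basic_rules is hypothetical, and the check is deliberately incomplete: it covers only these three patterns, plus the assumption that a hyphen may only appear as the first character, and it does not handle stricter cases such as leading zeroes.

```python
ASCII_DIGITS = "0123456789"

def follows_basic_rules(candidate):
    """Hypothetical helper: checks only the three patterns listed above."""
    # Rule 1: no letters, or anything else besides digits, '-' and '.'
    if candidate == "" or any(ch not in ASCII_DIGITS + "-." for ch in candidate):
        return False
    # Rule 2: starts with a digit, or with '-' immediately followed by a digit
    body = candidate[1:] if candidate[0] == "-" else candidate
    if body == "" or body[0] not in ASCII_DIGITS:
        return False
    # Assumption: a hyphen is only meaningful as the very first character
    if "-" in body:
        return False
    # Rule 3: at most one period, and never as the last character
    # (rule 2 already guarantees that it is not the first)
    return body.count(".") <= 1 and body[-1] != "."

print(follows_basic_rules("-3.14"))  # True
print(follows_basic_rules(".-314"))  # False
```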

Some utility functions

Note: These utility functions were written after I wrote a good chunk of code already. Since I could see which code was getting repeated over and over, I had a clear idea of how I wanted to simplify and streamline the code into something more concise as well as clearer to read.

# filtering test in Python using the isdigit function to filter a string down to only its digits
filtered = filter(str.isdigit, "abc123")
print(list(filtered))
def is_period(input_char):
  return input_char == "."
def is_hyphen(input_char):
  return input_char == "-"
def is_zero(input_char):
  return input_char == "0"
# I've decided to 're-alias' (i.e. save with a new name) the isdigit function to use in a more "syntactically consistent" way (as with is_period, is_hyphen, etc.) below
is_digit = str.isdigit
# returns a list of characters in a string which meet a given boolean condition (i.e. a "predicate" function)
def xs_in_string(pred, input_string):
  return list(filter(pred, input_string))
# counts the number of characters in a string which, as above, "satisfies the predicate"
def count_xs(pred, input_string):
  return len(xs_in_string(pred, input_string))
# testing out the utility functions to make sure they work as desired
print("periods count:", count_xs(is_period, "..."))
print("hyphens count:", count_xs(is_hyphen, "--12345"))
print("numbers count:", count_xs(is_digit, "--12345"))
print("zeroes count:", count_xs(is_zero, "0.0504"))
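A side note on counting: for a single literal character, Python's built-in str.count does the same job, and a generator with sum() avoids building an intermediate list. The sketch below restates count_xs so that it runs on its own; count_with_sum is a hypothetical name used only for comparison.

```python
def count_xs(pred, input_string):
    # same idea as the utility above: count the characters that satisfy pred
    return len(list(filter(pred, input_string)))

def count_with_sum(pred, input_string):
    # hypothetical alternative: add 1 per matching character, no list needed
    return sum(1 for char in input_string if pred(char))

print("digits:", count_xs(str.isdigit, "--12345"), count_with_sum(str.isdigit, "--12345"))
print("hyphens:", "--12345".count("-"))  # built-in str.count for one literal character
print("zeroes:", "0.0504".count("0"))
```

The predicate-based version is still the more flexible one, since str.count only works for a fixed substring and not for a condition like "is a digit".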

Okay! Get yourself ready for some reading and scrolling 😅 I decided to write the is_valid_number() function in as "flat" of a manner (i.e. without deeply nesting if/else blocks) as I could in a single coding session before getting sleepy.*

*Near the end of coding this, I got a bit tired and became less strict about avoiding nesting. I hope to sit down in the not too distant future and refactor this to read a bit more clearly. Also, a proper docstring would be a very helpful addition. I'd also love to convert my "print statement tests" into "proper" unit tests.

My custom is_valid_number() function

def is_valid_number(input_string):
  
  # case: "no input" (i.e. empty string)
  # requirement: input_string must contain one or more characters
  if(len(input_string) == 0):
    # print("no input for input string '" + input_string + "'") # debugging
    return False
  
  # case: "bad input" (any non hyphen, non-period, non-digit characters)
  # requirement: only digits, hyphens, and periods are allowed for the input_string to be a valid number
  for char in input_string:
    if ((char != "-") and (char != ".") and (not char.isdigit())):
      # print("bad input '" + char + "' found in input string '" + input_string + "'") # debugging
      return False
    
  # scenarios: input_string has more than 1 hyphen OR if the hyphen is anywhere but in the first index
  # - requirement: max 1 hyphen
  # case: too many hyphens
  if (count_xs(is_hyphen, input_string) > 1):
    # print("too many hyphens detected in string '" + input_string + "'") # debugging
    return False
  
  # - requirement: hyphen, if it exists, is always first
  # case: hyphens detected anywhere but the beginning
  if ((count_xs(is_hyphen, input_string) == 1) and (input_string[0] != "-")):
    # print("hyphen is not in the correct location for string '" + input_string + "'") # debugging
    return False
  
  # scenarios: more than 1 period OR period is first, last, or second after a hyphen (the period must be preceded *and* followed by at least one number)
  # - req: max 1 period
  # - req: period is NOT in the first index
  # - req: period is NOT in the last index
  # - req: period is NOT in the 2nd index IF input_string starts with a hyphen
  
  # case: "insufficient valid input"
  # if(len(list(filter(str.isdigit, input_string))) == 0): # pre-refactor
  if (count_xs(is_digit, input_string) == 0): # post-refactor
    # print("insufficient digits in input string '" + input_string + "'") # debugging
    return False
  
  # scenarios: periods in inappropriate places
  # case: periods at the string's caps (beginning or end)
  if((input_string[0] == ".") or (input_string[-1] == ".")):
    # print("period found at head or tail of string for input '" + input_string + "'") # debugging
    return False
  
  # case: too many periods
  if (count_xs(is_period, input_string) > 1):
    # print("too many periods found in input string '" + input_string + "'") # debugging
    return False
  
  # - case: a period just after a hyphen
  if((input_string[0] == "-") and (input_string[1] == ".")):
    # print("hyphen preceding a period detected in input '" + input_string + "'") # debugging
    return False
  
  # Q: How about 'numbers' like this? 00123 --> this is no good
  # I've decided that leading zeroes are no good :P
  # Q: how to detect multiple consecutive zeroes at the beginning of the number (with a hyphen prefix or not)
  
  # req: no consecutive leading zeroes
  # case: has_a_leading_zero_preceding_a_non_period
  # strip the optional leading hyphen so that "05" and "-05" are checked the same way
  unsigned = input_string[1:] if (input_string[0] == "-") else input_string
  # eg. "05" or "-01": the number starts with a zero followed by another digit (i.e. not a period)
  if((len(unsigned) > 1) and (unsigned[0] == "0") and (unsigned[1] != ".")):
    # print("leading zero preceding a non-period detected in string '" + input_string + "'") # debugging
    return False
  
  # trailing zeroes are OK (presumably for showing precision)
  # Q: is negative zero an acceptable number? --> let's say no
  # case: negative zero (integer or decimal number)
  if((input_string[0] == "-") and (count_xs(is_digit, input_string) == count_xs(is_zero, input_string))):
    # print("negative zero is not a valid number") # debugging
    return False
  
  # if we've reached this point, this means that we have a valid number, and we can now return True
  return True

Bonus Functions (just stubs for now...)

# bonus functions
def is_negative_number(input_string):
  # if valid number and number has a hyphen
  pass
def is_decimal_number(input_string):
  # if valid number and number has a period
  pass
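Here is one way the two stubs could be filled in, as a sketch rather than a finished design. To keep the example runnable on its own, each function takes the validity check as a second argument (an assumption on my part; the stubs above take only the string), and demo_validator is a tiny hypothetical stand-in. In the notebook, you would pass is_valid_number instead.

```python
def is_negative_number(input_string, validator):
    """True if the validator accepts the string and it starts with a hyphen."""
    return validator(input_string) and input_string.startswith("-")

def is_decimal_number(input_string, validator):
    """True if the validator accepts the string and it contains a period."""
    return validator(input_string) and "." in input_string

def demo_validator(input_string):
    # hypothetical stand-in so this sketch runs by itself;
    # replace with is_valid_number in the notebook
    return input_string in {"5", "-5", "5.0", "-0.123"}

print("-5", is_negative_number("-5", demo_validator))
print("5.0", is_decimal_number("5.0", demo_validator))
print("1.2.3", is_decimal_number("1.2.3", demo_validator))
```

Checking validity first means that strings like "-potato" or "1.2.3" are never reported as negative or decimal numbers.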

The Tests

print("5", is_valid_number("5")) # valid integer
print("5.0", is_valid_number("5.0")) # valid decimal number
print("543210", is_valid_number("543210")) # valid positive integer
print("-5", is_valid_number("-5")) # valid negative integer
print("0.8", is_valid_number("0.8")) # valid decimal number
print("0.0", is_valid_number("0.0")) # valid zero decimal number
print("0", is_valid_number("0")) # valid zero integer
print("-0.123", is_valid_number("-0.123")) # valid negative decimal number
print("'potato'", is_valid_number("potato")) # 'bad' data
print("'3xyz'", is_valid_number("3xyz"))
print("''", is_valid_number("")) # no data
print("'-'", is_valid_number("-")) # insufficient data
print("'-1 6'", is_valid_number("-1 6")) # has non-numeric, non-period, non-hyphen characters
print("2-", is_valid_number("2-"))
print("4-5", is_valid_number("4-5"))
print("-7-", is_valid_number("-7-"))
print(".35", is_valid_number(".35")) # has period at head
print("78.", is_valid_number("78.")) # has period at tail
print("1.2.3", is_valid_number("1.2.3")) # has too many periods
print("-0", is_valid_number("-0")) # negative zero is no good
print("-0.0", is_valid_number("-0.0")) # negative zero is no good
print("--3", is_valid_number("--3"))
print("9..9", is_valid_number("9..9"))
print("'.'", is_valid_number("."))
print("'-.'", is_valid_number("-."))
print("-.3", is_valid_number("-.3")) # hyphen followed immediately by a period is no good
print("000", is_valid_number("000"))
print("00.8", is_valid_number("00.8"))
print("-00.03", is_valid_number("-00.03"))
print("007", is_valid_number("007"))
print("050", is_valid_number("050"))
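The print-statement tests above could be turned into unit tests along these lines, using Python's built-in unittest module. So that the sketch runs on its own, it tests is_valid_number_compact, a hypothetical shorter variant that is meant to follow the same rules as is_valid_number but is not the function from this post. One known difference: it accepts only the ASCII digits 0 to 9. The test strings are the same ones printed above.

```python
import unittest

ASCII_DIGITS = set("0123456789")

def is_valid_number_compact(input_string):
    """Hypothetical compact variant of is_valid_number (ASCII digits only)."""
    # Drop an optional leading hyphen, then split the rest at the first period.
    body = input_string[1:] if input_string.startswith("-") else input_string
    whole, period, fraction = body.partition(".")
    # The part before the period must exist and contain only digits.
    if not whole or not set(whole) <= ASCII_DIGITS:
        return False
    # If there is a period, at least one digit (and only digits) must follow it.
    if period and (not fraction or not set(fraction) <= ASCII_DIGITS):
        return False
    # No leading zeroes such as "007" or "-00.03".
    if len(whole) > 1 and whole[0] == "0":
        return False
    # No negative zero such as "-0" or "-0.0".
    if input_string.startswith("-") and set(whole + fraction) == {"0"}:
        return False
    return True

class ValidNumberTests(unittest.TestCase):
    VALID = ["5", "5.0", "543210", "-5", "0.8", "0.0", "0", "-0.123"]
    INVALID = ["potato", "3xyz", "", "-", "-1 6", "2-", "4-5", "-7-",
               ".35", "78.", "1.2.3", "-0", "-0.0", "--3", "9..9",
               ".", "-.", "-.3", "000", "00.8", "-00.03", "007", "050"]

    def test_valid_strings(self):
        for text in self.VALID:
            with self.subTest(text=text):
                self.assertTrue(is_valid_number_compact(text))

    def test_invalid_strings(self):
        for text in self.INVALID:
            with self.subTest(text=text):
                self.assertFalse(is_valid_number_compact(text))

# Run the tests without exiting the interpreter (handy in a notebook).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ValidNumberTests)
result = unittest.TextTestRunner().run(suite)
```

To test the original function instead, swap is_valid_number_compact for is_valid_number inside the two test methods. Using subTest means that a single failing string is reported by name instead of stopping the whole loop.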
# bonus for those that read all the way here
# this code may perform similarly to the code I have written up above, though I have yet to test it:
# '3.14'.lstrip('-').replace('.','',1).isdigit()
# source: https://stackoverflow.com/questions/354038/how-do-i-check-if-a-string-represents-a-number-float-or-int
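A quick sketch to try that one-liner on a few of the test strings from this post. The wrapper name one_liner_check is my own. It agrees with is_valid_number on the first three strings below, but it returns True for the last five, all of which the tests above treat as invalid.

```python
def one_liner_check(text):
    # the Stack Overflow one-liner, wrapped in a function for reuse
    return text.lstrip("-").replace(".", "", 1).isdigit()

for text in ["3.14", "-3.14", "1.2.3", "--3", ".35", "78.", "007", "-0"]:
    print(repr(text), one_liner_check(text))
```

The differences come from how it works: lstrip("-") removes any number of leading hyphens, and replace(".", "", 1) removes the first period wherever it sits, so the position rules and the leading-zero rule are never checked.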
# test cases that are out-of-scope for today:
# print("non-string number 5", is_valid_number(5)) # wrong data type
# print("½", is_valid_number("½")) # unicode numbers
# fractions such as 1/3