Detecting Valid Number Strings in Python
Full title: Detecting Valid Number Strings in Python without Using RegEx or Throwing Exceptions
When writing down numbers for most day-to-day calculations, I find that there are typically four kinds of numbers:
Integers (0, 1, 2, 3, ...)
Decimal numbers (3.14, 0.1111, 9.5, etc.)
Negative numbers (-100, -2, -777, etc.)
Numbers that are both negative and decimal (-9.999, -0.12345, etc.)
How might we programmatically determine which numbers are valid numbers and which numbers are not? For example, "-3.14" is a valid number, but ".-314" is not.
For today, let's ignore fractions (e.g. 1/3), irrational numbers (pi, Euler's number, etc.), scientific notation (with the E's), imaginary numbers, and complex numbers.
So, to detect if a string is indeed a valid number, what are our options?
We could use regular expressions ("RegEx") to determine whether a string is a valid number, covering negative, positive, zero, decimal, and integer values. But RegEx is a bit involved* for a novice programmer, so let's see if we can tackle this challenge without it.
*In other words, RegEx takes substantially more work to learn, because it uses an entirely different syntax from Python's.
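For comparison only, here is a rough sketch of what the RegEx route could look like for the four kinds of numbers listed at the top. The pattern and the names NUMBER_PATTERN and matches_number_pattern are my own assumptions for this example, not something the rest of this post relies on.

```python
import re

# Optional minus sign, then either "0" or a run of digits with no leading zero,
# then an optional fractional part that must contain at least one digit.
NUMBER_PATTERN = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?")


def matches_number_pattern(text):
    """Return True when the whole string fits NUMBER_PATTERN."""
    return NUMBER_PATTERN.fullmatch(text) is not None


for sample in ["-3.14", ".-314", "5", "78.", "007"]:
    print(repr(sample), matches_number_pattern(sample))
```

Even this short pattern packs in a fair amount of syntax, which is the reason for steering around it today.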
How about Python's built-in string functions?
Python has an isdigit() string method, but it fails on decimal numbers and negative numbers.
print("5", "5".isdigit()) # True
print("5.0", "5.0".isdigit()) # returns False because the period character is NOT a digit
print("543210", "543210".isdigit()) # True
print("-5", "-5".isdigit()) # returns False because the hyphen/dash character is NOT a digit
print("potato", "potato".isdigit()) # False
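The takeaway here is that the isdigit() function only returns True if the string is non-empty and every single character in it is a digit. For our purposes that means 0 to 9, although isdigit() also accepts some non-ASCII digit characters, such as superscript digits.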
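To make the edge cases of isdigit() concrete, here is a small sketch that compares it with a stricter check accepting only the ten ASCII digits. The helper name is_ascii_digits and the constant ASCII_DIGITS are made up for this example.

```python
ASCII_DIGITS = "0123456789"


def is_ascii_digits(text):
    """True only for a non-empty string made purely of the characters 0-9."""
    return len(text) > 0 and all(char in ASCII_DIGITS for char in text)


# "\u00b2" is the superscript two, "\u0663" is the Arabic-Indic digit three.
for sample in ["543210", "", "\u00b2", "\u0663"]:
    print(repr(sample), sample.isdigit(), is_ascii_digits(sample))
```

The empty string fails both checks, while the two non-ASCII samples pass isdigit() but not the stricter helper.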
How about Python's isnumeric()?
print("5", "5".isnumeric())
print("5.0", "5.0".isnumeric())
print("543210", "543210".isnumeric())
print("-5", "-5".isnumeric())
print("potato", "potato".isnumeric())
It appears that Python's isnumeric() function performs no better than Python's isdigit() function for our purposes: it accepts a wider range of Unicode numeric characters (such as ½), but it still rejects the period and the hyphen 🤷
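For anyone curious how the built-in string checks differ from one another, this sketch runs isdecimal(), isdigit(), and isnumeric() side by side. The sample strings and their labels are my own picks for illustration.

```python
samples = {
    "5": "ASCII digit",
    "\u00b2": "superscript two",
    "\u00bd": "one-half fraction",
    "-5": "leading hyphen",
    "5.0": "contains a period",
}

for text, label in samples.items():
    print(
        f"{label:20}"
        f" isdecimal={text.isdecimal()!s:5}"
        f" isdigit={text.isdigit()!s:5}"
        f" isnumeric={text.isnumeric()!s:5}"
    )
```

The three methods only disagree on the unusual Unicode characters. None of them accepts a sign or a decimal point, so none of them solves today's problem.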
I decide to roll my own function...
How about we try to determine (using our own custom code) whether or not a given string is an actual valid number?
Some more ideas, thinking out loud:
1. is-valid-number? does the string contain only digits, a max of one period that isn't the first or last character, and a max of one hyphen that only appears at the beginning of the string?
2. is-negative-number? does the string start with a hyphen (-)? If yes, it's a negative number. If not, it's either zero, or, a positive number.
3. is-decimal-number? does the string have just one period? If yes, it's a decimal number. If not, it's an integer.
Some potential bonus challenges for another time: is-fraction?, is-repeating-decimal-number?
Summarizing the challenge at hand
So, what makes a valid number valid? What patterns can we see in numbers that are valid versus strings that are not valid numbers?
Numbers never have letters (let's pretend for today that we aren't using scientific notation)
Numbers always begin with 0-9 (the numerical digits), or, with a negative sign (-) immediately followed by a numerical digit
Numbers can/may contain one period (.) that does not come at the end of the number or the beginning of the number
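As a quick sanity check on the three rules above, here is a minimal sketch that translates them almost word for word. It is not the function built later in this post: the name follows_basic_rules is hypothetical, it only accepts the ASCII digits 0-9, and it deliberately says nothing yet about leading zeroes or negative zero.

```python
DIGITS = "0123456789"


def follows_basic_rules(text):
    """Check only the three rules listed above."""
    # An optional negative sign is allowed only at the very start.
    body = text[1:] if text.startswith("-") else text
    # Split on the first period; `dot` is "" when there is no period at all.
    whole, dot, fraction = body.partition(".")
    # The part before the period must be non-empty and made of digits only.
    if not whole or any(char not in DIGITS for char in whole):
        return False
    # If there is a period, digits must follow it. A second period would
    # land inside `fraction` and fail the digits-only test.
    if dot and (not fraction or any(char not in DIGITS for char in fraction)):
        return False
    return True


for sample in ["-3.14", ".-314", "potato", "78.", "1.2.3", "007"]:
    print(repr(sample), follows_basic_rules(sample))
```

Note that "007" passes here, which is one reason the full function further down needs extra rules.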
Some utility functions
Note: These utility functions were written after I wrote a good chunk of code already. Since I could see which code was getting repeated over and over, I had a clear idea of how I wanted to simplify and streamline the code into something more concise as well as clearer to read.
# filtering test in Python using the isdigit function to filter a string down to only its digits
filtered = filter(str.isdigit, "abc123")
This was my Google search: ...
... which led me here:
Also, I forgot that isdigit() is a string ("str") library function, so I ended up doing one more search for a code example here:
def is_period(input_char):
    return input_char == "."
def is_hyphen(input_char):
    return input_char == "-"
def is_zero(input_char):
    return input_char == "0"
# I've decided to 're-alias' (i.e. save with a new name) the isdigit function to use in a more "syntactically consistent" way (as with is_period, is_hyphen, etc.) below
is_digit = str.isdigit
# returns a list of characters in a string which meet a given boolean condition (i.e. a "predicate" function)
def xs_in_string(pred, input_string):
    return list(filter(pred, input_string))
# counts the number of characters in a string which, as above, "satisfies the predicate"
def count_xs(pred, input_string):
    return len(xs_in_string(pred, input_string))
# testing out the utility functions to make sure they work as desired
print("periods count:", count_xs(is_period, "..."))
print("hyphens count:", count_xs(is_hyphen, "--12345"))
print("numbers count:", count_xs(is_digit, "--12345"))
print("zeroes count:", count_xs(is_zero, "0.0504"))
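A side note on the counting helpers: the same counts can be had without building an intermediate list. The sketch below shows two alternatives, a generator-based counter (count_matching is a name I made up for this example) and the built-in str.count for the cases where we are looking for one specific character.

```python
def count_matching(pred, input_string):
    """Count the characters that satisfy `pred` without building a list."""
    return sum(1 for char in input_string if pred(char))


print("numbers count:", count_matching(str.isdigit, "--12345"))  # 5
print("hyphens count:", "--12345".count("-"))  # 2
print("zeroes count:", "0.0504".count("0"))  # 3
```

Either version would slot in where count_xs is used; the predicate-based style has the advantage of reading the same way for every kind of character.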
Okay! Get yourself ready for some reading and scrolling 😅 I decided to write the is_valid_number() function in as "flat" a manner (i.e. without deeply nesting if/else blocks) as I could in a single coding session before getting sleepy.*
*Near the end of coding this, I got a bit tired and less strict about avoiding nesting as I got closer to the end of the function. I hope to sit down in the not too distant future and refactor this to read a bit more clearly. Also, a proper doc-string would be a very helpful addition. I'd also love to convert my "print statement tests" into "proper" unit tests.
My custom is_valid_number()
def is_valid_number(input_string):
    # case: "no input" (i.e. empty string)
    # requirement: input_string must contain one or more characters
    if len(input_string) == 0:
        # print("no input for input string '" + input_string + "'") # debugging
        return False
    # case: "bad input" (any non-hyphen, non-period, non-digit characters)
    # requirement: only digits, hyphens, and periods are allowed for the input_string to be a valid number
    for char in input_string:
        if (char != "-") and (char != ".") and (not char.isdigit()):
            # print("bad input '" + char + "' found in input string '" + input_string + "'") # debugging
            return False
    # scenarios: input_string has more than 1 hyphen OR the hyphen is anywhere but in the first index
    # - requirement: max 1 hyphen
    # case: too many hyphens
    if count_xs(is_hyphen, input_string) > 1:
        # print("too many hyphens detected in string '" + input_string + "'") # debugging
        return False
    # - requirement: hyphen, if it exists, is always first
    # case: hyphens detected anywhere but the beginning
    if (count_xs(is_hyphen, input_string) == 1) and (input_string[0] != "-"):
        # print("hyphen is not in the correct location for string '" + input_string + "'") # debugging
        return False
    # scenarios: more than 1 period OR period is first, last, or second after a hyphen (the period must be preceded *and* followed by at least one number)
    # - req: max 1 period
    # - req: period is NOT in the first index
    # - req: period is NOT in the last index
    # - req: period is NOT in the 2nd index IF input_string starts with a hyphen
    # case: "insufficient valid input"
    # if(len(list(filter(str.isdigit, input_string))) == 0): # pre-refactor
    if count_xs(is_digit, input_string) == 0: # post-refactor
        # print("insufficient digits in input string '" + input_string + "'") # debugging
        return False
    # scenarios: periods in inappropriate places
    # case: periods at the string's caps (beginning or end)
    if (input_string[0] == ".") or (input_string[-1] == "."):
        # print("period found at head or tail of string for input '" + input_string + "'") # debugging
        return False
    # case: too many periods
    if count_xs(is_period, input_string) > 1:
        # print("too many periods found in input string '" + input_string + "'") # debugging
        return False
    # - case: a period just after a hyphen
    # (index 1 is safe here: a lone "-" was already rejected by the digit-count check above)
    if (input_string[0] == "-") and (input_string[1] == "."):
        # print("hyphen preceding a period detected in input '" + input_string + "'") # debugging
        return False
    # Q: How about 'numbers' like this? 00123 --> this is no good
    # I've decided that leading zeroes are no good :P
    # Q: how to detect multiple consecutive zeroes in the beginning of the number (with hyphen prefix or not)
    # req: no consecutive leading zeroes
    # case: has_a_leading_zero_preceding_a_non_period
    if ((len(input_string) > 1) and
            (((input_string[0] == "0") and
              (input_string[1] != "."))  # eg. "05" number starts with a zero and is followed by another digit (i.e. not a period)
             or ((len(input_string) > 2) and
                 (input_string[0] == "-") and
                 (input_string[1] == "0") and
                 (input_string[2] != ".")))):  # eg. -01 number starts with a hyphen, followed by a zero, followed by another digit (i.e. not a period)
        # print("leading zero preceding a non-period detected in string '" + input_string + "'") # debugging
        return False
    # trailing zeroes are OK (presumably for showing precision)
    # Q: is negative zero an acceptable number? --> let's say no
    # case: negative zero (integer or decimal number)
    if (input_string[0] == "-") and (count_xs(is_digit, input_string) == count_xs(is_zero, input_string)):
        # print("negative zero is not a valid number") # debugging
        return False
    # if we've reached this point, this means that we have a valid number, and we can now return True
    return True
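As one possible direction for the refactor mentioned earlier, here is a sketch that keeps the checks completely flat by listing them as (message, test) pairs. It is not a drop-in copy of the function above: the names first_broken_rule, is_valid_number_flat, and DIGITS are made up for this example, and it only accepts the ASCII digits 0-9, whereas isdigit() also lets some other Unicode digit characters through. Returning the message also stands in for the commented-out debugging prints.

```python
DIGITS = "0123456789"


def first_broken_rule(text):
    """Return a description of the first rule `text` breaks, or None."""
    # The string with its (optional) leading negative sign removed.
    unsigned = text[1:] if text.startswith("-") else text
    digits = [char for char in text if char in DIGITS]
    rules = [
        ("no input", text == ""),
        ("character other than a digit, hyphen, or period",
         any(char not in DIGITS + "-." for char in text)),
        ("hyphen somewhere other than the start", "-" in unsigned),
        ("no digits at all", not digits),
        ("more than one period", text.count(".") > 1),
        ("period not between digits",
         unsigned.startswith(".") or unsigned.endswith(".")),
        ("leading zero",
         len(unsigned) > 1 and unsigned[0] == "0" and unsigned[1] != "."),
        ("negative zero", text.startswith("-") and set(digits) == {"0"}),
    ]
    for message, is_broken in rules:
        if is_broken:
            return message
    return None


def is_valid_number_flat(text):
    return first_broken_rule(text) is None


for sample in ["-0.123", "0", "-0.0", "007", "-.3", "4-5"]:
    print(repr(sample), is_valid_number_flat(sample), first_broken_rule(sample))
```

Every rule is evaluated up front here, which is fine for short strings; the order of the list decides which message gets reported.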
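Bonus Functions (minimal versions for now...)
# bonus functions
def is_negative_number(input_string):
    # if valid number and number has a hyphen
    return is_valid_number(input_string) and input_string.startswith("-")
def is_decimal_number(input_string):
    # if valid number and number has a period
    return is_valid_number(input_string) and ("." in input_string)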
The Tests
print("5", is_valid_number("5")) # valid integer
print("5.0", is_valid_number("5.0")) # valid decimal number
print("543210", is_valid_number("543210")) # valid positive integer
print("-5", is_valid_number("-5")) # valid negative integer
print("0.8", is_valid_number("0.8")) # valid decimal number
print("0.0", is_valid_number("0.0")) # valid zero decimal number
print("0", is_valid_number("0")) # valid zero integer
print("-0.123", is_valid_number("-0.123")) # valid negative decimal number
print("'potato'", is_valid_number("potato")) # 'bad' data
print("'3xyz'", is_valid_number("3xyz"))
print("''", is_valid_number("")) # no data
print("'-'", is_valid_number("-")) # insufficient data
print("'-1 6'", is_valid_number("-1 6")) # has non-numeric, non-period, non-hyphen characters
print("2-", is_valid_number("2-"))
print("4-5", is_valid_number("4-5"))
print("-7-", is_valid_number("-7-"))
print(".35", is_valid_number(".35")) # has period at head
print("78.", is_valid_number("78.")) # has period at tail
print("1.2.3", is_valid_number("1.2.3")) # has too many periods
print("-0", is_valid_number("-0")) # negative zero is no good
print("-0.0", is_valid_number("-0.0")) # negative zero is no good
print("--3", is_valid_number("--3"))
print("9..9", is_valid_number("9..9"))
print("'.'", is_valid_number("."))
print("'-.'", is_valid_number("-."))
print("-.3", is_valid_number("-.3")) # hyphen followed immediately by a period is no good
print("000", is_valid_number("000"))
print("00.8", is_valid_number("00.8"))
print("-00.03", is_valid_number("-00.03"))
print("007", is_valid_number("007"))
print("050", is_valid_number("050"))
# bonus for those that read all the way here
# this code may perform similarly to the code I have written up above, though I have yet to test it:
# '3.14'.lstrip('-').replace('.','',1).isdigit()
# source:
# test cases that are out-of-scope for today:
# print("non-string number 5", is_valid_number(5)) # wrong data type
# print("½", is_valid_number("½")) # unicode numbers
# fractions such as 1/3
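Since the one-liner in the bonus comment above is untested, here is a small sketch that runs it against a handful of the inputs from the tests and flags where it disagrees with the rules chosen in this post. The wrapper name one_liner and the selection of cases are mine; the expected values are the ones the rules above call for.

```python
def one_liner(text):
    """The candidate from the bonus comment, wrapped in a function."""
    return text.lstrip("-").replace(".", "", 1).isdigit()


# (input, result that the rules in this post call for)
cases = [
    ("3.14", True), ("-5", True), ("0.8", True),
    ("potato", False), ("1.2.3", False), ("4-5", False),
    ("--3", False), (".35", False), ("78.", False),
    ("-.3", False), ("-0", False), ("007", False),
]

for text, expected in cases:
    actual = one_liner(text)
    note = "" if actual == expected else "  <-- differs"
    print(f"{text!r:10} one_liner={actual!s:5} expected={expected}{note}")
```

The one-liner agrees on the straightforward cases but is more permissive: lstrip removes any number of leading hyphens, and a period at either end, a leading zero, or negative zero all get through.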