New regex errors seen in objdump2itb script

In mk/Common.mk there is a target that invokes the bin/objdump2itb script. This seems to run fine under Ubuntu 22.04, but after recently updating to Ubuntu 24.04 (with Python 3.12.3), I see the following:

/opt/toolchains/corev-openhw-gcc-ubuntu2204-20240530/bin/riscv32-corev-elf-objdump \
   	-d \
       -S \
   -M no-aliases \
   -M numeric \
       -l \
   /home/mike/GitHubRepos/openhwgroup/cv32e20-dv/master/sim/uvmt/dsim_results/default/hello-world/0/test_program/hello-world.elf | /home/mike/GitHubRepos/openhwgroup/cv32e20-dv/master/bin/objdump2itb - > /home/mike/GitHubRepos/openhwgroup/cv32e20-dv/master/sim/uvmt/dsim_results/default/hello-world/0/test_program/hello-world.itb
/home/mike/GitHubRepos/openhwgroup/cv32e20-dv/master/bin/objdump2itb:77: SyntaxWarning: invalid escape sequence '\S'
 FUNC_PATTERN     = "(?P<addr>[0-9a-f]{8}) <(?P<name>\S*)>:"
/home/mike/GitHubRepos/openhwgroup/cv32e20-dv/master/bin/objdump2itb:83: SyntaxWarning: invalid escape sequence '\s'
 INST_PATTERN     = "(?P<addr>[0-9a-f]{1,8}):\t*(?P<mcode>[0-9a-f]{4}([0-9a-f]{4})?)\s{2,}(?P<asm>[a-z].*)$"
/home/mike/GitHubRepos/openhwgroup/cv32e20-dv/master/bin/objdump2itb:89: SyntaxWarning: invalid escape sequence '\S'
 SRC_FILE_PATTERN = "^(?P<dir>/\S+)/(?P<file>[^/\s]+):(?P<line>[0-9]*)$"

As indicated by the messages above, the problem is a set of three regular expressions on lines 77, 83 and 89. In each case, the SyntaxWarning: invalid escape sequence '\S' (or '\s') message indicates that a backslash \ is followed by a character, S or s, that Python does not recognize as a valid escape sequence in an ordinary (non-raw) string literal.

I have no idea why this wasn't a problem with earlier versions of Ubuntu and/or Python3.
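A likely explanation, offered as a sketch rather than a confirmed diagnosis: unrecognized escape sequences in string literals have been deprecated since Python 3.6, but older interpreters report them as a DeprecationWarning, which is hidden by default. Python 3.12 reports them as a SyntaxWarning, which is shown by default. Ubuntu 22.04's default python3 is 3.10, so the same literals would have compiled silently there. The snippet below compiles a one-line source with and without the `r` prefix and prints which warning category the running interpreter raises. The helper name `escape_warning_category` and the shortened pattern are made up for illustration.

```python
import warnings


def escape_warning_category(source):
    """Compile `source` and return the name of the first warning category raised, or None."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones that are hidden by default.
        warnings.simplefilter("always")
        compile(source, "<demo>", "exec")
    return caught[0].category.__name__ if caught else None


# Source text with "\S" inside an ordinary string literal (as on line 77 of the script).
plain = 'PATTERN = "<(?P<name>\\S*)>:"'
# The same source text with the r prefix added.
raw = 'PATTERN = r"<(?P<name>\\S*)>:"'

print(escape_warning_category(plain))  # SyntaxWarning on 3.12+, DeprecationWarning on 3.6 to 3.11
print(escape_warning_category(raw))    # None: a raw string has no escape processing to warn about
```

Running this under both Python versions should show whether the change in behavior comes from the interpreter rather than from the script or the toolchain.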

In a local workspace, I made the following changes:

diff --git a/bin/objdump2itb b/bin/objdump2itb
index 072397e2..ef479af8 100755
--- a/bin/objdump2itb
+++ b/bin/objdump2itb
@@ -74,19 +74,22 @@ class CFunction:
 # Regular expression to extract a function from objdump
 # Example:
 # 00000256 <end_handler_incr_mepc>:
-FUNC_PATTERN     = "(?P<addr>[0-9a-f]{8}) <(?P<name>\S*)>:"
+# Note: raw string to treat \S literally for regex
+FUNC_PATTERN     = r"(?P<addr>[0-9a-f]{8}) <(?P<name>\S*)>:"
 FUNC_RE          = re.compile(FUNC_PATTERN)
 
 # Regular expression to extract an individual instruction
 # Example:
 #      264:       00a31363                bne     t1,a0,26a <end_handler_incr_mepc2>
-INST_PATTERN     = "(?P<addr>[0-9a-f]{1,8}):\t*(?P<mcode>[0-9a-f]{4}([0-9a-f]{4})?)\s{2,}(?P<asm>[a-z].*)$"
+# Note: raw string to treat \s literally for regex
+INST_PATTERN     = r"(?P<addr>[0-9a-f]{1,8}):\t*(?P<mcode>[0-9a-f]{4}([0-9a-f]{4})?)\s{2,}(?P<asm>[a-z].*)$"
 INST_RE          = re.compile(INST_PATTERN)
 
 # Regular expression to extract a source annotation for each instruction
 # Example:
 # /work/strichmo/core-v-verif/cv32e40x/tests/programs/custom/debug_test_trigger/debugger.S:47
-SRC_FILE_PATTERN = "^(?P<dir>/\S+)/(?P<file>[^/\s]+):(?P<line>[0-9]*)$" 
+# Note: raw string to treat \S and \s literally for regex
+SRC_FILE_PATTERN = r"^(?P<dir>/\S+)/(?P<file>[^/\s]+):(?P<line>[0-9]*)$" 
 SRC_FILE_RE      = re.compile(SRC_FILE_PATTERN)
 
 parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)

This resolves the warnings, and an itb file is written out as expected, but I have no idea whether the generated itb file is correct.
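One way to gain some confidence that the change is behavior-preserving is to compare the old and new patterns directly. Python leaves an unrecognized escape such as `\S` in the string as a backslash followed by the letter, so the old literals should have produced the same regular expressions. The only textual difference is `\t`, which was a real tab character in the old string and is now passed to `re` as the two characters `\t`, and `re` treats both as a tab. The sketch below spells out the old string values with doubled backslashes to avoid the warning and checks that both versions extract the same groups. The sample instruction line assumes objdump separates fields with tabs, and the use of `re.search` is an assumption, since the issue does not show how the script applies the patterns.

```python
import re

# String values the original (non-raw) literals produced, written with
# explicit "\\S" / "\\s" so this file compiles without warnings.
OLD_FUNC = "(?P<addr>[0-9a-f]{8}) <(?P<name>\\S*)>:"
OLD_INST = "(?P<addr>[0-9a-f]{1,8}):\t*(?P<mcode>[0-9a-f]{4}([0-9a-f]{4})?)\\s{2,}(?P<asm>[a-z].*)$"
OLD_SRC = "^(?P<dir>/\\S+)/(?P<file>[^/\\s]+):(?P<line>[0-9]*)$"

# Raw-string versions from the diff.
NEW_FUNC = r"(?P<addr>[0-9a-f]{8}) <(?P<name>\S*)>:"
NEW_INST = r"(?P<addr>[0-9a-f]{1,8}):\t*(?P<mcode>[0-9a-f]{4}([0-9a-f]{4})?)\s{2,}(?P<asm>[a-z].*)$"
NEW_SRC = r"^(?P<dir>/\S+)/(?P<file>[^/\s]+):(?P<line>[0-9]*)$"

# Sample lines based on the examples in the script's comments.
# Assumption: the instruction line uses tabs between fields.
FUNC_LINE = "00000256 <end_handler_incr_mepc>:"
INST_LINE = "     264:\t00a31363          \tbne\tt1,a0,26a <end_handler_incr_mepc2>"
SRC_LINE = "/work/strichmo/core-v-verif/cv32e40x/tests/programs/custom/debug_test_trigger/debugger.S:47"

CASES = [
    (OLD_FUNC, NEW_FUNC, FUNC_LINE),
    (OLD_INST, NEW_INST, INST_LINE),
    (OLD_SRC, NEW_SRC, SRC_LINE),
]


def groups(pattern, text):
    """Return the named groups captured by `pattern` in `text`, or None if there is no match."""
    match = re.search(pattern, text)
    return match.groupdict() if match else None


def patterns_agree():
    """True if every old/new pattern pair matches its sample and captures identical groups."""
    return all(
        groups(old, text) is not None and groups(old, text) == groups(new, text)
        for old, new, text in CASES
    )


print(patterns_agree())
```

This only exercises the three regular expressions on a handful of lines. A stronger check would be to diff the new itb file against one generated from the same ELF with the unmodified script (for example under Ubuntu 22.04).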