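Whatever the programming language, comments usually have to be stripped out before the source code can be processed further.
The simplest approach is to keep matching comment markers as plain strings and deleting whatever they enclose.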
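Plain string matching breaks easily. The hypothetical two-regex version below (naive_strip is illustrative, not from the original article) has no notion of string literals, so the // inside a URL is mistaken for the start of a comment:

```python
import re


def naive_strip(src: str) -> str:
    # Naive approach: delete /* ... */ blocks, then // ... up to the end of the line.
    # Nothing tracks whether we are inside a string or character literal.
    src = re.sub(r'/\*.*?\*/', '', src, flags=re.S)
    src = re.sub(r'//[^\n]*', '', src)
    return src


code = 'const char *url = "http://example.com"; // homepage\n'
print(naive_strip(code))  # prints: const char *url = "http:
```

Half of the string literal disappears along with the real comment, which is exactly the kind of case a state machine avoids by remembering that it is inside a literal.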
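The only article I found that puts the problem on a theoretical footing and is also practical is "Compiler Principles: Removing All Comments from C/C++ Code" (编译原理删除C/C++代码中的所有注释). I later rewrote its approach in Python. In real use it fell short of my needs: for example, it did not support the [special comment] syntax, so I added that; and for declarations such as int a = -1; the initial value -1 had to be removed, which I also added to the code.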
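A state machine handles this class of problem cleanly and is very extensible: each new rule is just another set of states and transitions. Compared with a lex + yacc solution, it is easier to get started with and easier to adjust.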
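To make the technique concrete, here is a hypothetical pared-down machine, not the author's code: strip_comments and the state names in S are introduced here for illustration. It handles only /* */ and // comments (including backslash-newline continuation of a // comment) plus string and character literals, and it uses named states where the author's version uses numbers.

```python
from enum import Enum, auto


class S(Enum):
    CODE = auto()        # ordinary code
    SLASH = auto()       # saw '/', not yet known whether a comment starts
    BLOCK = auto()       # inside /* ... */
    BLOCK_STAR = auto()  # inside /* ... */, just saw '*'
    LINE = auto()        # inside // ... up to the end of the line
    LINE_BS = auto()     # inside //, just saw '\' (backslash-newline continues the comment)
    LIT = auto()         # inside a "..." or '...' literal
    LIT_ESC = auto()     # inside a literal, just saw '\'


def strip_comments(src: str) -> str:
    out = []
    state = S.CODE
    quote = ''  # the quote character that opened the current literal
    for c in src:
        if state is S.SLASH:
            if c == '*':
                state = S.BLOCK
                continue
            if c == '/':
                state = S.LINE
                continue
            out.append('/')  # lone '/', e.g. division: keep it
            state = S.CODE   # then handle c below as ordinary code
        if state is S.CODE:
            if c == '/':
                state = S.SLASH
            else:
                out.append(c)
                if c in '"\'':
                    state, quote = S.LIT, c
        elif state is S.LIT:
            out.append(c)
            if c == '\\':
                state = S.LIT_ESC
            elif c == quote:
                state = S.CODE
        elif state is S.LIT_ESC:
            out.append(c)  # an escaped character never closes the literal
            state = S.LIT
        elif state is S.BLOCK:
            if c == '*':
                state = S.BLOCK_STAR
        elif state is S.BLOCK_STAR:
            if c == '/':
                state = S.CODE
            elif c != '*':
                state = S.BLOCK
        elif state is S.LINE:
            if c == '\\':
                state = S.LINE_BS
            elif c == '\n':
                out.append(c)  # keep the line break that ends the comment
                state = S.CODE
        elif state is S.LINE_BS:
            if c != '\\':
                state = S.LINE  # a '\n' here is a continuation, so stay in the comment
    if state is S.SLASH:
        out.append('/')  # '/' as the very last character
    return ''.join(out)


sample = 'int x = 5/3; /* note */ char *s = "a//b"; // tail\nint y;'
print(strip_comments(sample))
```

The division in 5/3 and the // inside "a//b" survive, while both real comments are removed. Extra rules such as the author's [...] and = ...; handling would be added the same way: one or two new states plus their transitions, without touching the existing ones.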
Note: comments that contain Chinese text are supported.
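This works because the machine only branches on ASCII delimiters. A small check (the helper names are illustrative) shows that even if the input were scanned as raw UTF-8 bytes, for example as a Python 2 byte string, the bytes of Chinese characters never match a delimiter, since every byte of a multi-byte UTF-8 character is 0x80 or above:

```python
# Sketch: in UTF-8, every byte of a multi-byte (non-ASCII) character is >= 0x80,
# so it can never equal an ASCII delimiter such as '/', '*', '"' or '\n'.

DELIMS = '/*"\'\\\n'


def delims_in_chars(text: str) -> int:
    # count delimiters when iterating Unicode characters (Python 3 str)
    return sum(1 for c in text if c in DELIMS)


def delims_in_bytes(text: str) -> int:
    # count delimiters when iterating the raw UTF-8 bytes
    delim_bytes = DELIMS.encode('ascii')
    return sum(1 for b in text.encode('utf-8') if b in delim_bytes)


sample = '/* 中文注释 */ int a; // 变量\n'
print(delims_in_chars(sample), delims_in_bytes(sample))  # 7 7
```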
#coding=utf-8
# quote:
class rmcmnt:
    ### members
    m_type = 'CPP'

    ### constructor
    def __init__(self, lang_type='CPP'):
        # store on the instance (the original assigned to a throwaway local variable)
        self.m_type = lang_type

    ### member function
    # remove code comments and return the remaining code
    def removecomment(self, strInput):
        state = 0
        strOutput = ''
        strRemoved = ''
        for c in strInput:
            if state == 1 and c != '*' and c != '/':
                # ex. [<secure/_stdio.h> or 5/3]: the '/' did not start a comment,
                # so put it back and handle c as ordinary code in state 0
                strOutput += '/'
                state = 0
            elif state == 11 or state == 14:
                # the character after a closing ']' or ']]' restores state 0
                # and is then handled as ordinary code
                state = 0

            if state == 0 and c == '/':  # ex. [/]
                state = 1
            elif state == 1 and c == '*':  # ex. [/*]
                state = 2
            elif state == 1 and c == '/':  # ex. [//]
                state = 4
            elif state == 3 and c == '*':  # ex. [/*he**]
                state = 3
            elif state == 2 and c == '*':  # ex. [/*he*]
                state = 3
            elif state == 2:  # ex. [/*heh]
                state = 2
            elif state == 3 and c == '/':  # ex. [/*heh*/]
                state = 0
            elif state == 3:  # ex. [/*heh*e]
                state = 2
            elif state == 4 and c == '\\':  # ex. [//hehe\]
                state = 9
            elif state == 9 and c == '\\':  # ex. [//hehe\\\\\]
                state = 9
            elif state == 9:  # ex. [//hehe\<enter> or //hehe\a]
                state = 4
            elif state == 4 and c == '\n':  # ex. [//hehe<enter>]
                state = 0
            elif state == 0 and c == '\'':  # ex. [']
                state = 5
            elif state == 5 and c == '\\':  # ex. ['\]
                state = 6
            elif state == 6:  # ex. ['\n or '\' or '\t etc.]
                state = 5
            elif state == 5 and c == '\'':  # ex. ['\n' or '\'' or '\t' etc.]
                state = 0
            elif state == 0 and c == '"':  # ex. ["]
                state = 7
            elif state == 7 and c == '\\':  # ex. ["\]
                state = 8
            elif state == 8:  # ex. ["\n or "\" or "\t etc.]
                state = 7
            elif state == 7 and c == '"':  # ex. ["\n" or "\"" or "\t" etc.]
                state = 0
            ### new request
            # [] : remove the brackets and everything inside them
            elif state == 0 and c == '[':  # ex. a[ : opening '['
                state = 10
            elif state == 10 and c == ']':  # ex. a[3] : closing ']'
                state = 11
            # [[]] : remove the double brackets and everything inside them
            elif state == 10 and c == '[':  # ex. [[ : opening '[['
                state = 12
            elif state == 12 and c == ']':  # ex. [[x] : first ']'
                state = 13
            elif state == 13 and c == ']':  # ex. [[x]] : closing ']]'
                state = 14
            # drop characters inside []
            elif state == 10:
                state = 10
            # drop characters inside [[]]
            elif state == 12:
                state = 12
            elif state == 13:  # a single ']' was not the end, ex. [[a]b]]
                state = 12
            # remove "= -1" in "int a = -1;" (everything from '=' up to ';')
            elif state == 0 and c == '=':
                state = 15
            elif state == 15 and c == ';':
                state = 0

            # keep ordinary code and string/char literals; drop everything else
            # (c != '/' excludes the closing '/' of a /* */ comment)
            if (state == 0 and c != '/') or state in (5, 6, 7, 8):
                strOutput += c
            else:
                # removed characters
                strRemoved += c
        # a '/' as the very last character was not a comment
        if state == 1:
            strOutput += '/'
        return strOutput