source: icGREP/icgrep-devel/QA/greptest.xml @ 4228

Last change on this file since 4228 was 4228, checked in by cameron, 5 years ago

Test cases to break icgrep

<greptest>
<datafile id="simple1">
A few lines of input
in this simple test file
provide fodder for some simple
regexp tests.
</datafile>

<datafile id="bounded_charclass">
=a;
=bb;
=ccc;
=dddd;
=eeeee;
=ffffff;
=ggggggg;
=hhhhhhhh;
=iiiiiiiii;
=jjjjjjjjjj;
=kkkkkkkkkkk;
=llllllllllll;
=mmmmmmmmmmmmm;
=nnnnnnnnnnnnnn;
=ooooooooooooooo;
=pppppppppppppppp;
=qqqqqqqqqqqqqqqqq;
=rrrrrrrrrrrrrrrrrr;
=sssssssssssssssssss;
=tttttttttttttttttttt;
=uuuuuuuuuuuuuuuuuuuuu;
=vvvvvvvvvvvvvvvvvvvvvv;
=wwwwwwwwwwwwwwwwwwwwwww;
=xxxxxxxxxxxxxxxxxxxxxxxx;
=yyyyyyyyyyyyyyyyyyyyyyyyy;
=zzzzzzzzzzzzzzzzzzzzzzzzzz;
</datafile>

<datafile id="RangeAltSeqMatchStarKplusWhileNotOptAny">
Dogbe hat ,/R Cat dt bt bt bt bt bat MzzzzzzzzT MaT MT McT MdT MeT M0T M1T M2T M3T M4T
Dogbe hit foffasm zza " Dog Cat 1, 4= Dog ['zxcvbnm,./R Dog MT
Dogbe hot foffasm czzb " MazazazTDogogogogog Cat 1, 4= Dog [;'zxcvbnm,./R Dogtp
Dogbe foffasm dooooc MazT" Dog Cat 1, 4= Dog [Sqwertyuiopasdfghjkl;'zxcvbnm,./R Dog Cat
Dogbe foffasm ezzzzzzzzzzzzzzt "tp Dog Cat 12, ktp 4= Dog [jkl;'zxcvbnm,./R Dogtp
Dogbe foffasm zze " Dog CatMjT , = Dog [;'zxcvbzzznm,./R Dog MazazT cat
zzcztpDogbe fofasm zazazz4z Doggg Cat 6, azzzzz= Dog [;'zxcvbonm,.R Dog TUT Dog
Natatatats Nats T M0T ed bazbzczdzt et
Dfg dc fog Nt ezt
MazazazazazazazT
</datafile>

<datafile id="StartEndAlt">
The ever-growing social networks and social media provide invaluable
sources of information for modeling the behavior of users. High-quality
user models enable superior services and functions for end users. In this
talk, I will present several examples of user modeling based on social
networks and social media. I will first describe our research in modeling
users' information preferences on Microblogs using a novel user message
model. I will then discuss our work on extracting users' daily activities,
such as dining and shopping, that inherently reflect their habits, intents and preferences.
I explain our novel transfer learning solution via a collaborative boosting
framework comprising a text-to-activity classifier for socially connected users.
I will also describe our research on user modeling in multiple, overlapping
social networks in a 'composite social network' setting. I will show the benefits of
modeling the dynamics of composite networks, where the evolution processes
of different networks are jointly considered. Finally, I will explain our
research on finding social spammers in large social networks.
</datafile>

<datafile id="special_characters">
The ] character may appear as the first character inside character class
expressions such as []>)].
In this case, the ] character does not terminate the character class, but
stands for itself.
Similarly, the - character may appear as the first or last character
in a character class expression, such as [-] or []-].  Occurring as the
first or last character in a class means that it is a member of the
class, instead of being interpreted as a range metacharacter.
For both ] and -, occurrence as the first character could mean after
an opening [^ mark for negated character class.   That is [^]] is the
class that matches everything but ], while [^-] is the class that matches
anything but -.
----------
The above line does not match [^-].
----------
]]]]]]]]]]
^^^^^^^^^^
</datafile>

<datafile id="ips"> 
201.250.180.213
236.4.20.176
137.96.194.126
245.16.96.112
245.19.58.43
131.176.131.248
248.160.22.214
156.179.88.103
174.13.62.156
256.122.123.5
16.81.78.152
177.17.24.167
32.120.25.23
138.82.66.15
4.196.8.251
101.30.211.3
209.44.105.129
56.166.31.72
247.108.224.170
124.248.83.156
113.107.178.250
189.243.10.192
184.18.189.31
48.145.33.2
188.137.131.244
49.161.61.42
14.31.211.138
24.39.39.136
146.217.131.80
205.141.18.135
159.207.166.206
96.211.62.20
23.148.44.140
109.159.129.161
183.230.172.129
48.178.63.192
224.41.190.207
144.114.56.31
151.205.132.247
161.194.12.184
87.55.69.195
214.198.102.143
173.19.17.220
197.80.158.167
121.94.119.11
208.174.42.104
124.173.96.31
112.107.215.199
162.30.140.121
227.241.9.145
6.26.111.203
106.14.115.226
107.233.237.60
153.24.163.23
197.4.54.55
111.14.253.18
43.138.139.15
125.148.160.131
173.16.80.24
30.194.250.136
173.233.196.71
</datafile>

<datafile id = "CRLF">line with CRLF &#13;&#10;two lines with LFCR &#10;&#13;final line
</datafile>
 <grepcase regexp="^$" datafile="CRLF" grepcount="1"/>
 <grepcase regexp="^.*$" datafile="CRLF" grepcount="4"/>

 <datafile id = "LU_test">
The following line has LATIN CAPITAL LETTER G WITH MACRON in single quotes.
'&#x1E20;'
</datafile>

<grepcase regexp="ab" datafile="StartEndAlt" grepcount="4"/>
<grepcase regexp="a*b" datafile="StartEndAlt" grepcount="10"/>
<grepcase regexp="ab*" datafile="StartEndAlt" grepcount="15"/>
<grepcase regexp="^user|^I|our$" datafile="StartEndAlt" grepcount="5"/>

<grepcase regexp="fe|si" datafile="simple1" grepcount="3"/>
<grepcase regexp="in" datafile="simple1" grepcount="2"/>
<grepcase regexp="[A-Z]" datafile="simple1" grepcount="1"/>
<grepcase regexp="fodder|simple" datafile="simple1" grepcount="2"/>

<grepcase regexp="[cde]{3}" datafile="bounded_charclass" grepcount="3"/>
<grepcase regexp="[f-h]{5}" datafile="bounded_charclass" grepcount="3"/>
<grepcase regexp="[a-z]{5}" datafile="bounded_charclass" grepcount="22"/>
<grepcase regexp="[a-z]{5,15}" datafile="bounded_charclass" grepcount="22"/>
<grepcase regexp="=[a-z]{7,}" datafile="bounded_charclass" grepcount="20"/>
<grepcase regexp="=[a-z]{5,15};" datafile="bounded_charclass" grepcount="11"/>
<grepcase regexp="[wxy]{2}{3}{2}" datafile="bounded_charclass" grepcount="3"/>
<grepcase regexp="=([a-z][c-z])*;" datafile="bounded_charclass" grepcount="12"/>

<grepcase regexp="^D[zabcdefoy]g" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="7"/>
<grepcase regexp="do*c|ez*t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="4"/>
<grepcase regexp="M(az)*T" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="6"/>
<grepcase regexp="ez+t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="2" />
<grepcase regexp="b([a-d]z)*t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="2"/>
<grepcase regexp="[^D]og" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="2"/>
<grepcase regexp="Na?t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="2"/>
<grepcase regexp="h.t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="3" />

<grepcase regexp="[]]" datafile="special_characters" grepcount="9"/>
<grepcase regexp="[-]" datafile="special_characters" grepcount="8"/>
<grepcase regexp="[]^-]" datafile="special_characters" grepcount="14"/>
<grepcase regexp="[\-\]\^]" datafile="special_characters" grepcount="14"/>
<grepcase regexp="[^]]" datafile="special_characters" grepcount="16"/>
<grepcase regexp="[^-]" datafile="special_characters" grepcount="15"/>
<grepcase regexp="[^^]" datafile="special_characters" grepcount="16"/>
<grepcase regexp="[^]-]" datafile="special_characters" grepcount="14"/>
<grepcase regexp="[.]" datafile="special_characters" grepcount="7"/>

<grepcase regexp="^((([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])[.]){3})([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])$" datafile="ips" grepcount="60"/>

<!-- . should match a single character, even if that character is 3 bytes long. -->
<grepcase regexp="'.'" datafile="LU_test" grepcount="1"/>
<grepcase regexp="'...'" datafile="LU_test" grepcount="0"/>
<grepcase regexp="\u{1e20}" datafile="LU_test" grepcount="1"/>
<grepcase regexp="\u{1e21}" datafile="LU_test" grepcount="0"/>
<grepcase regexp="\p{Lu}" datafile="LU_test" grepcount="2"/>
<grepcase regexp="'\p{Lu}'" datafile="LU_test" grepcount="1"/>
<grepcase regexp="\p{Ll}" datafile="LU_test" grepcount="1"/>
</greptest>
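
The layout above pairs each grepcase with a datafile by id and states the expected number of matching lines in grepcount. A minimal sketch of how such a file could be read is shown here. It is not the project's actual test driver: the names `load_greptest` and `count_matching_lines` are hypothetical, Python's `re` module is only a stand-in for icgrep (it rejects some patterns in this file, such as `\p{Lu}`), and splitting lines with `str.splitlines()` is an assumption about line-break handling.

```python
import re
import xml.etree.ElementTree as ET

# A two-case excerpt of the test file, embedded so the sketch is self-contained.
SAMPLE = """<greptest>
<datafile id="simple1">
A few lines of input
in this simple test file
provide fodder for some simple
regexp tests.
</datafile>
<grepcase regexp="in" datafile="simple1" grepcount="2"/>
<grepcase regexp="[A-Z]" datafile="simple1" grepcount="1"/>
</greptest>"""


def load_greptest(xml_text):
    """Return ({datafile id: text}, [(regexp, datafile id, expected count)])."""
    root = ET.fromstring(xml_text)
    datafiles = {d.get("id"): d.text or "" for d in root.iter("datafile")}
    cases = [
        (c.get("regexp"), c.get("datafile"), int(c.get("grepcount")))
        for c in root.iter("grepcase")
    ]
    return datafiles, cases


def count_matching_lines(pattern, text):
    """Count lines containing a match.

    Assumption: Python's `re` stands in for icgrep, and lines are split
    with str.splitlines(); neither is confirmed by the test file itself.
    """
    return sum(1 for line in text.splitlines() if re.search(pattern, line))


if __name__ == "__main__":
    datafiles, cases = load_greptest(SAMPLE)
    for regexp, datafile_id, expected in cases:
        actual = count_matching_lines(regexp, datafiles[datafile_id])
        print(regexp, datafile_id, "expected", expected, "got", actual)
```

Keeping the matcher as a separate function means a real driver could replace `count_matching_lines` with a call to the tool under test while reusing the same loader.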