source: icGREP/icgrep-devel/QA/greptest.xml @ 4269

Last change on this file since 4269 was 4269, checked in by nmedfort, 5 years ago

Couple new test cases. Potential bug: icgrep reports 97 for emails; egrep reports 116.

File size: 12.3 KB

<greptest>
<datafile id="simple1">
A few lines of input
in this simple test file
provide fodder for some simple
regexp tests.
</datafile>

<datafile id="bounded_charclass">
=a;
=bb;
=ccc;
=dddd;
=eeeee;
=ffffff;
=ggggggg;
=hhhhhhhh;
=iiiiiiiii;
=jjjjjjjjjj;
=kkkkkkkkkkk;
=llllllllllll;
=mmmmmmmmmmmmm;
=nnnnnnnnnnnnnn;
=ooooooooooooooo;
=pppppppppppppppp;
=qqqqqqqqqqqqqqqqq;
=rrrrrrrrrrrrrrrrrr;
=sssssssssssssssssss;
=tttttttttttttttttttt;
=uuuuuuuuuuuuuuuuuuuuu;
=vvvvvvvvvvvvvvvvvvvvvv;
=wwwwwwwwwwwwwwwwwwwwwww;
=xxxxxxxxxxxxxxxxxxxxxxxx;
=yyyyyyyyyyyyyyyyyyyyyyyyy;
=zzzzzzzzzzzzzzzzzzzzzzzzzz;
</datafile>

<datafile id="RangeAltSeqMatchStarKplusWhileNotOptAny">
Dogbe hat ,/R Cat dt bt bt bt bt bat MzzzzzzzzT MaT MT McT MdT MeT M0T M1T M2T M3T M4T
Dogbe hit foffasm zza " Dog Cat 1, 4= Dog ['zxcvbnm,./R Dog MT
Dogbe hot foffasm czzb " MazazazTDogogogogog Cat 1, 4= Dog [;'zxcvbnm,./R Dogtp
Dogbe foffasm dooooc MazT" Dog Cat 1, 4= Dog [Sqwertyuiopasdfghjkl;'zxcvbnm,./R Dog Cat
Dogbe foffasm ezzzzzzzzzzzzzzt "tp Dog Cat 12, ktp 4= Dog [jkl;'zxcvbnm,./R Dogtp
Dogbe foffasm zze " Dog CatMjT , = Dog [;'zxcvbzzznm,./R Dog MazazT cat
zzcztpDogbe fofasm zazazz4z Doggg Cat 6, azzzzz= Dog [;'zxcvbonm,.R Dog TUT Dog
Natatatats Nats T M0T ed bazbzczdzt et
Dfg dc fog Nt ezt
MazazazazazazazT
</datafile>

<datafile id="StartEndAlt">
The ever-growing social networks and social media provide invaluable
sources of information for modeling the behavior of users. High-quality
user models enable superior services and functions for end users. In this
talk, I will present several examples of user modeling based on social
networks and social media. I will first describe our research in modeling
users' information preferences on Microblogs using a novel user message
model. I will then discuss our work on extracting users' daily activities,
such as dining and shopping, that inherently reflect their habits, intents and preferences.
I explain our novel transfer learning solution via a collaborative boosting
framework comprising a text-to-activity classifier for socially connected users.
I will also describe our research on user modeling in multiple, overlapping
social networks in a 'composite social network' setting. I will show the benefits of
modeling the dynamics of composite networks, where the evolution processes
of different networks are jointly considered. Finally, I will explain our
research on finding social spammers in large social networks.
</datafile>

<datafile id="special_characters">
The ] character may appear as the first character inside character class
expressions such as []>)].
In this case, the ] character does not terminate the character class, but
stands for itself.
Similarly, the - character may appear as the first or last character
in a character class expression, such as [-] or []-].  Occurring as the
first or last character in a class means that it is a member of the
class, instead of being interpreted as a range metacharacter.
For both ] and -, occurrence as the first character could mean after
an opening [^ mark for negated character class.   That is [^]] is the
class that matches everything but ], while [^-] is the class that matches
anything but -.
----------
The above line does not match [^-].
----------
]]]]]]]]]]
^^^^^^^^^^
</datafile>

<datafile id="ips">
201.250.180.213
236.4.20.176
137.96.194.126
245.16.96.112
245.19.58.43
131.176.131.248
248.160.22.214
156.179.88.103
174.13.62.156
256.122.123.5
16.81.78.152
177.17.24.167
32.120.25.23
138.82.66.15
4.196.8.251
101.30.211.3
209.44.105.129
56.166.31.72
247.108.224.170
124.248.83.156
113.107.178.250
189.243.10.192
184.18.189.31
48.145.33.2
188.137.131.244
49.161.61.42
14.31.211.138
24.39.39.136
146.217.131.80
205.141.18.135
159.207.166.206
96.211.62.20
23.148.44.140
109.159.129.161
183.230.172.129
48.178.63.192
224.41.190.207
144.114.56.31
151.205.132.247
161.194.12.184
87.55.69.195
214.198.102.143
173.19.17.220
197.80.158.167
121.94.119.11
208.174.42.104
124.173.96.31
112.107.215.199
162.30.140.121
227.241.9.145
6.26.111.203
106.14.115.226
107.233.237.60
153.24.163.23
197.4.54.55
111.14.253.18
43.138.139.15
125.148.160.131
173.16.80.24
30.194.250.136
173.233.196.71
</datafile>

<datafile id="emails">
danielsmithinvestment01@yahoo.com
vivian.johnp24@gmail.com
drjohnsonadamscompany@mail.com
fb43@kurtz.onmicrosoft.com
delphinehakizimana11@zipmail.com.br
mrs.swp@outlook.com
engr.saidsalem@workmail@co.za
suleadams342003@gmail.com
info.soopercredit@qq.com
aliceisdale@yahoo.com
elizabethjohnson134@hotmail.com
anikaebertus@yahoo.se
bayford_A@qq.com
hijabfarid@hotmail.com
zaringwarkipkalya@aol.fr
monahmeddd2014@gmail.com
hijab.farid@hotmail.cam
dennis.melcher01@gmail.com
publicitycbn@gmail.com
michaelkruegerloancompany@gmail.com
ben525387@gmail.com
dgill_pwc@mynet.com
dgill_pwc1@terra.com
tuthpala12@gmail.com
johanthony1956@e-mail.ua
christopher.white01@live.co.uk
anitaloanfirm@live.com
aliadamssolicitors@gmail.com
jonathanevans000@yahoo.com
jwatson494@yahoo.com
ec21buyer@gmail.com
sussanbien2012@gmail.com
info@pavochenkofinance.tk
honbarrijzdende@gmail.com
ernestebi699@e-mail.ua
siwei4489@yahoo.com.hk
peterkoffi.info@gmail.com
zenithbankplc106@yahoo.com
fidelitybankplc505@aim.com
kymcrox03@gmail.com
esqharsmith2015@gmail.com
facebooklottdepartment936@gmail.com
lt_industries@outlook.com
cpfi.ltd@live.nope
changying33@yahoo.com
abdoul0000hamid@gmail.com
foreign_exchange@live.co.uk
hdcliveuk@live.com
fatimahhassan1@fengv.com
mikejosephloanfirm202@gmail.com
skyebanktg@rediffmail.com
mrsbellafirm001@gmail.com
financtreasury.uk@email.com
admin@senagua.gob.ec
m2424m@live.com
stevewilliam197@gmail.com
mrmathew.martins@yahoo.com
benjaminwilliam917@gmail.com
abe.shelton1@lenta.ru
owengah@live.com
dlserv01@aol.com
ee.apala@gmail.com
bbcpaydpt@live.com
undpfn20114@gmail.com
janievitek@gmail.com
creditservice@careceo.com
cying011@yahoo.com
christophe_gbeffa@hotmail.fr
maracasinter@yahoo.com
iquad94@yahoo.com
emil.jacobs@mail.com
emil.jacob@mail.ru
mgremittance.info@yahoo.co.uk
raymondmorgan02@hotmail.com
mrs_sabahibrahim@ymail.com
drthomascole7@gmail.com
barrp.agbo@outlook.fr
mrsmorganhenlenloanfirm@gmail.com
barr.njdmdcggroup@yahoo.com
hknbddhb@gmail.com
michelfoucault@outlook.fr
goldsupply@rediffmail.com
dvdmumbai2000@gmail.com
mikefinance02@gmail.com
moonstoneking@gmail.com
peterstone586@gmail.com
denis_andre_phillipe@aol.com
roberto.greco@aol.fr
mark_grant112@hotmail.com
nokiaxprizefoundationclaims@coolsite.net
claims14_88@libero.it
hon.leo.price@gmail.com
info_unicef@consultant.com
u_deliverycompany@yahoo.com
eldhabiblamah152@gmail.com
governorsanusi.lamido@yahoo.com.ph
emyjean18@zipmail.com.br
winningemail@luckymail.com
barristervictor_odo@yahoo.com.ph
nokia.global_promo@consultant.com
headoffice_cv20448bd@libero.it
ab.issah@yahoo.com
ab_issah@yahoo.com.tw
rifaatassad552@yahoo.com.hk
barrsandilekhumalo@gmail.com
gkiir@qq.nope
ibrahimahmed3@aol.fr
efccin@e-mail.ua
dheerajrelan@gmail.com
al-fardan@al-fardan-export.com
mellissa000@hotmail.com
verakones01@hotmail.com
kivaloanfinance999@gmail.com
atm.paydept00@outlook.com
claudiokristiansen@yahoo.co.za
info.kmf@gmx.com
mambojames689@yahoo.co.uk
a.salam2014bf@terra.com
vanessappillip99@yahoo.com
vanessaphillip@live.com
alshat@emirates.net.ae
</datafile>

<datafile id="floats">
9.7
16.07
27.675
86.162
189.36792
859.073357
1377.9901658
1514.73870948
2096.400730002
2551.2050637982
4615.26633110512
8438.114838435104
32036.61593959936
36346.00047312989
144826.22607192554
+3.1eE5
+4.992
+2.425E+10
9.5808eE10
9.5808e10
+0.416968e+0
-0.3162108-0
+0.03069882+0
+0.132378721eE+-0
0.43416726670
+-0.43416726669e+0
+-0.01976811464eE0
-0.0197681146402e+-0
0.02241943884633+0
+-0.004803458640268eE-0
+0.0008164744337844E+-0
0.00266694045551024E+0
+-0.0112132498185713980
0.0003485919632198585e+-0
-0.002599516682231249E+0
0.02315181236174286E+0
+0.0116575240311669+0
+-0.06536499789006515eE+-0
+20.914506804599366eE+-21
+-20.062034167562416eE+20
35.90964837611389E-1
+-2.5508584172940916E-0
0.6532888027107796eE0
+0.02530509823216493E0
-0.016818871414735502eE+-0
0.01041535031385609E+0
-0.017042043493346013eE0
-0.015882934560610525eE0
+-0.016271711916486607E+0
-1.1521320712689072e-1
0.5796638373356339+2
-6.78321804536429e+-8
+-18.6367662944200621
+20.63224902663965eE21
+-16.78193317331960417
10.049610186973338-21
64.51055985925869eE+-65
+71.7394478831031eE+115
+114.85412411903206eE-53
+150.50431315365464e116
-388.86846448777743eE+-334
+-75.50343657758405E-76
-75.50343657758405eE-151
-216.9511816984773E176
-175.798740561957eE-178
+13.25998057047805113
+3.745360060000819eE+27
-27.329937066467846E23
13.34390770072532E+35
+34.68092648862783eE+-36
+-35.6389454910375E-160
+493.90278138088945eE+-1037
1037.4462608675137+356
-356.17279137431007E+983
</datafile>

<datafile id="CRLF">line with CRLF &#13;&#10;two lines with LFCR &#10;&#13;final line
</datafile>
<grepcase regexp="^$" datafile="CRLF" grepcount="1"/>
<grepcase regexp="^.*$" datafile="CRLF" grepcount="4"/>

<datafile id="LU_test">
The following line has LATIN CAPITAL LETTER G WITH MACRON in single quotes.
'&#x1E20;'
</datafile>

<grepcase regexp="ab" datafile="StartEndAlt" grepcount="4"/>
<grepcase regexp="a*b" datafile="StartEndAlt" grepcount="10"/>
<grepcase regexp="ab*" datafile="StartEndAlt" grepcount="15"/>
<grepcase regexp="^user|^I|our$" datafile="StartEndAlt" grepcount="5"/>

<grepcase regexp="fe|si" datafile="simple1" grepcount="3"/>
<grepcase regexp="in" datafile="simple1" grepcount="2"/>
<grepcase regexp="[A-Z]" datafile="simple1" grepcount="1"/>
<grepcase regexp="fodder|simple" datafile="simple1" grepcount="2"/>

<grepcase regexp="[cde]{3}" datafile="bounded_charclass" grepcount="3"/>
<grepcase regexp="[f-h]{5}" datafile="bounded_charclass" grepcount="3"/>
<grepcase regexp="[a-z]{5}" datafile="bounded_charclass" grepcount="22"/>
<grepcase regexp="[a-z]{5,15}" datafile="bounded_charclass" grepcount="22"/>
<grepcase regexp="=[a-z]{7,}" datafile="bounded_charclass" grepcount="20"/>
<grepcase regexp="=[a-z]{5,15};" datafile="bounded_charclass" grepcount="11"/>
<grepcase regexp="[wxy]{2}{3}{2}" datafile="bounded_charclass" grepcount="3"/>
<grepcase regexp="=([a-z][c-z])*;" datafile="bounded_charclass" grepcount="12"/>

<grepcase regexp="^D[zabcdefoy]g" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="7"/>
<grepcase regexp="do*c|ez*t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="4"/>
<grepcase regexp="M(az)*T" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="6"/>
<grepcase regexp="ez+t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="2"/>
<grepcase regexp="b([a-d]z)*t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="2"/>
<grepcase regexp="[^D]og" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="2"/>
<grepcase regexp="Na?t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="2"/>
<grepcase regexp="h.t" datafile="RangeAltSeqMatchStarKplusWhileNotOptAny" grepcount="3"/>

<grepcase regexp="[]]" datafile="special_characters" grepcount="9"/>
<grepcase regexp="[-]" datafile="special_characters" grepcount="8"/>
<grepcase regexp="[]^-]" datafile="special_characters" grepcount="14"/>
<grepcase regexp="[\-\]\^]" datafile="special_characters" grepcount="14"/>
<grepcase regexp="[^]]" datafile="special_characters" grepcount="16"/>
<grepcase regexp="[^-]" datafile="special_characters" grepcount="15"/>
<grepcase regexp="[^^]" datafile="special_characters" grepcount="16"/>
<grepcase regexp="[^]-]" datafile="special_characters" grepcount="14"/>
<grepcase regexp="[.]" datafile="special_characters" grepcount="7"/>

<grepcase regexp="^((([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])[.]){3})([2][5][0-5]|([2][0-4]|[1][0-9]|[0-9])?[0-9])$" datafile="ips" grepcount="60"/>
<grepcase regexp="^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.([a-zA-Z]{2}|com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum)$" datafile="emails" grepcount="116"/>
<grepcase regexp="^[-+]?([1-9]0?)+\.?((0*[1-9])+|0)([eE][-+]?([0-9]+)+)?$" datafile="floats" grepcount="26"/>

<!-- . should match one whole character (code point), even when it is encoded as 3 bytes in UTF-8: '.' matches the quoted U+1E20 once, and '...' does not match it. -->
<grepcase regexp="'.'" datafile="LU_test" grepcount="1"/>
<grepcase regexp="'...'" datafile="LU_test" grepcount="0"/>
<grepcase regexp="\u{1e20}" datafile="LU_test" grepcount="1"/>
<grepcase regexp="\u{1e21}" datafile="LU_test" grepcount="0"/>
<grepcase regexp="\p{Lu}" datafile="LU_test" grepcount="2"/>
<grepcase regexp="'\p{Lu}'" datafile="LU_test" grepcount="1"/>
<grepcase regexp="\p{Ll}" datafile="LU_test" grepcount="1"/>
</greptest>