Pattern to find string

Category: c# general

Question

Abhijeet Khopade on Wed, 22 Aug 2018 14:01:11


Hi,

In my C# code I am using the regular expression ("(0)?(?<!\\.)[0-9]{7}[a-wA-W]([wW])?") to detect a pattern of 7 digits followed by a letter in a string.

This works fine for 1234567DA or 9876543AS, but it also picks up a match from the string "more things for alphanumeric is 9.8760179535020143E-3 and then some amount is 1.589499402736099"

How can I tweak my regex so that it does not pick up anything from something like "9.8760179535020143E-3"?


Abhijeet Khopade

Replies

CoolDadTx on Wed, 22 Aug 2018 15:38:01


Can you explain the format you are expecting because your RE doesn't line up with your examples?

In your first example of 1234567DA it only sees 1234567D.

In your second example it sees 9876543A.

In your last example it sees 5020143E.

Your pattern says the string may start with a 0. Then there is a negative lookbehind, (?<!\\.), which only checks that the character just before the digits is not a dot, and then 7 digits. Then a single letter between A and W. Then it may have an optional extra W on the end.
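
To make the three observations above reproducible, here is a minimal sketch that runs the pattern from the question over the three sample strings. The class and method names (OriginalPatternDemo, FindAll) are made up for this illustration; only the pattern and the inputs come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // Pattern from the question. In a regular (non-verbatim) C# string,
    // "\\." becomes the regex \. which stands for a literal dot.
    public const string Pattern = "(0)?(?<!\\.)[0-9]{7}[a-wA-W]([wW])?";

    // Hypothetical helper: returns every match found in the input.
    public static List<string> FindAll(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
            results.Add(m.Value);
        return results;
    }

    public static void Main()
    {
        string[] samples = { "1234567DA", "9876543AS", "9.8760179535020143E-3" };
        foreach (string s in samples)
            Console.WriteLine(s + " -> " + string.Join(", ", FindAll(s)));
        // 1234567DA -> 1234567D
        // 9876543AS -> 9876543A
        // 9.8760179535020143E-3 -> 5020143E
    }
}
```

The last line shows why the lookbehind does not help here: the 7 digits that get matched sit in the middle of the fraction, so the character before them is another digit, not the dot.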

Viorel_ on Wed, 22 Aug 2018 18:51:05


If you want to exclude the sequences that are preceded by any digit, then try adding (?<![0-9]) before your pattern.
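
As a rough sketch of that suggestion, the snippet below puts (?<![0-9]) in front of the pattern from the question and runs it over a short test sentence. The sentence, the class name LookbehindDemo and the method FindIds are assumptions made for this example, and only this one input has been checked.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Question's pattern with the extra lookbehind in front:
    // a match may not start right after another digit.
    public const string Pattern =
        "(?<![0-9])(0)?(?<!\\.)[0-9]{7}[a-wA-W]([wW])?";

    // Hypothetical helper: returns every match found in the input.
    public static List<string> FindIds(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
            results.Add(m.Value);
        return results;
    }

    public static void Main()
    {
        // Made-up sentence that mixes an ID with a floating-point value.
        string input = "id is 1234567A and value is 9.8760179535020143E-3";
        foreach (string id in FindIds(input))
            Console.WriteLine(id);
        // Prints only: 1234567A
    }
}
```

With both lookbehinds in place, a run of digits inside the fraction is rejected either because it follows another digit or because it follows the dot.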

Jianguo Wang on Thu, 23 Aug 2018 03:16:14


Hi Abhijeet Khopade,

Thank you for posting here.

>>How can i tweak my Regex so that it will not pick up from something like "9.8760179535020143E-3"

I have provided a demo for your question.

Please try the code as follows:

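    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            string input = "1234567DA9876543AS9.8760179535020143E-31.589499402736099 5020143E-3.sdwew";

            // In a verbatim (@) string the dot is escaped with a single backslash.
            string pattern = @"(0)?(?<!\.)[0-9]{7}[a-wA-W]+([wW])?";
            foreach (Match match in Regex.Matches(input, pattern))
                Console.WriteLine(match.Value);
        }
    }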

Regards,

JianGuo

Abhijeet Khopade on Thu, 23 Aug 2018 23:12:48


Hi,

Thanks a lot for your response.

OK, I will try to make it simple: I want an RE that looks only for values such as 1234567DA or 9876543AS. It should ignore something like 9.8760179535020143E-3 and not show me 5020143E separately.

Abhijeet Khopade on Thu, 23 Aug 2018 23:13:14


Hi,

I tried that, but unfortunately it didn't work for me.

Abhijeet Khopade on Thu, 23 Aug 2018 23:15:33


Hi,

That's the whole point. I really don't want to pick up a value from something like 1234567DA9876543AS9.8760179535020143E-31.589499402736099 5020143E-3. It should just look for 1234567DA or 9876543AS.

CoolDadTx on Fri, 24 Aug 2018 01:29:51


If you want an exact match then ^\d{7}[a-zA-Z]{2}$ should work. It matches 7 digits followed by 2 letters. With the anchors on each end it won't match anything in the middle of a larger string. If you want to capture the digits and the letters separately then you can use a grouping expression as well.

If this doesn't work for you then please provide the input you gave it, the output you got and what you expected.
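
A small sketch of what the anchors do, assuming the value is tested on its own rather than searched for inside a longer text. The class name ExactMatchDemo, the method IsExactId and the third test string are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExactMatchDemo
{
    // ^ and $ tie the pattern to the start and end of the input.
    private static readonly Regex ExactId = new Regex(@"^\d{7}[a-zA-Z]{2}$");

    // Hypothetical helper: true only when the whole input is 7 digits + 2 letters.
    public static bool IsExactId(string s)
    {
        return ExactId.IsMatch(s);
    }

    public static void Main()
    {
        Console.WriteLine(IsExactId("1234567DA"));            // True
        Console.WriteLine(IsExactId("A1234567BC"));           // False
        Console.WriteLine(IsExactId("his id is 1234567DA"));  // False
    }
}
```

The last call is the important one: as soon as the ID is embedded in other text, the anchored pattern finds nothing, so it only suits validating a single isolated value.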

Abhijeet Khopade on Fri, 24 Aug 2018 11:41:53


Hi CoolDadTx,

Thank you very much for your response.

I tried the RE you provided, but it returns 0 results. I tried it on a huge chunk of data, but it is unable to pick up anything like 1234567AA or anything of the sort.

I am trying with different options but unable to fulfil the requirement.

CoolDadTx on Fri, 24 Aug 2018 13:39:40


Hang on a second, you seem to have mixed requirements. To me you said you wanted to exactly match the string. But now you're saying you're trying to find this data in a much larger string? You can't have it both ways. Either you're looking for this exact string (e.g. A1234567BC fails) or you're looking for it (a subset) in a larger string (e.g. the above string works). Which are you trying to do?

Note that you responded to another poster that "9.8760179535020143E-3" shouldn't match but if you're looking for a subset then it would (with a correction). RE is pattern matching so "5020143E" doesn't match in the above string but "5020143EA" would. So in your above string the RE would correctly pick out the subset of the string that matches your pattern. 

Abhijeet Khopade on Fri, 24 Aug 2018 15:45:11


Hi,

Sorry for the confusion.

What I really want is this: suppose I have a text file with the content below

"The attached job description is related to this job posting.

The employee's adddress is 123 Park Lane, and his id is 1234567A and other ID is 9876543A

Please send your date of birth in order to complete the form.

Other text is 9.8760179535020143E-3 and then Candice has this ID 5397566P"

So if I use a regex, it should pick up only the values below -

-1234567A
-9876543A
-5397566P

and it should ignore -

-5020143E (it shouldn't look into 9.8760179535020143E-3 or anything like it at all)

Sorry for the confusion

CoolDadTx on Fri, 24 Aug 2018 16:05:48


I really think you're going to want to subdivide this problem. Can it all be done with RE? Possibly but would it be manageable? Probably not.

Given the text file I assume you're enumerating line by line using ReadAllLines or equivalent. For each line you're looking for the pattern in a larger string so adding some whitespace around the pattern would clarify you want "is 1234567AB " instead of "Hello1234567AB". Given your example it isn't clear whether you can rely on spacing on the end but at the beginning it seems reasonable so maybe the RE could be updated to this: \s+\d{7}[a-zA-Z]{1,2}

This wouldn't work if the line starts with the value but that doesn't appear to be an issue here. Alternatively if you can rely on additional text as well (e.g. ID 1234567AB) then you can add that to the pattern as well. Finally, I think you said you required 2 letters on the end but I keep seeing you provide only 1. So to handle that case I adjusted the pattern to allow 1 or 2 letters on the end.
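
Here is a minimal sketch of the line-by-line approach with that pattern. The lines are hard-coded so the example is self-contained; in real code they could come from File.ReadAllLines. The class LineScanDemo and the method ExtractIds are hypothetical names, and the Trim call is an addition to drop the whitespace that \s+ pulls into the match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineScanDemo
{
    // Whitespace, then 7 digits, then 1 or 2 letters.
    private static readonly Regex IdPattern = new Regex(@"\s+\d{7}[a-zA-Z]{1,2}");

    // Hypothetical helper: collects every ID found in the given lines.
    public static List<string> ExtractIds(IEnumerable<string> lines)
    {
        var ids = new List<string>();
        foreach (string line in lines)
            foreach (Match m in IdPattern.Matches(line))
                ids.Add(m.Value.Trim()); // strip the leading whitespace
        return ids;
    }

    public static void Main()
    {
        // Sample text from the question, one array entry per line.
        string[] lines =
        {
            "The attached job description is related to this job posting.",
            "The employee's adddress is 123 Park Lane, and his id is 1234567A and other ID is 9876543A",
            "Please send your date of birth in order to complete the form.",
            "Other text is 9.8760179535020143E-3 and then Candice has this ID 5397566P"
        };
        foreach (string id in ExtractIds(lines))
            Console.WriteLine(id);
        // Prints 1234567A, 9876543A and 5397566P, one per line.
    }
}
```

Nothing inside 9.8760179535020143E-3 is picked up, because no run of 7 digits followed by a letter in that value is preceded by whitespace.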

Note that there are many extensions available for VS to help you with writing these REs. I personally use Regex Editor Lite. I like staying inside VS but there are online versions as well. Once you install it then you get a Regex editor you can write patterns in and specify inputs to verify before you actually write your code. You can also play around with the Regex-specific options to see what works. It's how I'm testing your scenario.

Here are my inputs.

 1234567AB
 1234567A
 AB1234567AB
9.1234567AB
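
For anyone without a regex editor at hand, the same check can be sketched in a few lines of C#. This assumes the leading space on the first three inputs is significant and that the last input has none; the class name InputCheckDemo and the method HasId are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputCheckDemo
{
    private static readonly Regex IdPattern = new Regex(@"\s+\d{7}[a-zA-Z]{1,2}");

    // Hypothetical helper: true when the pattern is found anywhere in the input.
    public static bool HasId(string input)
    {
        return IdPattern.IsMatch(input);
    }

    public static void Main()
    {
        // Inputs as listed above; the leading spaces are assumed to be intentional.
        string[] inputs = { " 1234567AB", " 1234567A", " AB1234567AB", "9.1234567AB" };
        foreach (string input in inputs)
            Console.WriteLine("[" + input + "] -> " + HasId(input));
        // [ 1234567AB] -> True
        // [ 1234567A] -> True
        // [ AB1234567AB] -> False
        // [9.1234567AB] -> False
    }
}
```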

Abhijeet Khopade on Fri, 24 Aug 2018 19:34:49


Hi CoolDadTx,

It works perfectly for me.

Thanks a lot for your suggestion and insight about how to work with RegEx.