[Coco] INSTR question

Mathieu Bouchard matju at artengine.ca
Sun Feb 19 13:23:03 EST 2017


Ruby's .index and =~ return nil, which counts as false in conditionals but is
distinct from false. nil, false and true are not numbers. Also, Ruby's zero
counts as a true value, so a match at position zero counts as true.

Perl's index returns -1. Perl has undef, which is similar to Ruby's nil and
false, but it is not used here. You're supposed to put <0 in the condition, as
in C. Perl's =~ returns a false value when not matching (the empty string in
scalar context, rather than undef), and then you're not supposed to use the
@- array.

Python's index raises a ValueError. An exception is not a return value. To
avoid aborting the program with "ValueError: substring not found", you have to
do the equivalent of a temporary ON ERROR for just that error type (or any
broader category of errors):

   try:
       print("ABC".index("Z"))
   except ValueError:
       print("not found !")
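
As a side note that is not in the list above: Python strings also have a find method, which returns -1 instead of raising. The following is only a minimal sketch contrasting the two; the helper name index_or_minus_one is made up for illustration.

```python
def index_or_minus_one(haystack, needle):
    """Wrap str.index so that 'not found' becomes -1 instead of an exception."""
    try:
        return haystack.index(needle)
    except ValueError:
        return -1

# str.find is the built-in that already behaves this way.
for needle in ("A", "", "Z"):
    print(repr(needle), index_or_minus_one("ABC", needle), "ABC".find(needle))

# As with the other -1 conventions, test with a comparison, not truthiness:
# in Python 0 is false and -1 is true, so "if pos:" gets both cases wrong.
pos = "ABC".find("A")
print("found" if pos >= 0 else "not found")
```

The comparison against zero is the same check you would write in Perl, Java or JavaScript.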

Python's re.search returns None, which is like Ruby's nil. If you call .start()
on it, you'll abort with "AttributeError: 'NoneType' object has no attribute
'start'". Normally you'd use an if-statement to avoid that, not a
try-statement.
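
A minimal sketch of that if-statement, assuming you want the position or None. The function name match_start is hypothetical and only there for illustration.

```python
import re

def match_start(pattern, text):
    """Return the start index of the first match, or None when there is none."""
    m = re.search(pattern, text)
    if m is None:
        # No match: there is no match object to call .start() on.
        return None
    return m.start()

print(match_start("A", "ABC"))   # 0
print(match_start("", "ABC"))    # 0
print(match_start("Z", "ABC"))   # None
```

The caller then has to compare the result with None, because a position of 0 is a false value in Python.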

Java's indexOf returns -1. Note that indexOf is constrained to returning an
int, therefore it can't return null (the equivalent of Ruby's nil).
JavaScript's indexOf returns -1 too, even though the language doesn't restrict
return types.

C's strstr is constrained to returning a pointer. You're supposed to check 
whether the pointer is NULL (which is zero), before subtracting s to find the 
index.

C++ STL's find is constrained to returning an unsigned integer. It returns
std::string::npos, which is equal to the largest value of that unsigned type
and is really just a -1 in disguise (the return type of find doesn't allow
negatives).
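
The "-1 in disguise" arithmetic can be checked from Python with ctypes. This is only a sketch, and it assumes that the string's size type is size_t, which is the case with the default allocator on common platforms.

```python
import ctypes

# Width of size_t on this machine (usually 32 or 64 bits).
bits = 8 * ctypes.sizeof(ctypes.c_size_t)
largest = 2 ** bits - 1

# ctypes does no overflow checking, so -1 wraps around as it would in C++.
wrapped = ctypes.c_size_t(-1).value

print(bits, largest, wrapped == largest)

# The same thing in plain modular arithmetic.
print((-1) % 2 ** bits == largest)
```

Both comparisons print True: storing -1 in an unsigned type gives the largest value that type can hold.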

Unix shell's grep is a program that produces a text stream (pseudo-file) that 
lists lines that match. When there is no match, the output is empty (the stream 
will start with an EOF). You can also check the exit code of grep, which is 1 
when not found, 0 when found (0 is the true value, as in "no error"). The pipe
symbol works as in OS-9.
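
To look at both the output and the exit code at once, here is a rough sketch that drives grep from Python. It assumes a grep supporting -b is available on the PATH; the wrapper function and its name are made up for this example.

```python
import subprocess

def grep(pattern, text):
    """Feed text to grep -b and return (exit code, captured output)."""
    proc = subprocess.run(
        ["grep", "-b", pattern],
        input=text,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout

# "A" and "" match the line; "Z" does not.
for pattern in ("A", "", "Z"):
    code, out = grep(pattern, "ABC\n")
    print(repr(pattern), code, repr(out))
```

For the "Z" case the exit code is 1 and the captured output is an empty string.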

Tcl's "string first" returns -1. Tcl's "regexp" returns 0, which is the false 
value, and in my example, if the string is not found, the x variable is not even 
set. Therefore if you try to do the next command (or whatever with a $x in it), 
it will abort, saying x does not exist !

PHP's strpos returns FALSE, which is equal to zero (FALSE == 0 is true) but not
identical to zero (FALSE === 0 is false). preg_match returns 0 and sets the $m
variable to an empty array (whereas if it matched it would return 1).
preg_match could also return FALSE, but only if the pattern string has a syntax
error in it, so, usually, you don't have to distinguish it from 0.


On 2017-02-19 at 10:51:00, Allen Huffman wrote:

> In the examples that return 0 when matching in the first position or when matching "", what do they return if no match is found?
>
>> On Feb 19, 2017, at 8:36 AM, Mathieu Bouchard <matju at artengine.ca> wrote:
>> 
>> 
>> I searched for real and it isn't exactly that universal. Let's start with some that are consistent :
>> 
>> Ruby (both plain search & pattern matching) :
>> "ABC".index"A"
>> 0
>> "ABC".index""
>> 0
>> /A/ =~ "ABC"
>> 0
>> // =~ "ABC"
>> 0
>> 
>> Perl :
>> print index("ABC","A")."\n"
>> 0
>> print index("ABC","")."\n"
>> 0
>> "ABC" =~ /A/; print "@-\n"
>> 0
>> "ABC" =~ //; print "@-\n"
>> 0
>> 
>> Python :
>> "ABC".index("A")
>> 0
>> "ABC".index("")
>> 0
>> re.search("A","ABC").start()
>> 0
>> re.search("","ABC").start()
>> 0
>> 
>> Java :
>> System.out.println("ABC".indexOf("A"));
>> 0
>> System.out.println("ABC".indexOf(""));
>> 0
>> 
>> C (where this behaviour probably originated from) :
>> const char *s="abc"; printf("%td %td\n",strstr(s,"a")-s,strstr(s,"")-s);
>> 0 0
>> 
>> C++ STL :
>> string s="abc"; printf("%zu %zu\n",s.find("a"),s.find(""));
>> 0 0
>> 
>> Unix shells pattern matching :
>> echo ABC | grep -b A
>> 0:ABC
>> echo ABC | grep -b ""
>> 0:ABC
>> 
>> (the list could go on)
>> 
>> However, Tcl is not consistent (doesn't find empty string) :
>> string first A ABC
>> 0
>> string first "" ABC
>> -1
>> 
>> And also not consistent in PHP and issues a warning (wow !) :
>> var_export(strpos("abc","a"));
>> 0
>> var_export(strpos("abc",""));
>> PHP Warning:  strpos(): Empty needle in php shell code on line 1
>> false
>> 
>> But there's an alternate consistent way in Tcl, using pattern matching :
>> regexp -indices a abc x; lindex $x 0
>> 0
>> regexp -indices "" abc x; lindex $x 0
>> 0
>> 
>> And in PHP too :
>> preg_match("/a/","abc",$m,PREG_OFFSET_CAPTURE); var_export($m[0][1]);
>> 0
>> preg_match("//","abc",$m,PREG_OFFSET_CAPTURE); var_export($m[0][1]);
>> 0
>> 
>> 
>>> On 2017-02-10 at 15:05:00, Paulo Garcia wrote:
>>> 
>>> Interesting discussion. Indeed the same behaviour is found in Python and
>>> Javascript:
>>> 
>>> NodeJS:
>>> 
>>>> a='ABC'
>>> 'ABC'
>>>> a.indexOf('A')
>>> 0
>>>> a.indexOf('B')
>>> 1
>>>> a.indexOf('C')
>>> 2
>>>> a.indexOf('')
>>> 0
>>>> 
>>> 
>>> Python:
>>> 
>>>>>> a='ABC'
>>>>>> a.index('B')
>>> 1
>>>>>> a.index('A')
>>> 0
>>>>>> a.index('')
>>> 0
>>>>>> 
>>> 
>>> 
>>> Paulo
>>> 
>>> On Fri, Feb 10, 2017 at 2:29 PM, Mathieu Bouchard <matju at artengine.ca>
>>> wrote:
>>> 
>>>> 
>>>> Nope, it's like that in probably every language that has such a search
>>>> function : an empty string is found at EVERY position in the string,
>>>> therefore the first match it finds is wherever the search begins. It's the
>>>> normal way of doing it, because it logically fits the way N characters are
>>>> searched in a string, for N=0, and the behaviour you wish would mean adding
>>>> a special case for N=0 where programmers prefer to define functions so that
>>>> they have the least possible number of cases.
>>>> 
>>>> (However, in other languages, 0 is the first position in the string,
>>>> whereas "no match" is represented by another value (such as -1 or nil or
>>>> error))
>>>> 
>>>> 
>>>> On 2017-02-09 at 15:12:00, Allen Huffman wrote:
>>>> 
>>>> ...but I noticed today it finds the empty string: ""
>>>>> 
>>>>> PRINT INSTR("ABCDE", "")
>>>>> 1
>>>>> 
>>>>> That seems like a bug.
>>>>> A$=""
>>>>> PRINT INSTR("ABCD", A$)
>>>>> 1
>>>>> 
>>>> 
>>>> ______________________________________________________________________
>>>> | Mathieu BOUCHARD --- tél: 514.623.3801, 514.383.3801 --- Montréal, QC
>>>> 
>>>> 
>>>> --
>>>> Coco mailing list
>>>> Coco at maltedmedia.com
>>>> https://pairlist5.pair.net/mailman/listinfo/coco
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> --------------------------------------------
>>> Paulo
>>> 
>> 
>
>

  ______________________________________________________________________
| Mathieu BOUCHARD --- tél: 514.623.3801, 514.383.3801 --- Montréal, QC

