My Objection to Array#sum

nostrademons · on April 9, 2009

Python does this - sum/any/all are standalone functions, and join() is a member of the string class instead of the array class. Python also gets a lot of flak for it, as it seems like every month you can see someone complaining about why you use `"\n".join(lines)` to join an array instead of `lines.join("\n")`. People can't seem to wrap their heads around it.

Personally, I think the big mistake is to think objects should represent things in the real world instead of adapting to the needs of the program. For example, it's very common to need to split a class into FooLike (an interface), FooImpl, and FooRenderer to avoid coupling specific display logic to business objects you may want to reuse in other contexts. These have no analogs in the real world. They're there largely to make sense out of the different things your program needs to do with a Foo.

DannoHung · on April 10, 2009

This is one of the reasons that I think newbs fuck up OO design so often; the way it's sold and the way that it can be used effectively are almost entirely divergent.

lacker · on April 9, 2009

"\n".join(lines) is confusing because you expect the complicated thing to be the object, and the simple thing to be the argument. The confusion is very basic.

It would be cool but probably perlishly unadvisable if join,sum,any,all were just implemented as

  "\n".join(lines) -> lines.join("\n")
  sum(alist) -> alist.join(+)
  any(alist) -> alist.join(or)
  all(alist) -> alist.join(and)

nostrademons · on April 10, 2009

That's what Haskell does. Your universal joiner is called "fold", sometimes pronounced "reduce" (Scheme/Python) or "inject" (Smalltalk/Ruby). From the Haskell prelude:

    sum = foldl (+) 0
    product = foldl (*) 1
    and = foldr (&&) True
    or  = foldr (||) False
    any p = or . map p
    all p = and . map p
    concat = foldr (++) []
    unlines = concatMap (++ "\n")

Google calls it "reduce" as well - this is what MapReduce is based upon.

(BTW, bad search query: [haskell unlines] gives me the Haskell prelude definition as the last result on the first page, even though that's the authoritative source. It gives me the Zvon and Informatik mirrors as #1 and #2, and then lots of useless discussion. Couldn't there be a way to boost "authoritative" sources up the rankings?)

lacker · on April 10, 2009

Yeah, in practice though I would rarely use reduce in python to do a sum/any/all, but now that they're in the language I like them. Before I would just spell it all out. It was just slightly too wordy to make it worth it, kind of like python lambdas in general.
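
For comparison, here is a rough sketch of what spelling these out with reduce looks like. It assumes Python 3, where reduce lives in functools, and the fold_* names are made up for illustration:

```python
from functools import reduce
import operator


def fold_sum(items):
    # Fold with + starting from 0, like Haskell's foldl (+) 0.
    return reduce(operator.add, items, 0)


def fold_any(items):
    # Fold with "or" starting from False; bool() normalizes truthy values.
    return reduce(lambda acc, x: acc or bool(x), items, False)


def fold_all(items):
    # Fold with "and" starting from True.
    return reduce(lambda acc, x: acc and bool(x), items, True)


def fold_join(glue, items):
    # With no initial value, reduce seeds itself with the first element,
    # so the empty case has to be handled separately.
    items = list(items)
    if not items:
        return ""
    return reduce(lambda acc, s: acc + glue + s, items)
```

Unlike the built-in any() and all(), these folds visit every element and never short-circuit, which is one more reason the built-ins are nicer to have.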

jherdman · on April 9, 2009

Count me as one of those people that don't get it. If I'm joining the elements of an array together, I'm operating on the array. Therefore I _expect_ the join() method to be on the array.

To me, Python can't seem to make up its mind with these weird (IMHO) stand-alone methods like join(), sum(), any(), etc. That's the kind of thing I'd expect in a functional language, not in an OO language.

jerf · on April 9, 2009

The thing is, they aren't methods. They really aren't. They don't operate on an object using the object's local data. They are generic functions that use certain defined interfaces that many objects define.

"list.join(str)" and "str.join(list)" are not the same. The list.join syntax only works for lists, but str.join takes any sequence, which is anything that implements the sequence protocol, including lists, dicts, sets, trees, heaps, iterators in general, and a limitless variety of user-defined sequences. Once you understand that, you see it's not a matter of picking between two spellings of the same functionality: the two spellings offer radically different functionality. (Putting this on string rather than it being a free-floating function is perhaps dubious, but at least the operation does have a sort of irreducible stringy-ness to it in a way that it does not have a listy-ness to it.)

If you make the "join" a method of list, then you have grotesquely cut its functionality. Now you are requiring people to implement their own join on every other object that implements the sequence protocol, which is just silly. (Of course if they have other needs they may still implement something else, but why not give them the option of the default?)

This is generic-programming through duck-types, not OO. Similarly for all the other examples you cite and quite a few more; putting them on "list" is not "the sensible thing to do", it would be a grave error.

    Python 2.5.2 (r252:60911, May  7 2008, 15:19:09)
    ...
    >>> ", ".join(str(x) for x in xrange(10))
    '0, 1, 2, 3, 4, 5, 6, 7, 8, 9'

That's not a list in there.

zupatol · on April 9, 2009

Well join should be on sequence instead of list then.

To me it seems unnatural to see join on string because string feels more 'basic' than a sequence, so I would expect the sequence to know about strings rather than the opposite. But the argument against join on string is not as strong as the argument against sum on array.

scott_s · on April 9, 2009

That's just the thing: there is no sequence class in Python. There's only a sequence concept. A sequence is any object that implements a particular interface. Strings aren't more basic than the concept of a sequence in Python; strings are a type of sequence.

See http://docs.python.org/library/stdtypes.html#iterator-types and http://docs.python.org/library/stdtypes.html#http://docs.pyt...

tetha · on April 10, 2009

I think the join awkwardness in Python is mostly a single-dispatch issue. Both variants, glue.join(sequence) and sequence.get_joined_by(glue), make some sort of sense; however, neither strikes all readers as the one obvious thing that is obviously right. (Compare similar single-dispatch issues like: will the employee have an add_to_department, or will the department have a get_new_employee?) Thus, I think the cleanest solution would be a generic method join(glue, sequence), so you just call join('\n', lines). This removes the question of whether the glue or the sequence is more responsible for the join, because both are equally important.
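
A minimal sketch of that free function, assuming Python 3 and simply delegating to the existing str.join (the function itself is hypothetical, not something in the standard library):

```python
def join(glue, items):
    """Free-function join: neither the glue nor the sequence owns it."""
    # Delegates to the glue's own join method, which accepts any iterable.
    return glue.join(items)


text = join("\n", ["first", "second"])
numbers = join(", ", (str(x) for x in range(3)))
```

Because it only relies on the glue having a join method, the same function also happens to work for bytes glue with an iterable of bytes.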

jerf · on April 9, 2009

I anticipated this objection if you go back and re-read my post, but I'll blow out the point a bit more here: In Java terms, "sequence" is an interface.

You can't put join there. Because then you have to put add, any, none, all, reduce, filter, and so on and so forth for at least another ten or twenty base methods (ignoring everything else that applies just to the sequence protocol written by users), which every single implementation would then have to provide. No matter how abstractly beautiful it may be that this implements object-orientation dogma, in practice it's an idea too stupid to even begin to consider.

litewulf · on April 10, 2009

Thankfully there is this concept of mixins (and, gasp, multiple inheritance), to solve some of these issues.
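
A rough sketch of the mixin idea in Python 3. SequenceOpsMixin and Countdown are made-up names for illustration; the only thing the mixin assumes is that the host class defines __iter__:

```python
class SequenceOpsMixin:
    """Hypothetical mixin: derives helpers from __iter__ alone."""

    def join(self, glue):
        return glue.join(str(x) for x in self)

    def any(self):
        # Inside a method body, the bare name "any" is still the built-in.
        return any(self)

    def all(self):
        return all(self)


class Countdown(SequenceOpsMixin):
    """Toy iterable that only implements __iter__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


print(Countdown(3).join(", "))
```

Each class still has to opt in by inheriting from the mixin, so this reduces the implementation burden rather than removing it.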

calambrac · on April 9, 2009

This is such a non-issue. Really. The parent posted the reasoning behind the decision, that reasoning is sound. Your disagreement is purely aesthetic, there's simply no practical reason why the current way is wrong. Please just accept that this is how it is, and stop complaining about it. Please?

nostrademons · on April 9, 2009

That was my point when I started the thread. The Python decision is objectively justified for technical reasons. Yet a large number of people viscerally feel it's wrong. I blame human irrationality - but then, programming languages are meant for humans, so shouldn't they accommodate our irrationality? Otherwise, you get something like Haskell. ;-)

BTW, there are other cases where Python has gone the other way. Multiline lambdas, for instance. The case against multiline lambdas is essentially that they look ugly - they let you embed an indentation-sensitive block in what might be a parenthesized expression, which means you need some dangling delimiters. But they're certainly useful, as Scheme/Lisp/Ocaml/etc. have shown.

It's similar to the compare-constants-from-the-left idiom in languages where assignment is a valid expression (notably C). If you always write "if (NULL == foo)" instead of "if (foo == NULL)", you eliminate a whole class of bugs. But I've seen very few programmers do this, because it "feels" wrong to a lot of people.

calambrac · on April 9, 2009

I completely agreed with your original point. The fact that we went from your comment to people actually bitching about join made me really sad.

I shouldn't have said that the join argument was "purely aesthetic", I should have said that it was "purely aesthetic and trivial", because you're right, aesthetics are important, and (imo) are reason enough to disallow multiline lambdas. How do you cross the line from trivial to not? I don't know exactly, but I do know that swapping the position of the involved nouns and taking up the same number of characters for a good technical reason doesn't.

lacker · on April 9, 2009

"if (NULL == foo)" and "'\n'.join(lines)" both feel wrong because you are putting the most complicated object last. The most complicated thing should go first so you can sooner figure out what the heck this line of code is talking about.

I would support making the C compiler just disallow "if (x = y)".

nostrademons · on April 9, 2009

OTOH, in languages that support first-class functions, it's pretty common to put the functional argument last because then it can wrap to another line without leaving dangling parameters:

    $.each(my_array, function(val) {
      // Do something
      // Do something else
    });

That's putting the most complicated object last, and feels much more natural than:

    $.each(function(val) {
      // Do something
      // Do something else
    }, my_array);

I suspect it's more that English has trained us to read "subject verb object" sentences. In an if statement, `foo` feels like the subject, and then "== NULL" is the predicate. In an array map, the array is the subject, and the function is the predicate. In the Python join example, the array feels like the subject, "join" is the verb, and "," is the object, which is what makes it seem so awkward. Ruby's array.sum feels more natural because the array is the subject and sum is an intransitive verb.

philh · on April 9, 2009

Python is intended to be aesthetically pleasing in some sense, so I don't think aesthetic objections to some of its features are unwarranted.

calambrac · on April 9, 2009

This isn't one of the 'some'.

http://en.wikipedia.org/wiki/Color_of_the_bikeshed

scott_s · on April 9, 2009

I don't think this is bike-shedding. That applies when people argue over two functionally equivalent options, often to the detriment of more important issues.

In this case, their objection is that it violates their intuition, but the placement of join is a natural result of how Python's types are defined. Defining join on all sequences would have real (annoying) implications.

calambrac · on April 9, 2009

> That applies when people argue over two functionally equivalent options, often to the detriment of more important issues.

How does that not fit this situation? Because there's actually a good technical reason to do it the way it's currently done? Wouldn't that imply that there should be even less discussion of it?

scott_s · on April 9, 2009

No, because the discussion should be explaining that good technical reason. Which is what happened.

Bike-shedding is when the arguments have relatively the same merit, and the choice makes little difference in the end. I don't think either applies.

An example of bike-shedding in this context would be renaming join to fuse.

calambrac · on April 9, 2009

Bike-shedding is when someone feels the need to interject their opinion simply because the topic is something they feel they understand, regardless of whether that interjection actually contributes anything.

Bitching that '\n'.join(alist) should be alist.join('\n') is a favorite pastime of people who want to demonstrate that they've heard of OO programming. It's a perfect example of bike-shedding.

scott_s · on April 9, 2009

When I use the term, I assume both sides have marginal arguments - so the resulting discussion is a waste of time. I use it this way to remind a group that we're not discussing something important. But if one of the sides has merit, then it's important for the other side to understand that.

calambrac · on April 9, 2009

You can't just make up your own definition of a well-known term and then call people wrong when they use it correctly:

http://www.freebsd.org/cgi/getmsg.cgi?fetch=506636+517178+/u...

scott_s · on April 9, 2009

I think my interpretation is the more common one:

http://en.wikipedia.org/wiki/Color_of_the_bikeshed

http://catb.org/jargon/html/B/bikeshedding.html

http://www.urbandictionary.com/define.php?term=bikeshedding

All of these definitions require the issue to be of marginal benefit to the overall problem. The observation that people feel the need to contribute their opinion on matters they feel they know about is why bikeshedding happens, but it's not bikeshedding itself.

calambrac · on April 9, 2009

You said "When I use the term...", a little phrase implying personal subjective interpretation, and that was what I was responding to when I said "you can't just make up your definition", but okay, scratch that part.

The more important point was that you can't call someone wrong when they're using the actual, original definition correctly, which I was (the link I included was how the term entered the geek lexicon).

scott_s · on April 9, 2009

I said that because, at that point, I realized we had a difference of definition, and I didn't know which one was more common. Having looked at it, I think what I say is the accepted definition - regardless of the origin.

But. All of this is a digression. The entire reason I said I don't think this is bikeshedding is because I think there's value in discussing why join is a member of strings and not sequences in Python. Understanding this point helps in understanding Python's design.

nostrademons · on April 9, 2009

I find it really amusing that this has degenerated into a debate over the meaning of bikeshedding. Talk about bikeshedding!

calambrac · on April 9, 2009

A part of me wants to keep it going just for the irony value.

scott_s · on April 9, 2009

I recognized the irony, but well, bikeshedding. Can't resist.

bobbyi · on April 10, 2009

The reason join() isn't a method on lists is that it would mean that every class that wants to act like a list would have to implement it (or inherit from someone who does).

If you write your own class that is iterable but doesn't inherit from list, it would not have a join() method unless you wrote one. But it works as an argument to '\n'.join() for free since that can operate on anything iterable.
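
A small sketch of that, assuming Python 3. Lines is a made-up class for illustration:

```python
class Lines:
    """Toy iterable (hypothetical): does not inherit from list, has no join()."""

    def __init__(self, text):
        self._text = text

    def __iter__(self):
        # Yield one line at a time, without the trailing newline.
        for line in self._text.splitlines():
            yield line


doc = Lines("alpha\nbeta\ngamma")
joined = " | ".join(doc)          # works: str.join accepts any iterable of strings
has_join = hasattr(doc, "join")   # the class itself never defined a join method
```

The class author wrote nothing join-related, yet the object can still be joined.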

The other problem with sticking methods like join(), sum() and any() onto collections is that whenever a new method is added (for example, any() was added in python 2.5), it would cause confusion for anyone who has a list-like class that happens to have a method with that name that does something different.

_pius · on April 9, 2009

I agree with you if you mean that an array should be able to abstractly reduce its elements, but I'd argue that the specifics of how the array gets reduced are generally outside of the responsibility of the array class.

In Ruby, this is accomplished through Array#inject, which allows you to specify the procedure that gets applied as you're joining. This makes it easy to do summing, string concatenation, etc, etc.

avibryant · on April 9, 2009

Why is [1,"two"].sum any different from 1 + "two"? That is, 1.respond_to?(:+) is always going to be true, and yet sometimes sending the + message to a number will give a type error.

I don't buy into this idea of interface as binary - that if you respond_to? the message, it's always appropriate, and if you don't, it's not. Interfaces are something you look at when you're writing the program, and so they are interpreted by humans and can be necessarily fuzzy: "#sum will give you the total of the elements in the array, unless they aren't homogeneous, in which case you'll likely get an error". Fine. If you don't know enough about the array in question, don't send #sum.

Similarly, if you don't know that this is a chequing account, it would be odd to ask it to write a cheque.

But at runtime, you send the message, and you maybe get an error. Whether this is TypeError or NoMethodError seems entirely irrelevant.

Incidentally, my objection to Array#sum is that you don't know what to use as the "0" element (what if the elements implement +, but aren't numbers), though I'd be fine with it being called #numeric_sum or the like.
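
Python's built-in sum() illustrates the same "0" problem: it takes the starting value as an optional second argument that defaults to 0. A quick sketch, assuming Python 3 (try_sum is a made-up helper for illustration):

```python
from fractions import Fraction

numbers = sum([1, 2, 3])                           # implicit start of 0
fractions = sum([Fraction(1, 2), Fraction(1, 3)])  # 0 + Fraction is fine
lists = sum([[1], [2, 3]], [])                     # explicit start: the "zero" for lists


def try_sum(items):
    """Return the sum, or the TypeError message if the default 0 doesn't fit."""
    try:
        return sum(items)
    except TypeError as exc:
        return "TypeError: %s" % exc


no_zero = try_sum([[1], [2, 3]])   # 0 + [1] fails: lists implement +, but 0 is the wrong zero
mixed = try_sum([1, "two"])        # 1 + "two" fails partway through
```

So the caller has to supply the zero whenever the elements implement + without being numbers, which is roughly the information an argument-less #sum has no way to receive.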

jsf · on April 9, 2009

The difference between [1, "two"].sum and 1 + "two" is that in the second case you are passing "two" to 1's :+ method, that's an argument error, it doesn't depend on 1's state. In the first case you are not giving any argument to the method, so you shouldn't get an error. From the point of view of the array's user the array is now in an invalid state and that's not something the array should have allowed to happen.

wvenable · on April 9, 2009

I'm not a Ruby programmer so I have a simple question: Is it common to request whether an object implements a particular method before you make any sort of call? That seems considerably more fussy than strong typing of interfaces or even the type-hinting of interfaces that PHP has.

I can't help thinking a cleaner solution is the more traditional way: Create a subclass called NumberArray that can only hold numbers and has specific methods for operating on them. But I guess that might not be the "Ruby-way".

raganwald · on April 9, 2009

> Create a subclass called NumberArray that can only hold numbers and has specific methods for operating on them.

I can't see anything wrong with this for some cases. Another way to get there would be to define a NumericCollection module that can be used to extend any Array.

tjstankus · on April 9, 2009

I would tend toward that idea: a module that can be included. Injecting a sum method into the object's ghost class also maintains scope of responsibility, but feels a bit more magic. I'd opt for less magic. At first, anyway. :)

_csoo · on April 9, 2009

How about create a method called detectSum that accepts a block as an argument. The block will return the specific field to use when summing.

This lets you skip subclassing.

stonemetal · on April 9, 2009

It isn't typical. The author was using it as a way to show what a class claims to support.

davidmathers · on April 9, 2009

Not all arrays can be summed, but they all claim to respond to #sum. This is extremely broken.

This is how I felt when I first saw Array#sum. But then I took a few hi-

But then I tried it a few times. And now I can quit anytime I want.

I think String#titlecase is correct though. Not all arrays can be summed, but all strings can be titlecased, including part codes.

randallsquared · on April 9, 2009

But does String#titlecase do the right thing for ß? :)

jeffesp · on April 9, 2009

But the point isn't that String#titlecase can't do the right thing here. I assume there is an implementation of titlecase that will work with ß. But I can't think of an Array#sum that will work for [1, "two"].

Of course your :) might have meant you were being funny, then I just totally missed your point and explained something you already know.

randallsquared · on April 9, 2009

Well, there is a tenuous connection in that it's not obvious what to do for casing of ß. I wrote some Unicode stuff for CL once and figuring this out was a nightmare. In any case, if you have two data types that don't make any other sense as an addition, you could always just convert them both to bitfields and sum that... now I'm just being silly. Don't mind me.

davidmathers · on April 9, 2009

According to wikipedia "ß".upcase is "SS", except in the case of legal documents it remains "ß".

I checked irb and "ß".upcase is "ß", so I guess strings in ruby are of type "german legal document."

I also just learned that the reason there's no uppercase ß is that no words start with ß, so titlecase would never touch it.

_pius · on April 9, 2009

I agree with you.

That being said, I suspect that this turns out to be a cultural debate more than anything else. Remember, Ruby is a freedom language. Ruby could eliminate many of the issues you raise by being strictly typed, but of course that'd defeat the purpose.

The whole thing is a question of trust. Do you trust clients enough to give them the syntactic sugar of Array#sum even if their arrays may not all be strictly summable?

The answer to that question if you're writing Java is probably no. If you're a Rubyist, however, the answer is a resounding maybe. For something like ActiveSupport, it's probably OK. For other more hostile environments, maybe not.

As to whether it's idiomatic Ruby to provide Array#sum, I'd argue no, even though it's culturally acceptable.

scott_s · on April 9, 2009

Disclaimer: I'm not a Rubyist.

I think saying "Ruby is a freedom language" is a cop-out for forgoing good design. It excuses us from having to think through whether the design makes sense, because we can always throw up our hands and say "Ruby is a freedom language." But I also think that's not an entirely true statement: Ruby doesn't let programmers jump to any random line of code. (I did find a neat module that implemented gotos and labels, but I think it only works at proc granularity, not line granularity.) It doesn't let programmers muck with the stack frames or manage their own memory.

The reason for these non-freedoms is that past experience has shown us that in some domains, some language features are high risk, low yield. I see the freedom of Ruby as a chance to experiment with many different ideas for language design, and the practical value of that is the languages we design in the future will have those lessons baked in. But by defaulting to the idea that it's a "freedom language," we resist learning.

I say this as a non-Rubyist. I wanted to learn a dynamically typed language, and I chose Python. I don't have the time now to delve deeply into two languages. So I'm commenting on this as an observer, but I'm an observer who sees the value in Ruby, and potential implications to future languages.

_pius · on April 9, 2009

> I think saying "Ruby is a freedom language" is a cop-out for foregoing good design.

I think some people could use that designation as a cop-out, but I don't think I'm doing that above. The point I was trying to make is that Ruby, by design, is not strictly typed and allows you to do things like metaprogramming easily. I'm certainly not trying to be an apologist for poor design.

raganwald · on April 9, 2009

> The whole thing is a question of trust. Do you trust clients enough to give them the syntactic sugar of Array#sum even if their arrays may not all be strictly summable?

I agree that Ruby should embrace Freedom, but that does not mean that all choices programmers make freely are good ones. This is not about whether programmers accidentally call #sum on arrays that cannot be summed. It is about whether, when I look at the methods implemented by Array, it describes what arrays do or what arrays sometimes do or may do once in a while.

I actually think that Array#sum is the least objectionable in a single project where summing the elements may be common, maybe objectionable in ActiveSupport where every Rails project gets it whether they sum arrays or not, and absolutely wrong in Ruby's core libraries where every Ruby programmer will get it.

Free choices should be scrutinized most carefully when they have the broadest impact.

_pius · on April 9, 2009

> It is about whether, when I look at the methods implemented by Array, it describes what arrays do or what arrays sometimes do or may do once in a while.

Sure. I think it's in Ruby's culture for people to write ultra-cute convenience methods and DSLs for this sort of thing, even if it makes it possible for someone to write code that doesn't make semantic sense.

> I actually think that Array#sum is the least objectionable in a single project where summing the elements may be common, maybe objectionable in ActiveSupport where every Rails project gets it whether they sum arrays or not, and absolutely wrong in Ruby's core libraries where every Ruby programmer will get it.

Agree 100%. Arrays in Ruby core shouldn't be summing, they should be injecting.

koningrobot · on April 10, 2009

I see the point, but I disagree that it's a problem that needs to be solved. What is the alternative? A "Numbers" class? And how would this look in syntax? Numbers.new([ 0, 1, 2, 3 ])? I mean, I'm sure you could make it a special case, but it really isn't a special case. Type inference won't cut it, and having to specify everything explicitly is tedious (and a matter of where you draw the line).

And what would you get if you mapped over a Numbers object? An Array or another Numbers object? Looks like you'll have to specify the return type for the block. Even then, is an array of numbers really always a Numbers object?

I suppose that in this case, you could use a Numbers module and (somehow) make every array of numbers magically inherit this. But a generic, extensible way of doing this would be really, really complex and hard to do efficiently. Does every newly created collection have to be checked for Numbers-ness?

Really, you can go as wild as you want to, but I'd rather just be able to map, transpose and flatten around without all of this bureaucracy. Besides that, I'm fairly certain this is only a problem in single-dispatch languages where methods are owned by classes.

tjic · on April 9, 2009

Calling [1,2,3].inject(0) {|sum, ii| sum + ii } has a cleaner smell, but it's a lot more typing.

I wonder if the solution might be to have all arrays support sum (in the respond_to? sense) only if they can actually perform the operation.

Thus

[1, 2, 3].respond_to?(:sum) => true

[1, [2, 3]].respond_to?(:sum) => false

I can come up with an O(1) way to do this where we assume that an array does respond to sum, and degrades to not supporting sum when a non numeric is inserted ... but getting the array to support sum again if/when the non-numeric is removed... I'd have to think for a bit to come up with something better than O(n)...
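
One possible way around the removal problem, as a rough Python 3 sketch: keep a count of non-numeric elements instead of a single flag. TrackedList and can_sum are made-up names (can_sum stands in for the respond_to? check), and the sketch assumes every mutation goes through append or pop:

```python
from numbers import Number


class TrackedList:
    """Hypothetical list wrapper that counts its non-numeric elements."""

    def __init__(self, items=()):
        self._items = []
        self._non_numeric = 0
        for item in items:
            self.append(item)

    def append(self, item):
        self._items.append(item)
        if not isinstance(item, Number):
            self._non_numeric += 1      # O(1) bookkeeping on insert

    def pop(self):
        item = self._items.pop()
        if not isinstance(item, Number):
            self._non_numeric -= 1      # O(1) bookkeeping on removal
        return item

    def can_sum(self):
        # Summable again as soon as the last non-numeric element is gone.
        return self._non_numeric == 0

    def sum(self):
        if not self.can_sum():
            raise TypeError("list contains non-numeric elements")
        return sum(self._items)
```

This only covers the flat case; it says nothing about whether a nested array like [1, [2, 3]] ought to be summable recursively.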

raganwald · on April 9, 2009

    [1, [2, 3]].respond_to?(:sum) => false

I elided a discussion about this case from the post. For some arrays, a recursive sum is semantically valid. For other arrays a recursive sum is not semantically valid, and it is not a simple case of knowing whether all of the leaf elements respond to :+. So again, trying to implement this in the Array class doesn't work because containers are implementation rather than interface.

If you are going to use an Array class, I think the array's client is the one responsible for knowing whether its elements can be summed and if so, whether recursive summing is valid. The other approach is to subclass or otherwise create special-case arrays that know about their semantically valid operations like sum.

ken · on April 9, 2009

Reminds me of Kent Pitman's article about, among other things, why there's no generic deep-copy in Lisp: http://www.nhplace.com/kent/PS/EQUAL.html

His answer is similar: just because you know the structure of something doesn't mean you know anything about its semantics.

Because of this, I'm tending to believe that adding more strong-typing to languages is a losing battle, though I admit this raises more questions than answers.

cbeust · on April 9, 2009

> Because of this, I'm tending to believe that adding more strong-typing to languages is a losing battle,

I see it the opposite way.

Just because an object says that it responds to a method doesn't guarantee that invoking said method will succeed. I'm not sure why the original poster is so surprised by this finding.

Since this observation is valid in both statically and dynamically typed languages, I prefer a statically typed one since at least, I don't need to verify that the object does respond to that method before calling it.

raganwald · on April 10, 2009

A question and answer that might be relevant: http://www.reddit.com/r/ruby/comments/8ba5g/my_objection_to_...

evanmoran · on April 9, 2009

The method should be: array<number>.sum

The article is correct .sum shouldn't be on an array, but sum shouldn't be global either. This is why generics exist.

_csoo · on April 9, 2009

Or you can have a sum method that accepts a block/lambda/anonymous function as an argument.

This is done in Smalltalk with detectSum.

    people detectSum: [ :person | person age ].
    items detectSum: [ :item | item price ].

jdminhbg · on April 9, 2009

In Ruby, this would be:

    class Array
      def detectSum(&block)
        self.map(&block).sum
      end
    end
      
    >> [1, 2, 3].detectSum {|i| i**2}
    => 14

Most Ruby people though would just call #map and #sum in succession on an ad hoc basis.

_csoo · on April 9, 2009

Ew, how inefficient. Here is the Smalltalk code:

    detectSum: aBlock
        "Evaluate aBlock with each of the receiver's elements as the argument. 
        Return the sum of the answers."
        | sum |
        sum := 0.
        self do: [:each | 
            sum := (aBlock value: each) + sum].  
        ^ sum

Anyway, my point still stands. Better to accept a block than to use a sum method that assumes the Array consists of numbers
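
The same single-pass idea as a rough Python 3 sketch. detect_sum is a made-up name mirroring the Smalltalk selector, and the sample data is invented for illustration:

```python
def detect_sum(items, key):
    """Sum key(item) over items in one pass, like Smalltalk's detectSum:."""
    total = 0
    for item in items:
        total = key(item) + total
    return total


people = [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 27}]
total_age = detect_sum(people, lambda person: person["age"])
squares = detect_sum([1, 2, 3], lambda i: i ** 2)
```

In everyday Python the same thing is usually written as sum(key(x) for x in items), where the generator expression also avoids building an intermediate list.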

evanmoran · on April 15, 2009

I agree that using blocks/closures is a powerful and useful mechanism -- In fact I think that they are a great implementation technique for this method.

What this comes down to for me is: what makes the best interface?

Specifically:

  myArray.sum()

is much easier to read and use. In a world where we have to maintain our code much longer than we write it, this simplicity is very valuable.

Secondly, it is important to note that since the array uses generics it is not necessary to check that the items are numbers, as any attempt to insert a non-number would have thrown an error.

So in summary I agree with you completely that array<variant>.sum should _not_ exist. But I hope you see my point that array<number>.sum _should_ exist. (If the code enforces the numbers to exist in the array there is no reason not to include the sum method!)
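
A rough sketch of the array<number> idea in Python 3. Python does not enforce generic type parameters at runtime, so this sketch uses a check at insertion time as a stand-in; NumberArray is a made-up class for illustration:

```python
from numbers import Number


class NumberArray:
    """Hypothetical container that rejects non-numbers at insertion time."""

    def __init__(self, items=()):
        self._items = []
        for item in items:
            self.append(item)

    def append(self, item):
        if not isinstance(item, Number):
            raise TypeError("NumberArray only holds numbers, got %r" % (item,))
        self._items.append(item)

    def __iter__(self):
        return iter(self._items)

    def sum(self):
        # Safe by construction: every element passed the check in append().
        return sum(self._items)


total = NumberArray([1, 2, 3]).sum()
```

Trying NumberArray([1, "two"]) raises TypeError at construction, so the error surfaces where the bad element is inserted rather than where sum is called.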