You are viewing bramcohen

Thu, Apr. 14th, 2011, 05:34 pm
Python wish list

Now that the moratorium on Python language features is over, I'll put in my thoughts on what new stuff the language could use. I don't have much to suggest, and what I do have to suggest is fairly minor. This is because I'm happy with the language.

new on default parameters

One of the gotchas in Python is that default parameter values are evaluated once, when the function is defined, and then reused, so if you say:
def spam(eggs = []):

then eggs will get set to the same list every time, and modifications will get carried over between calls. This can be hacked around like this:
def spam(eggs = None):
    if eggs is None:
        eggs = []

This works, but is ugly, and prevents passing in None as a value for eggs. It would be better to be able to simply say:
def spam(eggs = new []):

which should do exactly what you expect.
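To make the gotcha concrete, here is a small sketch in current Python. The function names and the _MISSING sentinel are made up for illustration and are not part of the proposal; a private sentinel is one common way to get fresh defaults while still letting None through as a real value.

```python
# Illustrative names only; _MISSING is a hypothetical module-level sentinel.
_MISSING = object()

def spam_shared(eggs=[]):
    # The default list is created once, when the def statement runs.
    eggs.append("x")
    return eggs

def spam_fresh(eggs=_MISSING):
    # Comparing against a private sentinel leaves None free for callers.
    if eggs is _MISSING:
        eggs = []
    if eggs is not None:
        eggs.append("x")
    return eggs

print(spam_shared() is spam_shared())  # True: the same list both times
print(spam_fresh() is spam_fresh())    # False: a new list on every call
print(spam_fresh(None))                # None is passed through untouched
```

This is still more ceremony than a single keyword in the signature would be, which is the point of the wish.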

^ on bytes

A strange oversight in Python 3 is that bitwise operators don't work on byte arrays. The ^, & and | operators should work on bytes of equal length, doing exactly what they obviously should. Trying to apply them to bytes of unequal length should probably result in an error. It's easy enough to write functions to do these things, but they're slow, and there's only one reasonable semantics for what those operators should do on byte arrays anyway.
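For reference, this is roughly what the hand-written version looks like. Both helper names are hypothetical, and only ^ is shown; & and | would follow the same pattern. The second variant assumes int.from_bytes and int.to_bytes are available (Python 3.2 or later).

```python
def xor_bytes(a, b):
    # Hypothetical helper: the plain per-byte loop.
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def xor_bytes_via_int(a, b):
    # Same result, but the XOR itself happens on one big integer.
    # Needs int.from_bytes / int.to_bytes (Python 3.2+).
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    n = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return n.to_bytes(len(a), "big")

print(xor_bytes(b"\x0f\xf0", b"\xff\xff"))          # b'\xf0\x0f'
print(xor_bytes_via_int(b"\x0f\xf0", b"\xff\xff"))  # b'\xf0\x0f'
```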

raw binary conversion

Maybe this has been added to the standard library and I just haven't heard about it, but a longstanding annoying missing piece of functionality is simple conversion between ints and big or little endian representations of them as bytes. Again, this is easy enough to implement, but is slow when done in Python and is hardly an obscure piece of functionality.

dictionary scrambling

This might be an obscure piece of functionality, but I'd like the ability to change the hashing function which dictionaries use. I write tests which depend on my code behaving the same way every time it's run, and I'd like to be able to test that the code doesn't depend on the order of iterating over dictionary keys or values.

Fri, Apr. 15th, 2011 03:15 am (UTC)
elsmi

raw binary conversion -- do you mean struct, or something else?

Fri, Apr. 15th, 2011 04:31 pm (UTC)
bramcohen

struct does a reasonable approximation of what I want, but Yech.

Fri, Apr. 15th, 2011 03:28 am (UTC)
Thomas Barr

It sounds like the struct module should do what you want for raw binary conversion.

Since the dictionary objects are only kept in the hash table and iteration just walks through that table looking for objects, there'd be no way to scramble the contents in-place. Objects can only be in one place for a given hash value. You'd have to implement this by iterating the dictionary into a list and shuffling that list in place (like random.shuffle does), so you might as well do it yourself.
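A minimal sketch of that do-it-yourself approach, assuming a helper that test code calls instead of iterating the dict directly. The name shuffled_items and its seed parameter are made up for illustration.

```python
import random

def shuffled_items(d, seed=None):
    # Hypothetical test helper: return the dict's items in a shuffled order.
    # A fixed seed keeps a given test run repeatable.
    items = list(d.items())
    random.Random(seed).shuffle(items)
    return items

d = {"a": 1, "b": 2, "c": 3}
for key, value in shuffled_items(d, seed=42):
    print(key, value)
```

The drawback is that every iteration site has to call the helper, which is not the same as changing the dict's behavior globally.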

^ on bytes is an obvious feature, though, you're absolutely right. I might actually write a patch for that myself...

Fri, Apr. 15th, 2011 05:11 am (UTC)
bramcohen

Just changing the hash function which dicts use should scramble things well enough.

Does struct work on arbitrary size integers and allow completely cross-platform operation? The documentation is less than ideal.

Fri, Apr. 15th, 2011 07:21 am (UTC)
manuzhai

The table for the format characters seems clear enough?

Fri, Apr. 15th, 2011 04:32 pm (UTC)
bramcohen

Until you've gone through some examples, it's very opaque, and it really, really, would be nice if it clarified whether int sizes might be different on different platforms.

Fri, Apr. 15th, 2011 06:46 pm (UTC)
elsmi

The docs really could be better (I use struct constantly, and it took me ages to realize that you can give a repetition count like "<4I"), but at least the current docs do say explicitly that if you use the 'native mode' indicator ("@", which is default), then you get whatever sizes and alignment the C compiler would use for a C struct with the given fields, and if you use one of the standardized modes ("<", ">", "=", "!") then you get the usual int sizes (and no padding) regardless of platform. So "@" is how you talk to poorly-written C code, and for network protocols or well-written binary formats you use "<" or ">".
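A few lines showing those points with the struct module. The values are arbitrary, and the native-mode size is deliberately left without an expected result because it depends on the local C compiler.

```python
import struct

# A repetition count: four little-endian unsigned 32-bit ints.
packed = struct.pack("<4I", 1, 2, 3, 4)
print(len(packed))                   # 16
print(struct.unpack("<4I", packed))  # (1, 2, 3, 4)

# Standard modes use fixed sizes and no padding on every platform.
print(struct.calcsize("<hl"))  # 6: a 2-byte short plus a 4-byte long
# Native mode follows the local C compiler, so this value can differ.
print(struct.calcsize("@hl"))

# The byte order is visible in the output.
print(struct.pack("<I", 1))  # b'\x01\x00\x00\x00'
print(struct.pack(">I", 1))  # b'\x00\x00\x00\x01'
```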

Fri, Apr. 15th, 2011 07:20 am (UTC)
manuzhai: dictionary scrambling

Actually it's not the dict that defines the hashing method, it's the __hash__() method of the class of the objects used as keys. So you could probably handle this in your own code by setting the __class__ of the keys to a subclass of their actual __class__, plus a changed __hash__ method?

Fri, Apr. 15th, 2011 04:03 pm (UTC)
bramcohen: Re: dictionary scrambling

Yeah, I could do something like that, but that would involve extensive ugly changes to my code, when it should just be a single call to the stdlib.

Fri, Apr. 15th, 2011 07:35 am (UTC)
ciphergoth

I don't understand the semantics of your proposed "eggs = new []". If it's "eggs = new list" I understand them.

Fri, Apr. 15th, 2011 04:02 pm (UTC)
bramcohen

Saying [] is how you instantiate a new list in Python. It's very non-idiomatic to say list() (although I tried it now and that does work).

Fri, Apr. 15th, 2011 04:07 pm (UTC)
ciphergoth

Right, but [] is a list, whereas "list" is a function that returns a new list every time you call it. Though I guess "new" could act like "lambda" wrt whatever was on its right.

Fri, Apr. 15th, 2011 04:35 pm (UTC)
bramcohen

Yeah, it's sort of acting like a lambda, but you could put an actual lambda there and it would result in the parameter being a function, which is not what you want. Other values should work as well, for example

def spam(eggs = new {3: ['a', 'b', 'c']}):

Fri, Apr. 15th, 2011 07:38 am (UTC)
ciphergoth

WRT dictionary scrambling: maybe what you need is a way to arrange for your own class to take the place of "dict"?

Fri, Apr. 15th, 2011 04:06 pm (UTC)
bramcohen

Yes, but that would involve extensive changes to my whole codebase for what should be a single call.

Fri, Apr. 15th, 2011 04:08 pm (UTC)
ciphergoth

No, I mean a way for Python to use your class whenever you refer to "dict" or {} or similar.

Fri, Apr. 15th, 2011 04:34 pm (UTC)
bramcohen

I don't think it has that functionality; Python doesn't allow that much monkey patching.

Fri, Apr. 15th, 2011 06:50 pm (UTC)
figg

technically, you can monkey patch python with judicious use of ctypes, but I don't think that really counts as a solution.

Fri, Apr. 15th, 2011 06:53 pm (UTC)
poliphilus

Raw binary conversion: Exactly that feature already exists as of Python 3.2, see int.from_bytes and int.to_bytes.
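A short example of the two methods, on Python 3.2 or later. The value is arbitrary and chosen to be wider than 64 bits; the caller picks the byte length.

```python
n = 2 ** 70 + 5  # needs 71 bits, so it does not fit in 8 bytes

# Smallest number of whole bytes that can hold n.
length = (n.bit_length() + 7) // 8
print(length)  # 9

big = n.to_bytes(length, "big")
little = n.to_bytes(length, "little")
print(little == big[::-1])  # True: same bytes, opposite order

# Round trip back to an int.
print(int.from_bytes(big, "big") == n)        # True
print(int.from_bytes(b"\x01\x00", "little"))  # 1
print(int.from_bytes(b"\x01\x00", "big"))     # 256
```

to_bytes raises OverflowError if the requested length is too small for the value.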

Sat, Apr. 16th, 2011 05:55 pm (UTC)
bramcohen

Hooray!

Fri, Apr. 15th, 2011 07:07 pm (UTC)
figg

I'd love a let/where form (or similar) in python

i.e.
x = foo(a,b,callback) where:
    def callback(...):
        ....



Most of my python gripes are so deeply rooted I know they'll never change (strings are iterable and return strings, so "a"[0][0][0][0][0] == "a", scoping & threading)

With default values, you can get reasonably far with a decorator - as a quick five minute effort

import functools
import copy

def newdefault(**names):
    def decorator(func):
        @functools.wraps(func)
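        def wrapped(*args, **kwargs_):
            # Copy the caller's keyword arguments so defaults can be added.
            kwargs = {}
            kwargs.update(kwargs_)
            for name in names:
                if name not in kwargs:
                    # Give each call its own fresh copy of the default.
                    kwargs[name] = copy.deepcopy(names[name])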
            return func(*args, **kwargs)
        return wrapped
    return decorator



@newdefault(default=[])
def terrible(value, default=None):
    default.append(value)
    return default


print(terrible(1))  # [1]
print(terrible(2))  # [2]


Although you'd need to handle *plenty* more edge cases for this to work transparently.

Sat, Apr. 16th, 2011 01:53 am (UTC)
agthorr

^ on bytes

Bit operations on bytes have been brought up before. Interestingly, Guido's take was that he'd rather add a good way to do raw binary conversion.

raw binary conversion

Added in Python 3.2 as int.from_bytes() and int.to_bytes().

Sat, Apr. 16th, 2011 05:56 pm (UTC)
bramcohen

Guido has a point, but I still think it makes sense to add the bitwise operators I suggested, while leaving the shifts out.

Tue, Apr. 19th, 2011 12:46 am (UTC)
hollowaynz: Trailing colons - an unnecessary hassle?

This is a syntax change rather than a feature change, and if it results in some ambiguity then I'd appreciate it if someone could explain why. With if/for/while (etc.) blocks, I don't understand why they don't ditch the trailing ":" character and instead rely on whitespace, e.g.
if True:
    pass
becomes simply,
if True
    pass
The : doesn't seem to add anything for the compiler, and in the rare case that you want to do it on the same line then you could use ";" which is more consistent anyway (you can already put multiple statements on a single line with ";"). E.g.
if True; pass
So is there anything wrong about this approach? I'm not so much interested in whether it's impractical to change legacy code (I agree but that could be said of many syntactic changes) and I was more wondering whether there are reasons why Python shouldn't have had that syntax since day one.