Saturday, December 16, 2017

Ordered Keys in Dictionaries

Raymond Hettinger (@raymondh)
#python news: 😀 @gvanrossum just pronounced that dicts are now guaranteed to retain insertion order. This is the end of a long journey.

Tuesday, December 12, 2017

The Business of Book Promotions

It is hard (for me) to promote my books. It seems like empty vanity. I realize that it's not -- promotion is essential -- but it's difficult.

Packt just send a raft of detailed information for authors. Some things they suggest I do.

  • ✅Referrals. Want a "free" Python book? The "free" is in scare quotes because you'll have to write a review. Otherwise, the book is yours. DM me @s_lott and I'll put you on the list.
  • ✅Amazon Author Profile: Check.
  • Packt Blog Posts. Interesting. I'll have to look into that. I think they can follow the RSS feed from here. If they can, check, this is done.
  • ➡️Other on-line PR. Cool. I give them the blogs I follow. Nice. I can do that.
  • ✅Social Media. I (sort-of) link these blog posts to my Twitter feed through But I'm half-hearted about it. Packt suggests in including @PacktAuthors so that there's a proper Twitter tie-in. That seems like there's no work to that. Also, there's a Packt Experts Author Community on Facebook.
  • ✅Blog. Got it. You're reading this. And, also, here:
  • ➡️Author Newsletter. This seems like next-level blogging with more serious editorial planning. Interesting.
  • 🆒Packt Promotions. I like this: they do the marketing. I just have to cooperate by sharing the information.
  • 🆒Virtual Book Release Party. Wait. This could be cool. Functional Python Programming 2e is coming next summer. Hmmm. This could be fun. Banners and raffles for freebies or discounts.
  • 🚻Author Street Team. Mutual Support. Advanced Copies. I would need to keep it organized, right? That's (potentially) a chunk of work. I'll need to contemplate that
  • ✅Conferences. This is fun. For one of my PyCon trips, I got a promo code from Packt for free content during the conference. That was handy to give out. I went to Vistaprint and had a box of cards printed with contact info and the promotion code. 
  • ➡️Packt Live on-line conferences. I've done a few webinars. They're difficult. Writing is easier because it unfolds more slowly. I'll have to look into doing a few more of these.
  • ✅LinkedIn Profile. 

I've hit a few. A few more to do.

This is a pretty comprehensive list. It's good to see this kind of author support.

Also. They've done some serious re-engineering on their Author toolchain. MS-Word documents seem to be a thing of the past.

Tuesday, December 5, 2017

Functional Requirements and Use Cases -- even for "simple" things

In the mailbag I found this nonsense, doomed to inevitable failure:
"As I get more serious about this data science stuff, it has become obvious that a windows machine is not the way to go. ...
Q1: What other things should I think about and consider while shopping for a new computer?
Q2: Are there issues w/ running VmWare and Windows 7 w/in VmWare on Ubuntu?
I've omitted many, many words (400 or so.)

Here are all of the functional requirements I could discern:
  • I would like to have 1 machine. I don't want a desktop and a laptop
  • Install VmWare 
  • Install Windows 7 using VmWare
This was all of the functional requirements. The other 400 words involved specifications. Nothing that approaches a use case other than singular, VMWare, and Windows under VMWare. The form factor of laptop, which seems to go without saying, might be a user story, but that's pushing it.

The "a windows machine is not the way to go" and "Install Windows 7" indicate a fairly serious problem. It is not the way to go and it's required. Both. This is doomed to inevitable failure. 

This is not the way to make a decision.

Q1. What other things should I think about? 

Just about every other thing. Start with use cases and functional requirements. Skip over specifications. (In general, never start with specifications because that's where you end: a list of useless numbers that don't bracket what you actually want to do.)

Use Cases Matter. Specifications Don't Matter.

Write down all the Mbs and Tbs you want. Without a use case, they're irrelevant noisy details. Throw the numbers away until you have a list of verbs. Things you will DO. 

With so few actual functional requirements, almost *any* computer (possibly including a Raspberry Pi 3) would pass the suite of acceptance test cases.

✅ One Machine.
✅ VMWare.
✅ Windows.

After a lot more back-and-forth, I discerned one (or maybe two) additional functional requirement(s).
  • I have leo w/ java to gen html.
I know what Leo is in this context. I'm guessing the "java to gen html" is JRst. The lack of clarity is, of course, part of the problem here.

This requirement surfaced in the context of explaining to me why Windows was so important. Really. Windows was required to run two open-source apps. And. "a windows machine is not the way to go." Doomed. To. Inevitable. Failure.

Here's the only relevant functional requirement(s): run Leo and Java. And even then, there's a huge hole in this. Leo is Python-based. Docutils RST2HTML is Python-based. Why not simply use Leo and Python?  What does Java have to do with anything?

Buy this: a Pi-top:

Q2. Are there issues w/ running...? 

Yes. Always. For everything you can possibly enumerate there are "issues". 

There. Are. Always. Issues.

Use Cases Matter.

Since you don't have any functional requirements or use cases, it's impossible to filter the issues and see if any of the known issues impact what you think you're going to do.

From what I was told, a Pi-top covers everything that's required. It's hard to be sure, of course, when the functional requirements are so vague. But there's no evidence that the Pi-top can't work to fill all of the stated functional requirements.

What To Do Next

It seems obvious, but the next step is to create a test plan. Actually, that was the first step. Since it wasn't done first, now it's the next step.

Write down the things you want to do. Make a list. Ideally a long list of things you will DO. Active voice. Verbs. Actions. Tasks. Activities. It's hard to emphasize this enough.

Then, when considering a computer, see if it can actually do those things. Test it against the requirements to see if it does what it's supposed to do. Among all the machines that pass the tests, you can then sort by price. (Or availability, or reputation, or cool stickers, whatever non-functional requirements seem relevant.)

The questions of Tb and Mb and processor clock speed mean nothing. Nothing. Find the cheapest (smallest) machine that does what you want. Don't find the machine with xMb and yTb of whatever.

There there's this, "As I get more serious about this data science stuff" which seems little more than context. But it's really important. Indeed, it's essential.

If you're going to do machine learning, you don't really want to buy the necessary computer. You want to rent it for the hour or so each day you actually need it. It will be idle 23/24 hours each day (96% of the time.) Why buy that much horsepower which you are never going to use.

If you're going to login to a server you purchased from a cloud computing vendor. Amazon AWS. Microsoft Azure, etc., then, you can probably get by with a tablet that runs SSH and a browser. A tablet with a cool keyboard and a little display rack can be very nice. and seem to be all that's required.

Without Use Cases, however, it's impossible to select a computer. Don't spend money without test cases.

Tuesday, November 14, 2017

CI/CD DevOps and Python

See for the 16 gates that separate a good idea from secure, productive use of software. While a lot of DevOps folks like the idea, when it comes to implementing it for Python apps, they get confused.

The confusion seems to stem from Python's lack of a proper "build" step in the CI/CD pipeline. I've had the "everything involves a build" argument and the "well is analogous to a build" arguments. I have to acquiesce to these positions as part of making progress. In this case, reasoning by analogy can be misleading.

I want to focus on the two gates that are part of the code itself, separate from the rest of the pipeline.

  • Static Analysis 
  • >80% Code Coverage (which implies Unit Tests)

Unit Testing

My preference is to run the unit test suite first and get that out of the way. If the unit test suite fails, or fails to cover 80% of the code, any other considerations are moot.

I like Git triggers based on Pull Requests (PR's) and Merge to Master for checking these two conditions. I like the idea that a PR can't be discussed until unit tests pass. They can also be part of whatever other pipeline is going on, but I like them to be done early and often.

(I worked on a sprint team where the PR unit test wasn't trusted by one of the devs: he'd carefully check out the branch and rerun the unit tests. His comments were good, so the extra effort paid off. I guess.)

After flirting with a lot of frameworks, I'm happiest with py.test. I like the py.test-coverage plug-in and the py.test-BDD plug-in.

Yes. We have acceptance tests for our features written in Gherkin. And we have pytest fixtures that are used by pytest-bdd to process the scenarios in the Gherkin feature files. It actually works out nicely because we have a cucumber.json file that makes everyone happy that we've run an acceptance test suite along with our unit test suite.

What's important is the coverage report is painless and automatic.

And it's compatible with the Ruby-based cucumber tool without involving any actual ruby.

For integration testing, we use Behave. This is a bit more cumbersome than pytest-bdd, but it's appropriate for the bigger-picture testing where we have a docker cluster and have to see a number of "Then" steps to confirm operations spread across a suite of microservices.

The goofy question that often leads to endless confusion is the relationship between unit testing and "build." The setup definition includes a `tests_require` parameter. This *should be* all that's needed to do `python test`, which *should be* all that's involved in testing.

Is it a "build"? No. But. You can tell the DevOps folks it's a build if it makes them happy.

Static Analysis

There are several kinds of static analysis. Folks who work in Java are used to having Sonar analysis performed. This is above and beyond the static analysis already performed by the compiler. It seems excessive to me, but folks deploying Java seem to like it.

For Python, there are two important static analysis tools. And this is another source of profound confusion for DevOps folks new to Python.

I like to extract the last line of the pylint output and use that numeric score as the "bottom line" on static analysis part 1. While the default setting is 9.5, that can be a challenge, and we prefer 9.0 as well as some local pylintrc modifications to modify some checkers (e.g., set line length to 120.)

For mypy, it's a little bit more complex. We're still fumbling around here.

Ideally, the type hints are all clean and mypy has nothing to say. We can, of course, fix any errors by claiming everything uses Any and returns Any and every assignment statement sets an Any value. But that's so wrong.

There are (still) modules which require typeshed stub definitions. Ideally, we'd provide these. This would be better than using Any as a hack-around. While good, it's a lot of work.

For now, I think it's sensible to have two "pass" rules for mypy: clean or typeshed error. If mypy is silent, that's perfect. If mypy can't find stubs in typeshed, we can let this go for now and log an issue from the CI/CD pipeline to note the presence of technical debt.

In the best of all worlds, we'd fork the package, fix the type hints, and put in a PR. That's a lot better than using typeshed to work around the lack of hints. 

And, of course, there's the "build" question. For mypy to work, the dependencies (or their typeshed stubs) must be present. We wind up doing a `python install` to build out the requirements. Is this a "build"? Maybe. You can tell the DevOps folks it's a build if it makes them happy.

If you want idempotent server (or container) builds, you'll need to be sure that you pin specific versions. It can help to break this into two parts:
  • a requirements.txt with specific versions 
  • a generic version-free high-level list in
The reason for this separation is to make it easy to do a `pip install` or `conda create` from the detailed requirements. Once that's out of the way, the `python` will run very quickly. If you're working with Docker containers, the `pip install` (or `conda create`) can be part of the Dockerfile, and then tests or static analysis can be run separately, after the initial wave of installations.

Tuesday, November 7, 2017

Python Type Hinting -- generally easy until you find your design flaws

Adding type hints is easy and fun. Seriously. It's not a lot of work.


Until you find a piece of code that does more than what you sort-of thought it kind-of did.

def null_aware_func(x):
    if x is None:
         return x
    return 2.2*x**1.05

This is a stab at a none-aware computation.

Let's add type hints, shall we?

def null_aware_func(x: float) -> float:
    if x is None:
        return None
    return 2.2*x**1.05

This won't fool mypy. Sigh. It passes unit tests, but it's flagged as a problem.

We have a variety of ways of define this function. And that means we need to think carefully about our None-aware design.

Is this really an @overload?

from typing import overload
def null_aware_func(x: None) -> None:
def null_aware_func(x: float) -> float:
    if x is None:
        return None
    return 2.2*x**1.05

And yes, the ... is legit Python syntax. (It's a rarely used token that forms the body of the function.)

Or is this a more advanced type?

from typing import Optional
OptFloat = Optional[float]

def null_aware_func(x: OptFloat) -> OptFloat:
    if x is None:
        return None
    return 2.2*x**1.05

I'd argue that OptFloat is a more sensible definition. However, if this is the only function that's none-aware, perhaps it's an overload.

The deeper question is one of underlying meaning. Why are we doing this? What does it mean?

And. Bonus. Will this be working in a SQLAlchemy environment, where they have their own wrappers for database objects, meaning that `is None` doesn't work and `== None` is required?

What's important is that adding type hints forced us to think about what we were doing. Unlike Java we did this without stopping progress for an extended period of "wrestling with the compiler". We can use Any temporarily because the unit tests all pass. Then, we can pay down the technical debt by fixing the type declaration.

Total. Victory.

Tuesday, October 31, 2017

Some Reading

Higher-Order Functions. A really cool idea. Javascript isn't my favorite language.

This, on the other hand, is huge: trunk-based development.

I'm really tired of having a dev branch with periodic commits to master so we can deploy from master. It's so much nicer to tag a release and deploy that.

Here's the top 10 Python list for September 2017.

Learning Python: Zero to Hero:

Why we switched from Python to Go: We distribute a CLI for our API in Go because it's slightly simpler to provide executables in Go. However, our user community has Python already... We need to provide similar functionality in Python.

Technical Debt:

Software Engineering v. Programming.

Annotations in Java? Ugh. The level of complexity seems to have gotten out of control.

Programming vs. the text of the code itself. I'm don't by the reductionist 1-dimensional view of text. Yes -- in a narrow, technical sense, it's one-dimensional. If you want to be really reductionist, it's a stream of bytes, which are really just a base-256 number. I don't think the reductionist argument is helpful. However, I am sure that the declarative/imperative distinction is worthy of a lot of thought, and this is a nice comparison. (In spite of the javascript.)