
Notes on meiosis and microscopy


Mostly I’m making this page as a resource for if I ever forget:

Human histone names:

recommended name   other names                 uniprot
h1.0               H1ʹ, H1(0), H1FV            H1F0
h1.1               H1a, H1F1                   HIST1H1A
h1.2               H1c, H1d, H1s-1, H1F2       HIST1H1C
h1.3               H1c, H1s-2, H1F3            HIST1H1D
h1.4               H1b, H1s-4, H1F4            HIST1H1E
h1.5               H1a, H1b, H1s-3, H1F5       HIST1H1B

Perl one-liner: transpose variable-length tab-separated arrays

I wanted to try out the violin plots mentioned by Flavio on “Python in Science”, but my data was recalcitrant. The original source was a bunch of column data in Graphpad Prism. Each column is a different timepoint, and has a different number of points. My idea was to import the data as rows, so each goes into its own array. That requires transposing the numbers - no big deal, right?

I tried to use rs, a BSD command of at least 9 years’ standing, to do it…but could not figure it out, despite many options such as “-T” for “pure transpose of the data”. That made me go back to the old standby of perl, where using an anonymous array of arrays did the trick:

perl -ne 'chomp;@a=split("\t",$_);$b=0;foreach $i(@a){push (@{$ar[$b]},$i);$b++;}if(eof){for $i(0..$#ar){print join(" ",@{$ar[$i]});print("\n")}}' DATAFILE

Yeah, that’s how I do stuff…as much as possible on one line. Note that it has to be tab-separated, and the last rows (those that have only a few columns) still need to have null or whitespace entries separated by tabs.
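For comparison, here is a minimal Python sketch of the same transpose. It is not what I actually ran; the function name `transpose_ragged` and the toy input are made up for illustration. It relies on `itertools.zip_longest` to pad the short rows, so trailing empty cells do not need to be present in the file (empty cells in the middle of a row still need their tabs).

```python
from itertools import zip_longest

def transpose_ragged(lines, sep="\t"):
    """Turn ragged tab-separated rows into one list per original column."""
    # Split every non-empty line into its cells.
    rows = [line.rstrip("\n").split(sep) for line in lines if line.strip("\n")]
    # zip_longest pads the shorter rows with "" so the columns line up.
    columns = zip_longest(*rows, fillvalue="")
    # Drop the empty padding cells so each output row keeps only real values.
    return [[cell for cell in col if cell.strip()] for col in columns]

# Toy input: three timepoint columns holding 3, 2 and 1 values.
example = ["1\t4\t7\n", "2\t5\n", "3\n"]
for row in transpose_ragged(example):
    print(" ".join(row))
```

Each printed line is one original column, space-separated, which is the same shape the perl one-liner produces.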

When a command finally works, I try to remember to type the line into the terminal again with a # prepended, and a comment like “finally worked!” at the end, so I can search my terminal history for “worked” when I want to see what finally made it.

Anyway, the data transposed fine and now I’m trying the violin plots…

To read the data in python:

import numpy as n

with open("DATAFILE") as f:
    content = f.readlines()   # one line per timepoint, as written by the perl one-liner

d = [n.array([int(x) for x in line.split()]) for line in content]

Now d is a list of arrays of variable length. I tried the violin plot right away and got a strange result so it needs more work. But it would be nice to have this to emphasize the large number of zeros in the data, as opposed to simply having the whisker of a box plot extend to zero.

Update: it works, I had a wrong variable earlier. Many thanks to Flavio for his tutorial.

One-liner for CLUSTALW-to-multi-FASTA

It’s very convenient to have pre-run ClustalW alignments in Wormbase, but sometimes proteins are included that do not have much homology, which makes the symbols at the bottom less useful. You can see that all the nematode proteins are highly conserved, for instance, but that one yeast protein is screwing it all up. So you’ve got to run the alignment on your own…but there’s no convenient link to the non-aligned protein sequences; all you have is the Clustal output with lots of gaps, everything split over multiple lines, etc.

That’s where perl can come in handy. Just copy the entire Clustal output and run this in the terminal:

pbpaste | perl -ne 's/\-//g;@l=split;push @{$h{$l[0]}},$l[1] if ($l[0] =~ /[A-Z]/);if(eof){for $key(keys %h) {print ">",$key,"\n",join("",@{$h{$key}}),"\n";}}'

This will result in a single FASTA file (written to STDOUT) containing all the original protein sequences, ready to be chopped up and re-aligned.
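The same idea can be sketched in Python. This is only an illustration, not the one-liner itself: the function name `clustal_to_fasta` and the toy alignment are made up, and it assumes the usual Clustal layout (a header line starting with "CLUSTAL", sequence lines of the form "name chunk", and a conservation line that starts with whitespace).

```python
def clustal_to_fasta(text):
    """Rebuild ungapped sequences from Clustal alignment text (assumed layout)."""
    seqs = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the CLUSTAL header, and the conservation-symbol
        # line, which starts with whitespace.
        if len(fields) < 2 or line[0].isspace() or line.startswith("CLUSTAL"):
            continue
        name, chunk = fields[0], fields[1]
        # Remove gap characters and collect the pieces for each sequence name.
        seqs.setdefault(name, []).append(chunk.replace("-", ""))
    return "".join(">%s\n%s\n" % (name, "".join(parts))
                   for name, parts in seqs.items())

# Hypothetical two-sequence alignment split over two blocks.
example = """CLUSTAL W (1.83) multiple sequence alignment

seqA    MK-TAYIA
seqB    MKQTA-IA
        ** **.**

seqA    KQR--
seqB    KQRGG
        ***
"""
print(clustal_to_fasta(example), end="")
```

Unlike the perl version, this keeps the sequences in their original order and skips the header line explicitly.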

Raytracing 002: Chromosome pairing

This movie portrays six chromosomes in a nucleus that start out randomly positioned and end up paired with their homologs:

The drastic change that occurs midway through the movie (all the chromosomes seem to be attracted to each other) conceptually mimics the “transition zone” stage in C. elegans meiotic prophase, which superficially resembles all the chromosomes shoved up against one side of the nucleus in a crescent shape. However, we’ve since learned that chromosome ends undergo rapid movement1 during this stage, which the simulation does not include. Rather, this simulation proceeds by temporarily increasing the tendency of chromosomes to aggregate.

  1. Wynne, David J; Rog, Ofer; Carlton, Peter M; Dernburg, Abby F: Dynein-dependent processive chromosome motions promote homologous pairing in C. elegans meiosis. The Journal of Cell Biology, vol. 196 (2012), no. 1, pp. 47–64. PMID: 22232701

Journal club: Asymmetric chromatid segregation in germline stem cells

We recently discussed a very cool paper from the Yamashita lab (U. Mich) for our journal club: Chromosome-specific nonrandom sister chromatid segregation during stem-cell division. Although the authors had earlier shown that bulk sister chromatid segregation on the basis of the template’s age in male Drosophila germline stem cells (GSCs) is random, the current work shows that the X and Y chromosomes do indeed segregate nonrandomly, with one of the two sister chromatids (distinguishable by CO-FISH) tending to stay in the germline stem cell. There are several remarkable features about this work:

  • The asymmetry of inheritance is not based on the age of the template strand, but its actual identity, i.e. which of the two parental strands (Watson or Crick) served as its template. In this example, if you imagine replicating the X chromosome by unwinding its double helix starting from its left terminus, pulling the 5ʹ end to the right and the 3ʹ end to the left, the GSC would prefer to inherit the right chromatid 85% of the time.

The chromosomes on the right of each pair are preferentially inherited, despite the fact that in some cases the non-preferred sister will have the older (magenta) template.

  • Although there is no biased segregation for any individual autosome (chromosomes 2 or 3) based on its parental template, there is still something unusual going on with autosomes: in a provocative sentence that will hopefully be unpacked in a future paper, the authors say:

In spite of the lack of biased segregation with regard to which strands are inherited by GSCs, cells always inherited two Cy3 signals or two Cy5 signals, the mechanism and significance of which remain unclear.

What this implies is that even though GSCs have a 50% chance of inheriting the Watson-templated strand or the Crick-templated strand of a given autosome, they will always inherit the same strand from both homologs!

How could that happen? Since homologs are usually paired in Drosophila, it certainly is possible that there is a functional spatial relationship between the centromeres of homologs, such that both Watson-templated centromeres point in the same direction (likewise for the Crick-templated centromeres).

  • While the selection of X and Y strand by template origin rather than by age means that GSC chromatids are not immortal, it does mean that the X and Y chromosomes will have undergone fewer full replications than the autosomes. We can find the expected extent to which biased strand segregation reduces the number of replications by a simulation in GNU Octave:
Simulation of strand segregation
niter=1000;                           #number of iterations to collect
ngens=100;                            #number of cell divisions for each iteration
gsc=zeros(ngens,2);                   #array to hold the list of chromatid ages
e=0;                                  #index of cell array
for b=0.0:0.1:1;                      #iterate bias levels: 0.5=random; 1.0=fully towards GSC
    e=e+1;                            #increment cell index
    for li=1:niter;                   #iterations to average
        for l=2:ngens;                #iterate over cell divisions
            r=gsc(l-1,:);             #set r to the current age of the chromatids
            rT=[r(1) r(1)+1];         #the result if you keep the left chromatid
            rA=[r(2)+1 r(2)];         #the result if you keep the right chromatid
            if(rand<b),               #choose one depending on the bias
                gsc(l,:)=rT;          #this is the more likely choice if b>0.5
            else
                gsc(l,:)=rA;          #this is the less likely choice if b>0.5
            end;                      #done choosing for this division
        end;                          #done with all cell divisions for current iteration
        gscc{e}(li)=mean(gsc(end,:)); #store the mean age of the two chromatids after the last division
    end;                              #done with current iteration
    disp(e);fflush(1);                #display progress of the iterations
end                                   #done with all bias values

The results:

bias value — chromatid age
0.0   0.5
0.1   18.259
0.2   31.989
0.3   41.68
0.4   47.549
0.5   49.338
0.6   47.364
0.7   41.474
0.8   31.731
0.9   17.968
1.0   0.5

If we plot bias against chromatid age and do some polynomial fitting via polyfit(bias,age,2), we get a nice parabolic fit (green line going through the data points).

For a given bias level b, and number of generations g, the expected chromatid age is roughly3:

At the bias level reported in the manuscript (15% / 85%), the mean age of chromatids appears to be about half of what it would be if there were no asymmetric inheritance.
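As a cross-check, here is a minimal Python sketch of the same simulation. The function names are made up for illustration. The `expected_age` helper is my own back-of-the-envelope reading of the update rule, not a formula taken from the paper: the age of the kept strand goes up by one each time the choice switches sides between consecutive divisions, a switch happens with probability 2b(1-b), and the reported value is the kept age plus 0.5.

```python
import random

def mean_final_age(bias, ngens=100, niter=1000, rng=None):
    """Average, over niter runs, of the mean age of the two chromatids at the end."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(niter):
        left, right = 0, 0                       # both chromatids start at age 0
        for _ in range(ngens - 1):               # the Octave loop runs l=2:ngens
            if rng.random() < bias:
                left, right = left, left + 1     # keep the left chromatid
            else:
                left, right = right + 1, right   # keep the right chromatid
        total += (left + right) / 2.0
    return total / niter

def expected_age(bias, ngens=100):
    """Assumed closed form: one age step per switch between consecutive choices."""
    switches = 2 * bias * (1 - bias) * (ngens - 2)
    return switches + 0.5

for b in (0.5, 0.85):
    print(b, mean_final_age(b, niter=200, rng=random.Random(0)), expected_age(b))
```

Under this reading, a bias of 0.85 gives 2 × 0.85 × 0.15 = 0.255 age steps per division against 0.5 for random segregation, which agrees with the "about half" estimate.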

While the authors emphasize that the biological relevance of this segregation likely has more to do with keeping correct epigenetic information associated with the X and Y chromosomes, rather than reducing replications per se, it’s worth noting that the number of replications does in fact decrease.

Some other questions I have about this work:

  • Genetic data implicates a requirement for SUN/KASH proteins to connect the preferentially-inherited chromatids to the mother centrosome. How is this connection maintained throughout and after nuclear envelope breakdown?

  • Are the SUN/KASH proteins required for the biased inheritance of homologous autosomal chromatids?

  • Does the preferentially-inherited chromatid have a bias for centromeric or other sequences to be on the leading vs. lagging strand?

The second-weirdest C. elegans chromosome: IV

Whenever I spend time checking out sequence distributions in the C. elegans genome, I am always struck by the strangeness of chromosome IV. Yes, the X chromosome is the weirdest and its exceptional behavior is well documented.1 But I think IV comes in a distant but clear second.

For instance, let’s say you’ve done the following analysis:

  1. Generate all 65,536 DNA octomers and use fuzznuc (see note 2) to find their positions in the genome.
  2. Divide each chromosome into thirds, and for each octomer, count how many times it occurs within each ⅓.
  3. Calculate the “pairing center enrichment” of each octomer by taking the ratio of the count on the third that contains the pairing center (PC) to the count in the third that does not contain the pairing center.
  4. For each chromosome, sort all the octomers from least to most PC-enriched.
  5. Average the 65,536 sorted octomers into bins of 512 to simplify plotting.

Then you would get graphs like those on the diagonal of the following chart:

The graphs on the diagonal show that, as expected, every chromosome has some octomers enriched on the PC end, some octomers that are rare on the PC end, and many that are not particularly enriched or rare.
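The five analysis steps can be sketched in Python on a toy sequence. This is an illustration under several assumptions, not the pipeline I ran: it counts k-mers directly instead of calling fuzznuc, looks at one strand only, compares the PC-containing third with the third at the opposite end, and adds a pseudocount of 1 to avoid dividing by zero. All function names are made up.

```python
from itertools import product

def kmer_counts_by_third(seq, k):
    """Count each k-mer's start positions in the left, middle and right thirds."""
    counts = {"".join(p): [0, 0, 0] for p in product("ACGT", repeat=k)}
    n = len(seq)
    for i in range(n - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:                       # skips windows containing N etc.
            counts[kmer][min(3 * i // n, 2)] += 1
    return counts

def pc_enrichment(counts, pc_third, other_third, pseudocount=1):
    """Ratio of counts in the PC third to counts in the chosen non-PC third."""
    return {kmer: (c[pc_third] + pseudocount) / (c[other_third] + pseudocount)
            for kmer, c in counts.items()}

def binned_means(values, bin_size):
    """Average consecutive groups of bin_size values."""
    return [sum(values[i:i + bin_size]) / len(values[i:i + bin_size])
            for i in range(0, len(values), bin_size)]

# Toy "chromosome" with the pairing center assumed to be in the left third;
# k=2 and bins of 4 stand in for k=8 and bins of 512.
seq = "AAAAAAAAAA" + "ACGTACGTAC" + "GGGGGGGGGG"
enr = pc_enrichment(kmer_counts_by_third(seq, 2), pc_third=0, other_third=2)
ranked = sorted(enr.values())                    # least to most PC-enriched
print(binned_means(ranked, 4))
```

With k=8 the dictionary holds all 65,536 octomers, and `binned_means(ranked, 512)` yields the 128 points of one diagonal graph.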

A question naturally arises here: are the enriched and rare octomers the same for each chromosome, or are different octomers enriched differentially? To test this, you can take the individual octomer enrichment scores from each chromosome, arrange them according to the sorting order of every other chromosome, and average bins of 512 again. That is what has been done to generate the off-diagonal graphs (see note 3). Looking at these graphs, some qualitative trends can be seen:

  • The enrichment trends for chromosomes I, II, III, and V seem to be in agreement.
  • The trend of the X chromosome does not resemble I, II, III, or V, but more closely resembles IV.
  • Chromosome IV appears enriched not only for the octomers enriched on other chromosomes’ PCs (on the right side of its graphs) but also for octomers depleted on other chromosomes’ PCs (on the left side) as well as for octomers neither enriched nor depleted (the small peaks in the middle).
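The re-sorting behind the off-diagonal graphs can be sketched like this. It is a minimal illustration with made-up scores and a hypothetical function name: one chromosome's enrichment scores are put in the order given by another chromosome's ranking, then averaged in bins.

```python
def cross_sorted_bins(scores_a, scores_b, bin_size):
    """Scores of chromosome A, ordered by chromosome B's ranking, then bin-averaged."""
    # k-mers from least to most PC-enriched on chromosome B.
    order = sorted(scores_b, key=scores_b.get)
    values = [scores_a[kmer] for kmer in order]
    return [sum(values[i:i + bin_size]) / len(values[i:i + bin_size])
            for i in range(0, len(values), bin_size)]

# Hypothetical enrichment scores for four 2-mers on two chromosomes.
scores_a = {"AA": 2.0, "AC": 1.0, "CA": 0.5, "CC": 1.5}
scores_b = {"AA": 1.8, "AC": 0.9, "CA": 0.4, "CC": 1.2}
print(cross_sorted_bins(scores_a, scores_b, 2))
```

A curve that rises from left to right means the two chromosomes agree on which k-mers are PC-enriched; a flat curve means their rankings are unrelated.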

I will need some help to put these qualitative observations on a sound statistical footing (if that’s possible), but in the meantime it would appear that IV just has some different sequence properties than the other autosomes. I can only guess why that might be. Another interesting thing about chromosome IV is that it’s home to large islands of piRNAs (21U-RNAs); there are few if any of those RNAs outside IV.

Is IV just weird? Did it recently undergo a translocation, placing non-PC sequences right near PC ones? I don’t know, but I have learned to expect strange things from chromosome IV.

  1. See e.g. 6788075858

  2. From the EMBOSS suite, at /emboss.sourceforge.net

  3. For example, consider the graph in the “IV” column, in the 5th row. The enrichment scores from the graph above it (the IV/IV graph) are sorted according to the order of the graph to its right (the V/V graph), then averaged in bins of 512. The scale of the graphs on the diagonal goes from 0 to 2, while the scale of the off-diagonal graphs goes from 0.85 to 1.45.

The OMERO struggle, part 001

I had to delete lots of images from our OMERO database, since there were lots of dupes and the disk filled up. I first tried to delete the images manually, but that is actually quite complicated when you’re talking about a possibly massively linked data structure with lots of annotations, metadata, &c. Despite that, I expected that selecting groups of datasets and clicking “delete” should result in deleted datasets after a while, but after long delays OMERO gave me only errors (with pages and pages of lovely Java messages, which is one of the main reasons I detest Java) every time.

I then realized I had to wipe and repopulate our entire OMERO database.

This link on the forum led me to the right answer.

First I needed to generate the configuration file with omero db script to change the root password. This generates the postgresql file OMERO4.4__0.sql.

Then I deleted the whole database, and re-made it:

dropdb -h localhost -U pcarlton omero_database
sudo -u postgres createdb -O pcarlton omero_database
sudo -u postgres createlang plpgsql omero_database   #(this wasn't necessary)
psql -h localhost -U pcarlton omero_database < ./OMERO4.4__0.sql

I am now re-populating the data, with a script that first does chromatic correction on each multiwavelength image we have, then moves it to a spare disk, then puts it in the database. In a week or so a few terabytes will have been crunched and our OMERO installation will be ready for general use.


My 3D-SIM microscope takes a lot of maintenance to keep in good working order. The number of things that can go wrong is mind-numbingly huge and often one does not realize until it’s too late that some part of the system (the light, the mechanics, the sample preparation, mounting medium…) was less than perfect, resulting in a crappy image.

One of the many routine tests is reconstructing an image of 200nm beads. It’s the easiest sample to make and reconstruct, so if this goes wrong then something is horribly wrong. Tests like this let you know that at least everything is more or less working and then you can begin to troubleshoot the finer parts.

The left shows the final 3D-SIM reconstructed image, and the right shows the same field taken with conventional imaging and deconvolution. As I said, this type of reconstruction is the easiest to do, so it shouldn’t be used as evidence that your system has zero problems. But the results are still pretty striking!

Raytracing, movie test

This is a movie of a chromosome pairing simulation that I made, raytraced with the excellent, free POV-ray software.

As you can see, the pairing doesn’t go to 100% completion (the green chromosome still has to synapse one end).


For organisms with small numbers of chromosomes, you might think that pairing is not such a problem: for instance, if an organism has only two pairs of chromosomes, then there are only 3 ways to pair them together, and one is the correct way, so you’ve got a 1 in 3 chance.
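The count of 3 generalizes: the number of ways to split 2n chromosomes into n unordered pairs is the double factorial (2n-1)!! = 1 × 3 × 5 × … × (2n-1). Here is a small Python sketch, with a made-up function name, assuming every way of pairing is equally likely:

```python
def count_pairings(n_pairs):
    """Number of ways to split 2*n_pairs chromosomes into unordered pairs."""
    ways = 1
    for odd in range(1, 2 * n_pairs, 2):   # 1, 3, 5, ..., 2n-1
        ways *= odd
    return ways

print(count_pairings(2))   # 3, the two-pair case above
print(count_pairings(6))   # 10395, for the six pairs of C. elegans
```

So under purely random pairing the chance of getting every pair right drops quickly as the chromosome number grows.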