rustgc_paper/response.tex at master · softdevteam/rustgc_paper · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
\documentclass[12pt,a4paper,preprint]{article}


\usepackage{setspace}
\usepackage{xcolor}
\usepackage{amssymb}
\usepackage{listings}
\usepackage{listings-rust}
\usepackage{softdev}
\usepackage{quoting}
\usepackage{xspace}
\usepackage[many]{tcolorbox}
\newtcolorbox{blockquote}{%
  sharp corners,
  colback=gray!5,
  colframe=gray!60,
  leftrule=2pt,
  enhanced,
  before skip=6pt, after skip=18pt,
  boxrule=0pt,
  left=6pt, right=4pt,
  borderline west={2pt}{0pt}{gray!60},
  fontupper=\scriptsize\sffamily\selectfont
}


\setlength{\parindent}{0pt}
\setlength{\parskip}{2.0ex plus0.5ex minus0.2ex}

%\newcommand\laurie[1]{\mynote{Laurie}{#1}}
%\newcommand\jake[1]{\mynote{Jake}{#1}}

\newcommand\Egcrc{E$_\textrm{gcvs}$\xspace}
\newcommand\Eelision{E$_\textrm{elision}$\xspace}
\newcommand\Epremopt{E$_\textrm{premopt}$\xspace}
\newcommand\boehm{\textsc{BDWGC}\xspace}

\begin{document}

\date{}  % This removes the date
\title{Response to Reviewer Feedback}
\maketitle

We thank the reviewers for their thoughtful reviews: we hope to have
taken onboard your feedback and addressed your questions. In this document
we summarise the major changes we have made: the diff shows changes to the prose
(though it does tend to mangle changes to figures and tables).


\subsection*{Finalization semantics}

\begin{blockquote}
explicitly discuss and present the issues around finalizations semantics, as
  learned from other languages
\end{blockquote}

We have made a number of changes (chiefly to the end of Section 3, and
throughout Section 4), which we hope address this.

As the reviewers note ``an
implementation that does not finalize anything would be compliant when
finalization may be deferred indefinitely'' is true of most GCed
implementations we are aware of, including Java. Indeed, Java once had the
\lstinline{runFinalizersOnExit} flag to force finalizers to be run before the
JVM exists; it was long marked as deprecated as ``inherently unsafe'' and
now appears to have been removed; instead,
there is now the \lstinline{addShutdownHook} mechanism.

While investigating this further, we discovered one surprise (to us, at least),
which muddies the waters with respect to finalizer latency further. In Rust,
when a thread unwinds (i.e.~panics) then, depending on compile-time flags, no
destructors will be run --- but Alloy can run all of the `queued' finalizers!
We have not belaboured this point in the paper, but it shows that destructors
can have guaranteed-infinite latency in some situations where finalizers have
infinite-at-worst latency.


\subsection*{Performance Evaluation}

We have made a large number of changes to the performance evaluation. These are
mostly due to reviewer comments, but we also uncovered a mistake in our
experiment. We previously used \boehm's \lstinline{GC_heapsize} metric:
this does not, as we assumed, represent current heap usage, but total
memory usage since the program's start. We now use the \lstinline{GC_bytes_allocd} metric,
which does measure the correct thing. This changes the results for finalizer
elision in one way: previously this appeared to cause heap usage to grow, but in
fact it shrinks.

The most significant changes are:

\begin{itemize}
  \item We provide a comparison of \lstinline{Rc<T>} with jemalloc and \boehm
    in Table~4 in the Appendix. This suggests that the inherent
    overhead of \boehm ranges from 2--26\%, averaging around 11\%.

  \item We have standardized all visualizations by normalizing against
    appropriate baselines (e.g.~Alloy-without-elision as the baseline for
    the \Eelision experiment; and barrier-free execution as the baseline
    for the \Epremopt experiment). This allows us to show what we hope
    are easily understand performance ratios. We also provide the raw data
    in the appendix.

  \item We provide data for wall-clock and user-time.

  \item We provide data for the number of collections run, and the total GC
  pause time.
\end{itemize}

We also made some minor changes (e.g.~we now always use dynamic
linking of \boehm; we reran benchmarks on a more modern machine), but
these had no obvious impact on our results.

The downside to this is that we now present far too many figures and tables to
fit in the main body of the paper! We have tried to select the subset of
visualizations that will most quickly help the reader understand the overall
picture for the main body of the paper, relegating everything else to the
appendix.

We were asked -- when explaining the overhead of \boehm -- the following:

\begin{blockquote}
Ideally this would include a
discussion of how much of the performance deficit is likely redeemable via
"engineering", and how much is likely to be intrinsic (i.e. due to
unavoidable conservatism), so that we can get a better intuition of the
plausible "best case" for this approach.
\end{blockquote}

We ended up a little unsure which of two questions this might be asking,
so please forgive us if we answer both!

Although we can now quantify the overhead of Boehm vs.~jemalloc in our context,
the data doesn't really give us clues as to why the overhead exists. That makes
us nervous about speculating further. While we are not total allocator dummies,
both of us have talked to allocator experts over the years to know that there is a lot
here that we don't know. At a high level, \boehm is -- and we don't mean this as an
insult or criticism! -- ``old''. Since then many allocators (e.g.~dlmalloc, jemalloc)
have been designed, and subsequently tweaked and improved; each such allocator
is full of many decisions which are likely to affect its performance. Unfortunately,
we don't really know which decisions might be relevant or not.

Another possibility is to use a Bartlett-esque semi-conservative GC. The literature
offers a complex picture: sometimes this is faster, sometimes slower, than
fully conservative GC. We again lack confidence to know how this might translate
into our context.


\subsection*{Conclusion}

\begin{blockquote}
add a discussion thats sums up the overall benefits/drawbacks of this
particular approach, possibly depending on specific scenarios/use cases, and
perhaps outlines directions for future work from a conceptional perspective,
i.e., possible design directions that could be investigated
\end{blockquote}

We have expanded the conclusions, but we hope to have done so in a way
that does not overstate our confidence in our predictions.

\end{document}