Discussion:
[libav-devel] [PATCH 2/3] configure: speed up check_deps()
James Almer
2018-09-17 02:53:00 UTC
Permalink
From: Avi Halachm <***@yahoo.com>

x4 - x25 faster.

check_deps() recursively enables/disables components, and its loop is
iterated nearly 6000 times. It's particularly slow in bash - currently
consuming more than 50% of configure runtime, and about 20% with other
shells.

This commit applies few local optimizations, most effective first:
- Use $1 $2 ... instead of pushvar/popvar, and same at enable_deep*
- Abort early in one notable case - empty deps, to avoid costly no-op.
- Smaller changes which do add up:
- Handle ${cfg}_checking locally instead of via enable[d]/disable
- ${cfg}_checking: test done before inprogress - x2 faster in 50%+
- one eval instead of several at the empty-deps early abort path.

- The "actual work" part is unmodified - just its surroundings.

Biggest speedups (relative and absolute) are observed with bash.

Signed-off-by: James Almer <***@gmail.com>
---
configure | 83 ++++++++++++++++++++++++++++---------------------------
1 file changed, 43 insertions(+), 40 deletions(-)

diff --git a/configure b/configure
index be348e881..a70136dc1 100755
--- a/configure
+++ b/configure
@@ -532,13 +532,10 @@ disable_sanitized(){
do_enable_deep(){
for var; do
enabled $var && continue
- eval sel="\$${var}_select"
- eval sgs="\$${var}_suggest"
- pushvar var sgs
- enable_deep $sel
- popvar sgs
- enable_deep_weak $sgs
- popvar var
+ set -- $var
+ eval enable_deep \$${var}_select
+ var=$1
+ eval enable_deep_weak \$${var}_suggest
done
}

@@ -550,9 +547,9 @@ enable_deep(){
enable_deep_weak(){
for var; do
disabled $var && continue
- pushvar var
+ set -- $var
do_enable_deep $var
- popvar var
+ var=$1
enable_weak $var
done
}
@@ -609,38 +606,44 @@ is_in(){

check_deps(){
for cfg; do
- enabled ${cfg}_checking && die "Circular dependency for $cfg."
- disabled ${cfg}_checking && continue
- enable ${cfg}_checking
-
- eval dep_all="\$${cfg}_deps"
- eval dep_any="\$${cfg}_deps_any"
- eval dep_con="\$${cfg}_conflict"
- eval dep_sel="\$${cfg}_select"
- eval dep_sgs="\$${cfg}_suggest"
- eval dep_ifa="\$${cfg}_if"
- eval dep_ifn="\$${cfg}_if_any"
-
- pushvar cfg dep_all dep_any dep_con dep_sel dep_sgs dep_ifa dep_ifn
- check_deps $dep_all $dep_any $dep_con $dep_sel $dep_sgs $dep_ifa $dep_ifn
- popvar cfg dep_all dep_any dep_con dep_sel dep_sgs dep_ifa dep_ifn
-
- [ -n "$dep_ifa" ] && { enabled_all $dep_ifa && enable_weak $cfg; }
- [ -n "$dep_ifn" ] && { enabled_any $dep_ifn && enable_weak $cfg; }
- enabled_all $dep_all || disable $cfg
- enabled_any $dep_any || disable $cfg
- disabled_all $dep_con || disable $cfg
- disabled_any $dep_sel && disable $cfg
-
- enabled $cfg && enable_deep_weak $dep_sel $dep_sgs
-
- for dep in $dep_all $dep_any $dep_sel $dep_sgs; do
- # filter out library deps, these do not belong in extralibs
- is_in $dep $LIBRARY_LIST && continue
- enabled $dep && eval append ${cfg}_extralibs ${dep}_extralibs
- done
+ eval [ x\$${cfg}_checking = xdone ] && continue
+ eval [ x\$${cfg}_checking = xinprogress ] && die "Circular dependency for $cfg."
+
+ eval "
+ dep_all=\$${cfg}_deps
+ dep_any=\$${cfg}_deps_any
+ dep_con=\$${cfg}_conflict
+ dep_sel=\$${cfg}_select
+ dep_sgs=\$${cfg}_suggest
+ dep_ifa=\$${cfg}_if
+ dep_ifn=\$${cfg}_if_any
+ "
+
+ # most of the time here $cfg has no deps - avoid costly no-op work
+ if [ "$dep_all$dep_any$dep_con$dep_sel$dep_sgs$dep_ifa$dep_ifn" ]; then
+ eval ${cfg}_checking=inprogress
+
+ set -- $cfg "$dep_all" "$dep_any" "$dep_con" "$dep_sel" "$dep_sgs" "$dep_ifa" "$dep_ifn"
+ check_deps $dep_all $dep_any $dep_con $dep_sel $dep_sgs $dep_ifa $dep_ifn
+ cfg=$1; dep_all=$2; dep_any=$3; dep_con=$4; dep_sel=$5 dep_sgs=$6; dep_ifa=$7; dep_ifn=$8
+
+ [ -n "$dep_ifa" ] && { enabled_all $dep_ifa && enable_weak $cfg; }
+ [ -n "$dep_ifn" ] && { enabled_any $dep_ifn && enable_weak $cfg; }
+ enabled_all $dep_all || disable $cfg
+ enabled_any $dep_any || disable $cfg
+ disabled_all $dep_con || disable $cfg
+ disabled_any $dep_sel && disable $cfg
+
+ enabled $cfg && enable_deep_weak $dep_sel $dep_sgs
+
+ for dep in $dep_all $dep_any $dep_sel $dep_sgs; do
+ # filter out library deps, these do not belong in extralibs
+ is_in $dep $LIBRARY_LIST && continue
+ enabled $dep && eval append ${cfg}_extralibs ${dep}_extralibs
+ done
+ fi

- disable ${cfg}_checking
+ eval ${cfg}_checking=done
done
}
--
2.19.0
James Almer
2018-09-17 02:53:01 UTC
Permalink
From: Avi Halachm <***@yahoo.com>

- Allow to add deps in any order rather than "in linking order".
- Expand deps chains as required rather than just once.
- Validate that there are no cycles.
- Validate that [after expansion] deps are limited to other fflibs.
- Remove expectation for a specific output order of unique().

Previously when adding items to <fflib>_deps, developers were
required to add them in linking order. This can be awkward and
bug-prone, especially when a list is not empty, e.g. when adding
conditional deps.

It also implicitly expected unique() to keep the last instance of
recurring items such that these lists maintain their linking order
after removing duplicate items.

This patch mainly allows to add deps in any order by keeping just
one master list in linking order, and then reordering all the
<fflib>_deps lists to align with the master list order.

This master list is LIBRARY_LIST itself, where otherwise its order
doesn't matter.

The patch also removes a limit where these deps lists were expanded
only once. This could have resulted in incomplete expanded lists,
or forcing devs to add already-deducable deps to avoid this issue.

Note: it is possible to deduce the master list order automatically
from the deps lists, but in this case it's probably not worth the
added complexity, even if minor. Maintaining one list should be OK.

Signed-off-by: James Almer <***@gmail.com>
---
configure | 41 +++++++++++++++++++++++++++++++++--------
1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index a70136dc1..2a5d411aa 100755
--- a/configure
+++ b/configure
@@ -1425,13 +1425,13 @@ FEATURE_LIST="
"

LIBRARY_LIST="
- avcodec
avdevice
avfilter
+ swscale
avformat
+ avcodec
avresample
avutil
- swscale
"

LICENSE_LIST="
@@ -2616,7 +2616,7 @@ transcode_aac_example_deps="avcodec avformat avresample"
cpu_init_extralibs="pthreads_extralibs"
cws2fws_extralibs="zlib_extralibs"

-# libraries, in linking order
+# libraries, in any order
avcodec_deps="avutil"
avcodec_select="null_bsf"
avdevice_deps="avformat avcodec avutil"
@@ -5184,7 +5184,7 @@ fi

enabled zlib && add_cppflags -DZLIB_CONST

-# conditional library dependencies, in linking order
+# conditional library dependencies, in any order
enabled movie_filter && prepend avfilter_deps "avformat avcodec"
enabled_any asyncts_filter resample_filter &&
prepend avfilter_deps "avresample"
@@ -5192,11 +5192,36 @@ enabled scale_filter && prepend avfilter_deps "swscale"

enabled opus_decoder && prepend avcodec_deps "avresample"

+# reorder the items at var $1 to align with the items order at var $2 .
+# die if an item at $1 is not at $2 .
+reorder_by(){
+ eval rb_in=\$$1
+ eval rb_ordered=\$$2
+
+ for rb in $rb_in; do
+ is_in $rb $rb_ordered || die "$rb at \$$1 is not at \$$2"
+ done
+
+ rb_out=
+ for rb in $rb_ordered; do
+ is_in $rb $rb_in && rb_out="$rb_out$rb "
+ done
+ eval $1=\$rb_out
+}
+
+# deps-expand fflib $1: N x {append all expanded deps; unique}
+# within a set of N items, N expansions are enough to expose a cycle.
expand_deps(){
- lib_deps=${1}_deps
- eval "deps=\$$lib_deps"
- append $lib_deps $(map 'eval echo \$${v}_deps' $deps)
- unique $lib_deps
+ unique ${1}_deps # required for the early break test.
+ for dummy in $LIBRARY_LIST; do # N iteratios
+ eval deps=\$${1}_deps
+ append ${1}_deps $(map 'eval echo \$${v}_deps' $deps)
+ unique ${1}_deps
+ eval '[ ${#deps} = ${#'${1}_deps'} ]' && break # doesn't expand anymore
+ done
+
+ eval is_in $1 \$${1}_deps && die "Dependency cycle at ${1}_deps"
+ reorder_by ${1}_deps LIBRARY_LIST # linking order is expected later
}

map 'expand_deps $v' $LIBRARY_LIST
--
2.19.0
Diego Biurrun
2018-09-17 18:49:19 UTC
Permalink
x50 - x200 faster.
The set looks very interesting. I've had some ideas on how to speed up
this part of configure already, so I'm doubly happy to see some work
done in that area.

At a first glance these patches seem to apply several optimizations at
once. I'll have to study them in detail.

There is no flatten_extralibs_wrapper() though...

Diego
James Almer
2018-09-17 19:22:11 UTC
Permalink
Post by Diego Biurrun
x50 - x200 faster.
The set looks very interesting. I've had some ideas on how to speed up
this part of configure already, so I'm doubly happy to see some work
done in that area.
At a first glance these patches seem to apply several optimizations at
once. I'll have to study them in detail.
There is no flatten_extralibs_wrapper() though...
Forgot to rename that while porting the patch from ffmpeg, sorry.

On windows this set made a massive difference, going from several
minutes down to one.
Post by Diego Biurrun
Diego
_______________________________________________
libav-devel mailing list
https://lists.libav.org/mailman/listinfo/libav-devel
avih
2018-09-17 19:42:56 UTC
Permalink
( I'm not registered to the list)
The changes to unique/resolve/flatten_extralibs are all local to each function. That is, each modified function is a stand-alone drop-in replacement for the original function.
flatten_extralibs_wrapper itself was not modified at the original set, but its callees were.
While you don't have flatten_extralibs_wrapper, it seems that your file still has its functionality ( https://git.libav.org/?p=libav.git;a=blob;f=configure;h=48e8536b07cd5569565e5f9082adc9bb3ad1933c;hb=HEAD#l5129 ), and therefore will also gain from the change to unique (unique is not used by flatten_extralibs but is used - and was very slow, from the ffmpeg's wrapper or your equivalent code).
Bottom line, as far as I can tell it should have similar benefits as it did with ffmpeg's configure despite not having the explicit function flatten_extralibs_wrapper .
Avi
x50 - x200 faster.
The set looks very interesting. I've had some ideas on how to speed up
this part of configure already, so I'm doubly happy to see some work
done in that area.

At a first glance these patches seem to apply several optimizations at
once. I'll have to study them in detail.

There is no flatten_extralibs_wrapper() though...

Diego

Loading...