[Macchiato] EDKII grub boot fails with PCIe init
Frederik Lotter
frederik.lotter at netronome.com
Wed Mar 14 07:03:08 GMT 2018
On Tue, Mar 13, 2018 at 5:39 PM, Ard Biesheuvel <ard.biesheuvel at linaro.org>
wrote:
> On 13 March 2018 at 15:31, Frederik Lotter
> <frederik.lotter at netronome.com> wrote:
> > On Tue, Mar 13, 2018 at 3:45 PM, Ard Biesheuvel <ard.biesheuvel at linaro.org>
> > wrote:
> >>
> >> On 13 March 2018 at 13:44, Frederik Lotter
> >> <frederik.lotter at netronome.com> wrote:
> >> > On Tue, Mar 13, 2018 at 2:55 PM, Ard Biesheuvel
> >> > <ard.biesheuvel at linaro.org>
> >> > wrote:
> >> >>
> >> >> On 13 March 2018 at 12:48, Frederik Lotter
> >> >> <frederik.lotter at netronome.com> wrote:
> >> >> > On Tue, Mar 13, 2018 at 2:36 PM, Ard Biesheuvel
> >> >> > <ard.biesheuvel at linaro.org>
> >> >> > wrote:
> >> >> >>
> >> >> >> On 13 March 2018 at 12:26, Frederik Lotter
> >> >> >> <frederik.lotter at netronome.com> wrote:
> >> >> >> > Hi Ard,
> >> >> >> >
> >> >> >> > On Mon, Mar 12, 2018 at 6:22 PM, Ard Biesheuvel
> >> >> >> > <ard.biesheuvel at linaro.org>
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> On 12 March 2018 at 16:15, Frederik Lotter
> >> >> >> >> <frederik.lotter at netronome.com> wrote:
> >> >> >> >> > Hi,
> >> >> >> >> > I am getting CPU stall warnings when booting up using the EFI route. I
> >> >> >> >> > suspect the PCIe interface, as the stall warnings sometimes contain the
> >> >> >> >> > probe function. Other times it seems to get further than PCIe init, but
> >> >> >> >> > interrupt handling still stalls.
> >> >> >> >> > Here are some facts around my observation:
> >> >> >> >> >
> >> >> >> >> > I have two sdcards for my MACCHIATObin board. They have identical kernels
> >> >> >> >> > (4.16-rc5) with an Ubuntu 16.04 rootfs. The first sdcard boots via uboot,
> >> >> >> >> > DT and kernel. The second sdcard boots via EDKII, grub and kernel. The
> >> >> >> >> > EDKII build includes the device tree DTB (and the DTS, which I believe is
> >> >> >> >> > unused) taken from the uboot sdcard.
> >> >> >> >> >
> >> >> >> >> > EFI stub: Booting Linux Kernel...
> >> >> >> >> > EFI stub: Using DTB from configuration table
> >> >> >> >> > EFI stub: Exiting boot services and installing virtual address map...
> >> >> >> >> > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd081]
> >> >> >> >> > [ 0.000000] Linux version 4.16.0-rc5-mbcin-netronome-2-dirty (root@mcb1-cpt) (gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9)) #2 SMP PREEMPT Mon Mar 12 14:40:25 UTC 2018
> >> >> >> >> > [ 0.000000] Machine model: Marvell 8040 MACHIATOBin
> >> >> >> >> > [ 0.000000] efi: Getting EFI parameters from FDT:
> >> >> >> >> > [ 0.000000] efi: EFI v2.70 by EDK II
> >> >> >> >> > [ 0.000000] efi: SMBIOS 3.0=0xbfd00000 ACPI 2.0=0xb6760000 MEMATTR=0xb8973418 RNG=0xbffdbf98
> >> >> >> >> > [ 0.000000] random: fast init done
> >> >> >> >> > [ 0.000000] efi: seeding entropy pool
> >> >> >> >> > :
> >> >> >> >> >
> >> >> >> >> > (I am using the latest EDKII master and the Marvell edk2-open-platform
> >> >> >> >> > 17.10 branch, with all the latest mv-ddr, atf, etc.)
> >> >> >> >> >
> >> >> >> >> > The DT data appears in the EFI boot, but the PCIe interface fails, and
> >> >> >> >> > results (I believe) in the CPU stall warnings:
> >> >> >> >> >
> >> >> >> >> > [ 717.453025] INFO: rcu_preempt self-detected stall on CPU
> >> >> >> >> > :
> >> >> >> >> > :
> >> >> >> >> > [ 717.589783] armada8k_pcie_probe+0x140/0x240
> >> >> >> >> > :
> >> >> >> >> >
> >> >> >> >> > Other times, PCIe init gets further:
> >> >> >> >> >
> >> >> >> >> > [ 3.312127] PCI: OF: host bridge /cp0/pcie@f2600000 ranges:
> >> >> >> >> > [ 3.317740] PCI: OF: IO 0xf9000000..0xf900ffff -> 0xf9000000
> >> >> >> >> > [ 3.323692] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000
> >> >> >> >> > [ 3.328915] random: crng init done
> >> >> >> >> > [ 4.326158] armada8k-pcie f2600000.pcie: phy link never came up
> >> >> >> >> > [ 4.332109] armada8k-pcie f2600000.pcie: Link not up after reconfiguration
> >> >> >> >> > [ 4.339056] armada8k-pcie f2600000.pcie: PCI host bridge to bus 0000:00
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> To be brutally honest, the armada8k-pcie driver is a piece of
> >> >> >> >> junk,
> >> >> >> >> and you're much better off using the generic ECAM driver, which
> >> >> >> >> now
> >> >> >> >> includes special handling for the missing root port on Synopsys
> >> >> >> >> IP.
> >> >> >> >>
> >> >> >> >> It also allows you to have both MMIO32 and MMIO64 regions, which can
> >> >> >> >> be useful with some PCIe cards with large BARs.
> >> >> >> >>
> >> >> >> >> Could you try
> >> >> >> >>
> >> >> >> >> compatible = "marvell,armada8k-pcie-ecam";
> >> >> >> >>
> >> >> >> >> in the DT node, please?
> >> >> >> >>
> >> >> >> >> (Before you do that, please check whether UEFI recognizes your
> >> >> >> >> PCI
> >> >> >> >> hardware using the 'pci' command in the shell)
> >> >> >> >
> >> >> >> >
> >> >> >> > This exercise helped a lot. Thank you for the proposal.
> >> >> >> >
> >> >> >> > So now I can consistently boot using uboot and efi.
> >> >> >> >
> >> >> >> > However, the pcie driver init fails. I have provided boot logs and also
> >> >> >> > my DT entry. We need custom BAR ranges, and I am not sure if this driver
> >> >> >> > understands everything.
> >> >> >> >
> >> >> >> > cp0_pcie0: pcie@f2600000 {
> >> >> >> >     compatible = "marvell,armada8k-pcie-ecam", "snps,dw-pcie";
> >> >> >> >     reg = <0 0xf2600000 0 0x10000>,
> >> >> >> >           <0 ((0xf6000000 + (0 * 0x1000000)) + 0xf00000) 0 0x80000>;
> >> >> >> >     reg-names = "ctrl", "config";
> >> >> >> >     #address-cells = <3>;
> >> >> >> >     #size-cells = <2>;
> >> >> >> >     #interrupt-cells = <1>;
> >> >> >> >     device_type = "pci";
> >> >> >> >     dma-coherent;
> >> >> >> >     msi-parent = <&gic_v2m0>;
> >> >> >> >
> >> >> >> >     bus-range = <0 0xff>;
> >> >> >> >     ranges =
> >> >> >> >         <0x81000000 0 (0xf9000000 + (0 * 0x10000)) 0 (0xf9000000 + (0 * 0x10000)) 0 0x10000
> >> >> >> >          0x82000000 0 (0xf6000000 + (0 * 0x1000000)) 0 (0xf6000000 + (0 * 0x1000000)) 0 0xf00000>;
> >> >> >> >     interrupt-map-mask = <0 0 0 0>;
> >> >> >> >     interrupt-map = <0 0 0 0 &cp0_icu 0x0 22 4>;
> >> >> >> >     interrupts = <0x0 22 4>;
> >> >> >> >     num-lanes = <1>;
> >> >> >> >     clocks = <&cp0_clk 1 13>;
> >> >> >> >     status = "disabled";
> >> >> >> > };
> >> >> >> >
> >> >> >> >
> >> >> >> > Error:
> >> >> >> >
> >> >> >> > [ 1.396968] PCI: OF: host bridge /cp0/pcie@f2600000 ranges:
> >> >> >> > [ 1.396979] PCI: OF: IO 0xf9000000..0xf900ffff -> 0xf9000000
> >> >> >> > [ 1.396984] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000
> >> >> >> > [ 1.396998] pci-host-generic f2600000.pcie: ECAM area [mem 0xf2600000-0xf260ffff] can only accommodate [bus 00-ffffffffffffffff] (reduced from [bus 00-ff] desired)
> >> >> >> > [ 1.397002] pci-host-generic f2600000.pcie: ECAM ioremap failed
> >> >> >> > [ 1.397011] pci-host-generic: probe of f2600000.pcie failed with error -12
> >> >> >> >
> >> >> >> >
> >> >> >> > Thanks for the support.
> >> >> >> >
> >> >> >>
> >> >> >> Please try the following config
> >> >> >>
> >> >> >> cp0_pcie0: pcie@e0000000 {
> >> >> >>     compatible = "marvell,armada8k-pcie-ecam", "snps,dw-pcie";
> >> >> >>     reg = <0 0xe0000000 0 0xff00000>;
> >> >> >>     #address-cells = <3>;
> >> >> >>     #size-cells = <2>;
> >> >> >>     #interrupt-cells = <1>;
> >> >> >>     device_type = "pci";
> >> >> >>     dma-coherent;
> >> >> >>     msi-parent = <&gic_v2m0>;
> >> >> >>
> >> >> >>     bus-range = <0 0xfe>;
> >> >> >>     ranges = <0x1000000 0x0 0x00000000 0x0 0xeff00000 0x0 0x00010000>,
> >> >> >>              <0x2000000 0x0 0xc0000000 0x0 0xc0000000 0x0 0x20000000>,
> >> >> >>              <0x3000000 0x8 0x00000000 0x8 0x00000000 0x1 0x00000000>;
> >> >> >>
> >> >> >>     interrupt-map-mask = <0 0 0 0>;
> >> >> >>     interrupt-map = <0 0 0 0 &cp0_icu 0x0 22 4>;
> >> >> >> };
> >> >> >
> >> >> >
> >> >> > I am trying it now.
> >> >> >
> >> >> > Could you just give me some insight into how the peripheral base address
> >> >> > can simply be modified like that?
> >> >> >
> >> >> > Is there a mapping change somewhere?
> >> >> >
> >> >>
> >> >> All those addresses are configurable, and the default armada8k-pcie
> >> >> driver sets up all the translation windows from scratch (in a rather
> >> >> limited way, mind you).
> >> >>
> >> >> The armada8k-pcie-ecam driver just reuses the configuration set by the
> >> >> firmware, allowing for a larger bus range and an additional 4 GB
> >> >> window for 64-bit MMIO.
> >> >
> >> >
> >> > The new DTS extract:
> >> >
> >> > cp0_pcie0: pcie@0xe0000000 {
> >> >     compatible = "marvell,armada8k-pcie-ecam", "snps,dw-pcie";
> >> >     reg = <0 0xe0000000 0 0x10000>;
> >> >
> >> >     #address-cells = <3>;
> >> >     #size-cells = <2>;
> >> >     #interrupt-cells = <1>;
> >> >     device_type = "pci";
> >> >     dma-coherent;
> >> >     msi-parent = <&gic_v2m0>;
> >> >
> >> >     bus-range = <0 0xfe>;
> >> >     ranges = <0x1000000 0x0 0x00000000 0x0 0xeff00000 0x0 0x00010000>,
> >> >              <0x2000000 0x0 0xc0000000 0x0 0xc0000000 0x0 0x20000000>,
> >> >              <0x3000000 0x8 0x00000000 0x8 0x00000000 0x1 0x00000000>;
> >> >
> >> >     interrupt-map-mask = <0 0 0 0>;
> >> >     interrupt-map = <0 0 0 0 &cp0_icu 0x0 22 4>;
> >> > };
> >> >
> >> > The result:
> >> >
> >> > [ 1.463594] PCI: OF: host bridge /cp0/pcie@0xe0000000 ranges:
> >> > [ 1.463608] PCI: OF: IO 0xeff00000..0xeff0ffff -> 0x00000000
> >> > [ 1.463616] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000
> >> > [ 1.463622] PCI: OF: MEM 0x800000000..0x8ffffffff -> 0x800000000
> >> > [ 1.463638] pci-host-generic e0000000.pcie: ECAM area [mem 0xe0000000-0xe000ffff] can only accommodate [bus 00-ffffffffffffffff] (reduced from [bus 00-fe] desired)
> >> > [ 1.463646] pci-host-generic e0000000.pcie: ECAM ioremap failed
> >> > [ 1.463657] pci-host-generic: probe of e0000000.pcie failed with error -12
> >> >
> >> >
> >>
> >> Please use the size I suggested for the 'reg' property
> >
> >
> > Sorry I missed that:
> >
> > [ 1.463413] PCI: OF: host bridge /cp0/pcie@0xe0000000 ranges:
> > [ 1.463427] PCI: OF: IO 0xeff00000..0xeff0ffff -> 0x00000000
> > [ 1.463435] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000
> > [ 1.463442] PCI: OF: MEM 0x800000000..0x8ffffffff -> 0x800000000
> > [ 1.463481] pci-host-generic e0000000.pcie: ECAM at [mem 0xe0000000-0xefefffff] for [bus 00-fe]
> > [ 1.463525] pci-host-generic e0000000.pcie: PCI host bridge to bus 0000:00
> > [ 1.463531] pci_bus 0000:00: root bus resource [bus 00-fe]
> > [ 1.463536] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
> > [ 1.463541] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff]
> > [ 1.463547] pci_bus 0000:00: root bus resource [mem 0x800000000-0x8ffffffff]
> >
> >
> > So I assume this works, and I am super grateful. I will test it tomorrow
> > with our Smart NIC.
> >
>
> This doesn't tell you all that much, to be honest. But at least the
> numbers look sane now, and appear to match the UEFI configuration.
>
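As a sanity check on the 'reg' sizes quoted above: ECAM maps 4 KiB of config space per function, with 8 functions per device and 32 devices per bus, so each bus needs 1 MiB of the window. The sketch below is only that arithmetic, not the kernel's code, and the helper name ecam_bus_range is made up for illustration. A 0x10000 window is too small for even one bus, which would be consistent with the nonsensical [bus 00-ffffffffffffffff] message (my reading of it, not something confirmed in this thread), while 0xff00000 covers buses 00-fe.

```python
# ECAM layout: 4 KiB per function, 8 functions per device, 32 devices per bus
# => 1 MiB (1 << 20 bytes) of config space per bus.
ECAM_BUS_SHIFT = 20


def ecam_bus_range(reg_size, first_bus=0):
    """Return (first_bus, last_bus) covered by an ECAM window of reg_size bytes.

    Hypothetical helper for illustration only. Returns None when the window
    is too small to hold a single bus.
    """
    buses = reg_size >> ECAM_BUS_SHIFT
    if buses == 0:
        return None
    return first_bus, first_bus + buses - 1


if __name__ == "__main__":
    # The two 'reg' sizes that appear in the DT nodes quoted in this thread.
    for size in (0x10000, 0xFF00000):
        covered = ecam_bus_range(size)
        if covered is None:
            print(f"reg size {size:#x}: too small for even one bus")
        else:
            first, last = covered
            print(f"reg size {size:#x}: bus {first:02x}-{last:02x}")
```

The end of the window follows the same way: 0xe0000000 + 0xff00000 - 1 = 0xefefffff, which is the range the successful probe reports.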
Perhaps I was too optimistic too quickly. lspci lists no devices, and a manual
rescan triggers an RCU stall:
root@localhost:~# lspci
root@localhost:~# lspci -v
root@localhost:~# echo 1 > /sys/bus/pci/
devices/ drivers_probe slots/
drivers/ rescan uevent
drivers_autoprobe resource_alignment
root@localhost:~# echo 1 > /sys/bus/pci/rescan
[ 176.977408] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 176.983100] 1-...0: (1 GPs behind) idle=242/1/4611686018427387904 softirq=3435/3435 fqs=22
[ 176.991572] (detected by 2, t=5375 jiffies, g=1169, c=1168, q=60)
>
> > However, we are building a product that obviously requires long term
> > maintenance, so may I please get your input on a strategy with this?
> >
> > If we decide to stick with this driver, would it be easy for things to
> > become disjointed?
> >
> > The hope with going the EFI route is that we could boot "generic" Ubuntu and
> > CentOS installs, so I guess as long as we keep the DT and the EDKII snapshot
> > in sync on our side, the risk is low.
> >
>
> I'm afraid you are getting caught in the middle of a philosophical
> debate here: many engineers that are involved with the Marvell support
> in Linux feel that a device tree is not something that should be
> supported long term, and needs to be bundled with the OS. Over the
> last couple of kernel releases, the Marvell 8040 support was changed
> in a non-backward compatible manner numerous times.
>
> This conflicts badly with the idea that the firmware provides the
> hardware description (using DT or ACPI), and that the contract with
> the OS is kept by both sides for longer than a single release.
>
> So I cannot really answer that question, unfortunately. If you don't
> intend to use the onboard network controller, you could go the ACPI
> route, I guess.
>
> Another problem is that none of this UEFI/ACPI support is upstream in
> the Tianocore project, and trying random trees left and right doesn't
> really help when assessing whether a platform is suitable as a long
> term investment.
>
>
>
> > For example, using the same DT with uboot, it fails:
> >
> > [ 0.294942] sysfs: cannot create duplicate filename '/bus/platform/devices/e0000000.pcie'
> > [ 0.294950] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc5-mbcin-netronome-2-dirty #2
> > [ 0.294952] Hardware name: Marvell 8040 MACHIATOBin (DT)
> > [ 0.294955] Call trace:
> > [ 0.294967] dump_backtrace+0x0/0x150
> > [ 0.294970] show_stack+0x14/0x20
> > [ 0.294976] dump_stack+0x98/0xbc
> > [ 0.294980] sysfs_warn_dup+0x60/0x78
> > [ 0.294983] sysfs_do_create_link_sd.isra.0+0xd8/0xe0
> > [ 0.294986] sysfs_create_link+0x20/0x40
> > [ 0.294990] bus_add_device+0x88/0x148
> > [ 0.294993] device_add+0x394/0x568
> > [ 0.294997] of_device_add+0x5c/0x70
> > [ 0.295000] of_platform_device_create_pdata+0x80/0xd0
> > [ 0.295003] of_platform_bus_create+0xdc/0x300
> > [ 0.295006] of_platform_bus_create+0x11c/0x300
> > [ 0.295008] of_platform_populate+0x4c/0xb0
> > [ 0.295014] of_platform_default_populate_init+0xa4/0xc0
> > [ 0.295017] do_one_initcall+0x38/0x120
> > [ 0.295020] kernel_init_freeable+0x134/0x1d4
> > [ 0.295025] kernel_init+0x10/0x100
> > [ 0.295028] ret_from_fork+0x10/0x18
> >
> > So I think this confirms that the pcie setup is different between EDKII and
> > uboot (unless I am doing something stupid here).
> >
>
> It looks like you have two copies of the pcie node here, no?
>
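For reference, the 'ranges' entries in the DT nodes quoted above can be decoded by hand. The first cell (phys.hi) carries the address space code in bits 25:24 (1 = I/O, 2 = 32-bit memory, 3 = 64-bit memory), as defined by the Open Firmware PCI bus binding; the remaining cells are the PCI address, the CPU address and the size. The sketch below assumes 7 cells per entry, i.e. a parent node with #address-cells = <2>, and decode_range is a hypothetical helper name, not anything from the kernel.

```python
# Address space codes from bits 25:24 of the phys.hi cell.
SPACE_CODES = {0: "CFG", 1: "IO", 2: "MEM32", 3: "MEM64"}


def decode_range(cells):
    """Decode one 7-cell PCI 'ranges' entry.

    Assumed cell layout (child #address-cells = 3, parent #address-cells = 2,
    #size-cells = 2):
        phys.hi, pci_hi, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo
    Returns (space, cpu_start, cpu_end, pci_start).
    """
    phys_hi, pci_hi, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo = cells
    space = SPACE_CODES[(phys_hi >> 24) & 0x3]
    pci = (pci_hi << 32) | pci_lo
    cpu = (cpu_hi << 32) | cpu_lo
    size = (size_hi << 32) | size_lo
    return space, cpu, cpu + size - 1, pci


if __name__ == "__main__":
    # The three entries from the suggested armada8k-pcie-ecam node.
    entries = [
        (0x1000000, 0x0, 0x00000000, 0x0, 0xEFF00000, 0x0, 0x00010000),
        (0x2000000, 0x0, 0xC0000000, 0x0, 0xC0000000, 0x0, 0x20000000),
        (0x3000000, 0x8, 0x00000000, 0x8, 0x00000000, 0x1, 0x00000000),
    ]
    for entry in entries:
        space, start, end, pci = decode_range(entry)
        print(f"{space} {start:#x}..{end:#x} -> {pci:#x}")
```

The decoded windows line up with the "PCI: OF:" lines in the boot log: I/O at 0xeff00000..0xeff0ffff mapped to PCI address 0, a 512 MiB 32-bit window at 0xc0000000, and a 4 GiB 64-bit window at 0x800000000.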